Open SeokjuLee opened 6 years ago
Hello. Something always goes wrong when I download "Lee_VPGNet_Vanishing_Point_ICCV_2017_supplemental.pdf" from the internet. Could you send me a new PDF? Thank you very much!
@CSUMIT Oh, our supplemental file is not pdf format. It's just a video clip and here is the link :) https://www.youtube.com/watch?v=jnewRlt6UbI
Oh, it is the supplemental for your paper that I found at http://openaccess.thecvf.com/ICCV2017.py.
@CSUMIT I uploaded a video file but the organizer seems to have changed the file to pdf format. You may ignore that link.
OK, thank you very much.
Should I make the dir ./snapshots myself? Or use sudo train.sh?
@daixiaogang Yes, please try it again after making ./snapshots. Updated the code. Thanks!
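A minimal sketch of preparing the working directories before training. It assumes the script is run from the model directory and that both ./snapshots (the snapshot_prefix target) and ./output (the log target used by train.sh later in this thread) must exist; the helper name `ensure_dirs` is hypothetical and not part of the repository.

```python
import os


def ensure_dirs(base_dir, names=("snapshots", "output")):
    """Create each named directory under base_dir if it is missing."""
    paths = []
    for name in names:
        path = os.path.join(base_dir, name)
        if not os.path.isdir(path):
            os.makedirs(path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Run from models/vpgnet-novp before starting ./train.sh
    for created in ensure_dirs("."):
        print(created)
```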
@SeokjuLee, I have trained your network on the caltech-lane dataset without errors. After about 14 hours it reached 30000 iterations, but when I plot the test and train loss logs, they look strange. The training loss converges over the iterations, but the test loss first increases and then stays unchanged. I did not change any of your configuration; should I change something, given that the dataset is small? (For training I used cordova1, cordova2, and washington1 as the train list, and washington2 as the test list.)
@daixiaogang It is normal for the validation accuracy to rise sharply in the beginning and not change significantly afterwards. This is because the size of the object to be inferred is much smaller than the background area. If you expand the validation curve, you can see that it is increasing slightly.
Is there any trained model that can be used for testing?
@SeokjuLee, I want to convert my labels, which look like (x1,y1,x1',y1'), (x2,y2,x2',y2'). My pictures are 1280x1024, which differs from your 640x480. Can you give me some advice on how to convert the labels? Or should I just change the width and height parameters in your vpg_annot_v1.m?
@SeokjuLee, because each of my labels has only two points (x1,y1,x1',y1'), I cannot call the function ccvEvalBezSpline() in vpg_annot_v1.m, so it fails. Must we fit a spline first in order to make the bounding boxes? Can you give me some advice?
@daixiaogang First, you should decide which input size to use, 640x480 or 1280x1024. If you want to use the latter, please check the intermediate activation sizes after the branches. The network is fully convolutional, so various sizes are applicable, but some parameter tuning might be needed. Second, does each of your labels contain only two end points that represent one straight line per lane? The label doesn't always need to be a spline curve. First draw each straight line between its two points on the image, then annotate the grid cells through which the line passes.
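The two-point annotation described above can be sketched as follows. This is an illustration only, not the logic of vpg_annot_v1.m: the function names are hypothetical, the line is approximated by dense sampling (a cell that the line only clips at a corner may be missed), and lane thickness is ignored. Note that 1280x1024 and 640x480 have different aspect ratios, so the x and y scale factors differ (0.5 and 0.46875).

```python
def scale_point(x, y, src_size=(1280, 1024), dst_size=(640, 480)):
    """Rescale a pixel coordinate from the source image size to the target size."""
    sx = dst_size[0] / float(src_size[0])
    sy = dst_size[1] / float(src_size[1])
    return x * sx, y * sy


def grid_cells_on_line(p0, p1, grid_size=8, width=640, height=480):
    """Return the sorted (col, row) grid cells that the segment p0-p1 passes through.

    The segment is sampled at roughly half-pixel steps, so this is an
    approximation rather than an exact line/grid traversal.
    """
    (x0, y0), (x1, y1) = p0, p1
    steps = int(max(abs(x1 - x0), abs(y1 - y0)) * 2) + 1
    cells = set()
    for i in range(steps + 1):
        t = i / float(steps)
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        if 0 <= x < width and 0 <= y < height:
            cells.add((int(x // grid_size), int(y // grid_size)))
    return sorted(cells)


if __name__ == "__main__":
    # One lane labelled by two end points on a 1280x1024 image (made-up values).
    a = scale_point(200, 1000)
    b = scale_point(600, 500)
    print(grid_cells_on_line(a, b))
```

Each returned cell would then be marked with the lane's class in the grid annotation; how the thickness parameter widens that marking is left out here.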
@SeokjuLee, thanks for your explanation. I have made the labels like yours. I want to know more about the parameters you use for annotation, such as grid size (8) and thickness (2). Should these parameters be changed to fit my pictures?
@SeokjuLee, I have trained your net with the caltech-lane dataset. I want to know how to test my results or output the coordinates (x1,y1,x2,y2...) of the lanes. Can you give me some advice?
@daixiaogang Well, you had better not change the grid size (8), because that parameter depends on the rescaling factor between the input (640x480) and output (multi-label: 80x60) sizes. The thickness depends on the lane width. If the grid annotation covers the lane markings sufficiently, I don't think you need to change it. For the demos and tests, use deploy.prototxt and load the models you've trained. You can visualize the results through the multi-label and binary mask outputs.
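The relation between grid size, input size, and output size stated above can be written out as a small sketch. The function names are hypothetical; the only facts taken from the thread are the grid size of 8 and the 640x480 to 80x60 mapping.

```python
GRID_SIZE = 8  # rescaling factor between input and output, per the thread


def output_size(input_w=640, input_h=480, grid_size=GRID_SIZE):
    """Size (width, height) of the grid-level output for a given input size."""
    return input_w // grid_size, input_h // grid_size


def cell_to_pixel_box(col, row, grid_size=GRID_SIZE):
    """Pixel box (x0, y0, x1, y1), inclusive, covered by one output cell."""
    x0 = col * grid_size
    y0 = row * grid_size
    return x0, y0, x0 + grid_size - 1, y0 + grid_size - 1


if __name__ == "__main__":
    print(output_size())                # 640x480 input
    print(output_size(1280, 1024))      # larger input, same grid size
    print(cell_to_pixel_box(79, 59))    # bottom-right cell of an 80x60 output
```

Mapping a detected cell back to its pixel box in this way is one simple route to image coordinates; it is not the post-processing the authors describe in their paper.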
@SeokjuLee, thanks for your guidance. But I still have some questions about the demos and tests. I use the command "./build/tools/caffe test -model models/vpgnet-novp/deploy.prototxt -weights models/vpgnet-novp/snapshots/split_iter_82500.caffemodel -iterations 1 >>output.log 2>&1". I want to know how to input a picture and get the coordinates of the lanes, because deploy.prototxt does not specify the input. Can you give me some advice? When I use the following Python code to run deploy.prototxt, it fails with: "[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 452:16: Message type "caffe.LayerParameter" has no field named "tiling_param"."
---------------------------------------------------code---------------------------------------------------------------------
import sys
import numpy
sys.path.append('/home/swjtu/daixiaogang/VPGNet/caffe/python')
import caffe
WEIGHTS_FILE = './snapshots/split_iter_82500.caffemodel'
DEPLOY_FILE = 'deploy.prototxt'
IMAGE_SIZE = (480, 640)
MEAN_VALUE = 128
caffe.set_mode_cpu()
net = caffe.Net(DEPLOY_FILE, WEIGHTS_FILE, caffe.TEST)
net.blobs['data'].reshape(1, 1, *IMAGE_SIZE)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', numpy.array([MEAN_VALUE]))
transformer.set_raw_scale('data', 255)
image_list = sys.argv[1]
with open(image_list, 'r') as f:
    for line in f.readlines():
        filename = line[:-1]
        image = caffe.io.load_image(filename, False)
        transformed_image = transformer.preprocess('data', image)
        net.blobs['data'].data[...] = transformed_image
        output = net.forward()
print output
------------------------------------------end--------------------------------------------------------------------------
@daixiaogang Try net.forward_all(data=np.array(transformed_image)); binary_mask = net.blobs['binary-mask'].data[0];
@SeokjuLee, it fails with the following message.
---------------------------------------------------------message-----------------------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1226 15:32:11.351035 8293 _caffe.cpp:122] DEPRECATION WARNING - deprecated use of Python interface
W1226 15:32:11.351058 8293 _caffe.cpp:123] Use this instead (with the named "weights" parameter):
W1226 15:32:11.351063 8293 _caffe.cpp:125] Net('deploy.prototxt', 1, weights='./snapshots/split_iter_82500.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 452:16: Message type "caffe.LayerParameter" has no field named "tiling_param".
F1226 15:32:11.352567 8293 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: deploy.prototxt
Check failure stack trace:
------------------------------------------------end---------------------------------------------------------------------
It reports that "caffe.LayerParameter" has no field named "tiling_param", but I have found this parameter in caffe.proto. It is very strange. Can you write a demo that uses "deploy.prototxt"?
@daixiaogang Our pycaffe loading lines are almost the same as yours. Could you check the pycaffe path? "tiling_param" is a customized Caffe parameter. Check this link: https://github.com/alexgkendall/SegNet-Tutorial/issues/4
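One way to check the pycaffe path is to print where the imported module lives and whether its generated protobuf definitions know the custom field. This is a sketch under assumptions: `has_proto_field` and `report_pycaffe` are hypothetical helper names, and the check inspects the Python-side `caffe_pb2`, which normally comes from the same build tree as the compiled library but is not guaranteed to match it.

```python
def has_proto_field(message_cls, field_name):
    """True if the protobuf message class declares a field with this name."""
    return field_name in message_cls.DESCRIPTOR.fields_by_name


def report_pycaffe():
    """Print which pycaffe is imported and whether it knows 'tiling_param'."""
    import caffe  # should resolve to the Caffe tree shipped with VPGNet
    from caffe.proto import caffe_pb2

    print("pycaffe loaded from: %s" % caffe.__file__)
    known = has_proto_field(caffe_pb2.LayerParameter, "tiling_param")
    print("tiling_param known: %s" % known)


if __name__ == "__main__":
    report_pycaffe()
```

If the printed path points at a system-wide Caffe instead of the VPGNet tree, another installation is probably shadowing the one added with sys.path.append; inserting the VPGNet path at the front of sys.path (or setting PYTHONPATH) before importing caffe is the usual remedy.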
Is this the binary_mask? Should I add the mean value, and what is the size of the mean value?
@daixiaogang Please use the updated model. The deploy model has been updated. "binary-mask" and "multi-label" are softmax outputs.
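Putting the advice in this thread together, a deploy run might look like the sketch below. Several things are assumptions and are marked as such in the comments: that the input blob is named 'data', that a constant mean of 128 (taken from the question above) is appropriate, and that channel 1 of the 'binary-mask' softmax is the lane class. The blob name 'binary-mask' comes from the author's reply; `run_deploy` and `cells_above` are hypothetical helper names.

```python
def cells_above(prob_map, threshold=0.5):
    """Return (row, col) for every cell whose probability exceeds threshold."""
    hits = []
    for r, row in enumerate(prob_map):
        for c, p in enumerate(row):
            if p > threshold:
                hits.append((r, c))
    return hits


def run_deploy(deploy_file, weights_file, image_path, mean_value=128):
    """Run one image through the deploy net and return the 'binary-mask' blob."""
    import numpy as np
    import caffe  # must be the pycaffe built from the VPGNet Caffe tree

    caffe.set_mode_cpu()
    net = caffe.Net(deploy_file, weights_file, caffe.TEST)

    # ASSUMPTION: the input blob is called 'data'. Its shape is read from
    # the net rather than hard-coded.
    _, channels, height, width = net.blobs['data'].data.shape

    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))   # HxWxC -> CxHxW
    transformer.set_raw_scale('data', 255)         # [0, 1] -> [0, 255]
    # ASSUMPTION: a constant per-channel mean, as in the question above.
    transformer.set_mean('data', np.full(channels, mean_value, dtype=np.float32))

    image = caffe.io.load_image(image_path, color=(channels == 3))
    net.blobs['data'].data[...] = transformer.preprocess('data', image)
    net.forward()
    return net.blobs['binary-mask'].data[0].copy()


if __name__ == "__main__":
    mask = run_deploy('deploy.prototxt',
                      './snapshots/split_iter_82500.caffemodel',
                      'f00000.jpg')
    # ASSUMPTION: channel 1 holds the lane probability.
    print(cells_above(mask[1]))
```

The returned cells are grid positions in the output map, not image coordinates, and this thresholding is only a quick way to inspect the output.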
@SeokjuLee, can you explain how to visualize the output from "binary-mask" and "multi-label"? I am reading your code (drive_data_layer.cpp and convert_driving_data.cpp), and I am wondering whether you are trying to regress the grid box (x,y,w,h) as in object detection. So is your output the bounding box?
@daixiaogang Could you please refer to our paper? We elaborate on the post-processing in Section 4.4 :)
@SeokjuLee Hi, I have a problem with running 'train.sh'. After following the four steps on the home page, I used sh train.sh to run it, but there was no response telling me how it was going. It looked like this:
root@root:~/dl/VPGNet/caffe/models/vpgnet-novp$ sh train.sh
|
(Here is a stationary cursor)
I used an NVIDIA GeForce GTX1080 to train it, but I got stuck here. I don't know whether this is the normal training time or whether I have a problem with the code. I have waited for nearly an hour. Could you share your training time with me or give me some advice on fixing it? Thank you very much!
BTW: I can't use train.sh to open it because it displays "command not found". So I use sh and bash. Does it matter?
@wsyzzz, just try using ./train.sh to run this code.
@daixiaogang, thanks for your advice. However, it doesn't seem to make a difference. It still gets stuck with no response. I will wait an hour to see what happens.
Besides, the terminal itself is not unresponsive: I can press Enter and type strings. I also used nvidia-smi to view the GPU processes. It shows:
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 1129 C ../../build/tools/caffe 7149MiB |
@wsyzzz Usually it works by just typing './train.sh' if the script file is set with executable mode. Have you tried to type command lines inside the script? If it doesn't respond, try to run with 'python debug_seokju.py' because we need to see at least one line of the errors.
@SeokjuLee Thanks for your response. What do you mean by 'type command lines inside the script'? I added an output at the head of train.sh, and it shows that train.sh is executed because my output is displayed, but it still gets stuck at ../../build/tools/caffe train --solver=./solver.prototxt >> ./output/output.log 2>&1. Besides, this process can be found among the GPU processes with nvidia-smi, just like in the previous comment.
I tried running 'python debug_seokju.py'. It printed the following (too long to show in full):
…………………………
I0105 03:38:08.007841 29620 net.cpp:432] bb-num-pixel-normalization -> bb-masked-output-sn-nn
I0105 03:38:08.007849 29620 net.cpp:155] Setting up bb-num-pixel-normalization
I0105 03:38:08.007856 29620 net.cpp:163] Top shape: 10 4 120 160 (768000)
I0105 03:38:08.007860 29620 layer_factory.hpp:76] Creating layer bb-loss
I0105 03:38:08.007867 29620 net.cpp:110] Creating Layer bb-loss
I0105 03:38:08.007872 29620 net.cpp:476] bb-loss <- bb-masked-output-sn-nn
I0105 03:38:08.007879 29620 net.cpp:476] bb-loss <- bb-label-sn-nn
I0105 03:38:08.007884 29620 net.cpp:432] bb-loss -> bb-loss
I0105 03:38:08.007939 29620 net.cpp:155] Setting up bb-loss
I0105 03:38:08.007946 29620 net.cpp:163] Top shape: (1)
I0105 03:38:08.007951 29620 net.cpp:168] with loss weight 3
I0105 03:38:08.007958 29620 net.cpp:236] bb-loss needs backward computation.
I0105 03:38:08.007963 29620 net.cpp:236] bb-num-pixel-normalization needs backward computation.
I0105 03:38:08.007969 29620 net.cpp:236] bb-size-normalization needs backward computation.
I0105 03:38:08.007975 29620 net.cpp:236] bb-prob-mask needs backward computation.
I0105 03:38:08.007980 29620 net.cpp:240] type-acc does not need backward computation.
I0105 03:38:08.007985 29620 net.cpp:236] type-loss needs backward computation.
I0105 03:38:08.007992 29620 net.cpp:240] pixel-acc does not need backward computation.
I0105 03:38:08.007997 29620 net.cpp:236] pixel-loss needs backward computation.
I0105 03:38:08.008003 29620 net.cpp:236] type-conv-tiled_type-tile_0_split needs backward computation.
I0105 03:38:08.008008 29620 net.cpp:236] type-tile needs backward computation.
I0105 03:38:08.008013 29620 net.cpp:236] bb-tile needs backward computation.
I0105 03:38:08.008018 29620 net.cpp:236] pixel-conv-tiled_pixel-tile_0_split needs backward computation.
I0105 03:38:08.008023 29620 net.cpp:236] pixel-tile needs backward computation.
I0105 03:38:08.008028 29620 net.cpp:236] type-conv needs backward computation.
I0105 03:38:08.008031 29620 net.cpp:236] pixel-conv needs backward computation.
I0105 03:38:08.008036 29620 net.cpp:236] bb-output needs backward computation.
I0105 03:38:08.008041 29620 net.cpp:236] drop7c needs backward computation.
I0105 03:38:08.008046 29620 net.cpp:236] relu7c needs backward computation.
I0105 03:38:08.008050 29620 net.cpp:236] L6c needs backward computation.
I0105 03:38:08.008055 29620 net.cpp:236] drop7b needs backward computation.
I0105 03:38:08.008059 29620 net.cpp:236] relu7b needs backward computation.
I0105 03:38:08.008064 29620 net.cpp:236] L6b needs backward computation.
I0105 03:38:08.008067 29620 net.cpp:236] drop7a needs backward computation.
I0105 03:38:08.008071 29620 net.cpp:236] relu7a needs backward computation.
I0105 03:38:08.008075 29620 net.cpp:236] L6a needs backward computation.
I0105 03:38:08.008080 29620 net.cpp:236] L5_drop6_0_split needs backward computation.
I0105 03:38:08.008085 29620 net.cpp:236] drop6 needs backward computation.
I0105 03:38:08.008090 29620 net.cpp:236] relu6 needs backward computation.
I0105 03:38:08.008093 29620 net.cpp:236] L5 needs backward computation.
I0105 03:38:08.008098 29620 net.cpp:236] pool5 needs backward computation.
I0105 03:38:08.008102 29620 net.cpp:236] relu5 needs backward computation.
I0105 03:38:08.008106 29620 net.cpp:236] L4 needs backward computation.
I0105 03:38:08.008111 29620 net.cpp:236] relu4 needs backward computation.
I0105 03:38:08.008116 29620 net.cpp:236] L3 needs backward computation.
I0105 03:38:08.008121 29620 net.cpp:236] relu3 needs backward computation.
I0105 03:38:08.008124 29620 net.cpp:236] L2 needs backward computation.
I0105 03:38:08.008128 29620 net.cpp:236] pool2 needs backward computation.
I0105 03:38:08.008133 29620 net.cpp:236] norm2 needs backward computation.
I0105 03:38:08.008137 29620 net.cpp:236] relu2 needs backward computation.
I0105 03:38:08.008142 29620 net.cpp:236] L1 needs backward computation.
I0105 03:38:08.008147 29620 net.cpp:236] pool1 needs backward computation.
I0105 03:38:08.008152 29620 net.cpp:236] norm1 needs backward computation.
I0105 03:38:08.008155 29620 net.cpp:236] relu1 needs backward computation.
I0105 03:38:08.008159 29620 net.cpp:236] L0 needs backward computation.
I0105 03:38:08.008164 29620 net.cpp:240] bb-label-num-pixel-normalization does not need backward computation.
I0105 03:38:08.008170 29620 net.cpp:240] bb-label-size-normalization does not need backward computation.
I0105 03:38:08.008177 29620 net.cpp:240] norm-block_norm-block_0_split does not need backward computation.
I0105 03:38:08.008182 29620 net.cpp:240] norm-block does not need backward computation.
I0105 03:38:08.008189 29620 net.cpp:240] size-block_size-block_0_split does not need backward computation.
I0105 03:38:08.008194 29620 net.cpp:240] size-block does not need backward computation.
I0105 03:38:08.008200 29620 net.cpp:240] pixel-block does not need backward computation.
I0105 03:38:08.008208 29620 net.cpp:240] norm-label_slice-label_3_split does not need backward computation.
I0105 03:38:08.008213 29620 net.cpp:240] size-label_slice-label_2_split does not need backward computation.
I0105 03:38:08.008219 29620 net.cpp:240] pixel-label_slice-label_0_split does not need backward computation.
I0105 03:38:08.008225 29620 net.cpp:240] slice-label does not need backward computation.
I0105 03:38:08.008230 29620 net.cpp:240] type_data_2_split does not need backward computation.
I0105 03:38:08.008236 29620 net.cpp:240] data does not need backward computation.
I0105 03:38:08.008240 29620 net.cpp:283] This network produces output bb-loss
I0105 03:38:08.008244 29620 net.cpp:283] This network produces output pixel-acc
I0105 03:38:08.008249 29620 net.cpp:283] This network produces output pixel-loss
I0105 03:38:08.008255 29620 net.cpp:283] This network produces output type-acc
I0105 03:38:08.008258 29620 net.cpp:283] This network produces output type-loss
I0105 03:38:08.008304 29620 net.cpp:297] Network initialization done.
I0105 03:38:08.008309 29620 net.cpp:298] Memory required for data: 1391827220
I0105 03:38:08.008486 29620 solver.cpp:65] Solver scaffolding done.
/home/dl/VPGNet/caffe/models/vpgnet-novp/debugseokju.py(39)
@SeokjuLee When I run train.sh, output.log prints:
F0112 09:20:56.458614 3586 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:
@ 0x7fc16fc365cd google::LogMessage::Fail()
@ 0x7fc16fc38433 google::LogMessage::SendToLog()
@ 0x7fc16fc3615b google::LogMessage::Flush()
@ 0x7fc16fc38e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc17034b840 caffe::SyncedMemory::to_gpu()
@ 0x7fc17034a829 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7fc170362af2 caffe::Blob<>::mutable_gpu_data()
@ 0x7fc1703aaa98 caffe::PoolingLayer<>::Forward_gpu()
@ 0x7fc170372d22 caffe::Net<>::ForwardFromTo()
@ 0x7fc170372e47 caffe::Net<>::ForwardPrefilled()
@ 0x7fc1703989dd caffe::Solver<>::Step()
@ 0x7fc17039956a caffe::Solver<>::Solve()
@ 0x40bf6b train()
@ 0x408688 main
@ 0x7fc16f2b3830 __libc_start_main
@ 0x408e29 _start
@ (nil) (unknown)
What does it mean? It also happens when I train with one training picture and one test picture.
My computer has:
RAM 32G
GTX 1060
Cuda 8.0
Not use Cudnn
I solved my problem. It was the batch_size: I changed it from 64 to 5.
But I don't know how to use deploy.prototxt (I am a beginner in Caffe), so please help me. I searched Google and other sites, and I wrote a command like "build/examples/cpp_classification/classification.bin models/vpgnet-novp/deploy.prototxt models/vpgnet-novp/snapshots/split_iter_500.caffemodel models/vpgnet-novp/driving_mean_train.binaryproto ./f00000.jpg",
but nothing happened. Please help me.
@ddori Hi, I recommend using the PyCaffe wrapper. You can easily visualize the outputs by following these instructions: https://github.com/BVLC/caffe/wiki/Using-a-Trained-Network:-Deploy
@wsyzzz Hi, did you solve the problem?
@SeokjuLee Unfortunately, I tried several times but failed. Do you have any advice? Thx.
@wsyzzz My problem is very similar to yours. I think you should lower your batch_size. The batch_size is in your solver.prototxt file.
@ddori Thanks for your response. Are you sure the batch_size is in the ../caffe/models/vpgnet-novp/solver.prototxt? My solver.prototxt is like this:
net: "./train_val.prototxt"
test_iter: 20
test_interval: 100
test_compute_loss: true
base_lr: 0.005
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 10
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 2500
snapshot_prefix: "./snapshots/split"
solver_mode: GPU
And I found 'batch_size' in my train_val.prototxt. But after resetting the batch_size to 5, the problem is still not solved.
...........
# Training input.
layer {
name: "data"
type: "DriveData"
top: "data"
top: "label"
top: "type"
data_param {
source: "./LMDB_train"
backend: LMDB
batch_size: 5
}
...........
# Test input.
layer {
name: "data"
type: "DriveData"
top: "data"
top: "label"
top: "type"
data_param {
source: "./LMDB_test"
backend: LMDB
batch_size: 10
}
...........
@wsyzzz Oh, I made a big mistake. Sorry. Look at train_val.prototxt, reset the batch_size (1 ~ 5), and try running './train.sh' instead of 'debug_seokju.py'.
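Since batch_size lives in the data layers of train_val.prototxt rather than in solver.prototxt, it can be changed by hand or with a small script such as the sketch below. The helper `set_batch_size` is hypothetical; it assumes that `source:` precedes `batch_size:` inside each `data_param` block, as in the excerpt posted above. Activation memory grows roughly in proportion to the batch size, which is why lowering it can resolve an out-of-memory failure.

```python
import re


def set_batch_size(prototxt_text, source, new_size):
    """Return prototxt_text with the batch_size that follows `source` replaced.

    Only the first data layer whose `source:` matches is changed.
    """
    pattern = re.compile(
        r'(source:\s*"' + re.escape(source) + r'".*?batch_size:\s*)\d+',
        re.DOTALL,
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(new_size), prototxt_text, count=1
    )
    if count == 0:
        raise ValueError("no batch_size found after source %r" % source)
    return new_text


if __name__ == "__main__":
    with open("train_val.prototxt") as f:
        text = f.read()
    # Lower only the training batch size; the test layer is left untouched.
    text = set_batch_size(text, "./LMDB_train", 2)
    with open("train_val.prototxt", "w") as f:
        f.write(text)
```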
@ddori Never mind. Thank you all the same. I reset the batch_size, but it still gets stuck after running './train.sh'. God, I think maybe I should reinstall the OS!
@wsyzzz Could you show me the last line of the message in the "./output/output.log"?
@SeokjuLee
.....................
0112 23:34:04.030761 10407 solver.cpp:242] Iteration 9190, loss = 2.471
I0112 23:34:04.030792 10407 solver.cpp:258] Train net output #0: bb-loss = 0.814072 (* 3 = 2.44222 loss)
I0112 23:34:04.030799 10407 solver.cpp:258] Train net output #1: pixel-loss = 0.0135676 (* 1 = 0.0135676 loss)
I0112 23:34:04.030812 10407 solver.cpp:258] Train net output #2: type-loss = 0.0152183 (* 1 = 0.0152183 loss)
I0112 23:34:04.030820 10407 solver.cpp:571] Iteration 9190, lr = 0.005
I0112 23:34:05.302136 10407 solver.cpp:346] Iteration 9200, Testing net (#0)
@wsyzzz Sorry but what was the problem? It seems normal.
That's what feels strange to me too. After running './train.sh', no errors are returned, and debug.py seems normal too. The point is that the program gets stuck and prints nothing. Also, I have to use Ctrl+Z to break out of this state. It looks like this: root@root:~/dl/VPGNet/caffe/models/vpgnet-novp$ sh train.sh |(Here is a stationary cursor)
Or do you mean this is normal and I can ignore it? Thx.
Sorry, I am commenting from my phone. In the former comment I used './train.sh' instead of 'sh train.sh', but the result is the same.
@wsyzzz Oh that's extremely normal. ">> ./output/output.log 2>&1" means saving outputs to the log file.
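Because both stdout and stderr are redirected, the terminal stays silent while training runs, and progress has to be read from the log file. A minimal sketch for pulling the training loss out of the log follows; the line format is taken from the excerpt posted above, and `parse_train_loss` is a hypothetical helper name.

```python
import re

# Matches solver lines such as: "... Iteration 9190, loss = 2.471"
ITER_LOSS = re.compile(r"Iteration (\d+), loss = ([-+0-9.eE]+)")


def parse_train_loss(lines):
    """Return a list of (iteration, loss) pairs found in the given log lines."""
    pairs = []
    for line in lines:
        match = ITER_LOSS.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


if __name__ == "__main__":
    with open("./output/output.log") as f:
        history = parse_train_loss(f)
    if history:
        print("latest: iteration %d, loss %.4f" % history[-1])
```

Running this from a second terminal while train.sh is active shows whether the iteration count is still advancing.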
That's great! So the next step is how to apply the output to detect objects in an image. I notice that 'run train.sh' is the last step on the homepage. Could you give me some advice about this step? Thx.
@SeokjuLee, I am now reading your code, and I found something strange in it:
const float scaling = static_cast<float>(full_label_width) / param.cropped_width(); //seokju, 1 = 160/160
but the default for param.cropped_width() in caffe.proto is:
optional uint32 cropped_width = 8 [default = 640];
Can you explain?
@daixiaogang @SeokjuLee Hello, how do I use the models to test my test data?
@daixiaogang Could you give me an email address? I want to discuss this model in Chinese, because my English is so poor!
@SeokjuLee, I have tried to compile the caffe many times but didn't succeed. When I tried to 'make all', the following error appears as the first error:
./include/caffe/util/cudnn.hpp:124:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’ pad_h, pad_w, stride_h, stride_w));
I am using CUDA 8.0.61 and CUDNN 6, Ubuntu 16.04, with my 1080Ti GPU. Would you mind helping me? Thanks very much!
@Lapin-Lam , you should not use cudnn in compiling caffe. In other words, you should not uncomment
# USE_CUDNN := 1
in Makefile.config.
@daixiaogang Hi, how did you solve the "Message type "caffe.LayerParameter" has no field named "tiling_param"." problem? Thanks.
Please ask about installation, training, and test issues in this panel.