SeokjuLee / VPGNet

VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (ICCV 2017)
MIT License

About post-processing for lane visualization #5

Open SeokjuLee opened 6 years ago

SeokjuLee commented 6 years ago

Here is an additional explanation of the lane post-processing. There are basically four steps. First, sample seed points using the lane heat map from the multi-label task. Second, project the seeds into the IPM coordinate. Third, cluster the seeds with our clustering method (“We sequentially decide the cluster by the pixel distance. After sorting the points by the vertical index, we stack the point in a bin if there is a close point among the top of the existing bins. Otherwise, we create a new bin for a new cluster. By doing this, we can reduce the time complexity of the clustering.”). Lastly, we fit a polynomial line to each cluster. If a cluster is near the VP, we include the VP while clustering for stability. The related contents are described in Section 4.4 of our paper. (Attached: lane-pp visualization)
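
A minimal sketch of the bin-based clustering described above (a hypothetical helper, assuming the seeds are (x, y) points already projected into the IPM coordinate; the pixel-distance threshold is an assumed value, not from the paper):

import numpy as np

def cluster_seeds(seeds, dist_thresh=20.0):
    # seeds: (N, 2) array of (x, y) IPM points
    seeds = seeds[np.argsort(seeds[:, 1])[::-1]]   # sort by the vertical index (bottom rows first)
    bins = []                                      # each bin collects the points of one lane cluster
    for pt in seeds:
        best, best_d = None, dist_thresh
        for b in bins:
            d = np.linalg.norm(pt - b[-1])         # distance to the top (last stacked point) of a bin
            if d < best_d:
                best, best_d = b, d
        if best is not None:
            best.append(pt)                        # stack onto the closest existing bin
        else:
            bins.append([pt])                      # otherwise open a new bin for a new cluster
    return [np.array(b) for b in bins]

Each returned cluster can then be passed to a polynomial fit (e.g. np.polyfit).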

daixiaogang commented 6 years ago

@SeokjuLee, could you write a Python demo that uses deploy.prototxt? I don't know whether the mean value is needed at this step. When I add mean.npy to the net it fails, because an array of shape (3, 512, 672) cannot be converted to (3, 480, 640). If I do not use mean.npy, the multi-label output has shape (1, 64, 60, 80), which can be viewed as (64, 60, 80); its values are almost 1 in some rows and very small in the others.

SeokjuLee commented 6 years ago

@daixiaogang Of course, subtract the mean value. I just extracted the per-channel (RGB) mean from mean.npy and then subtracted it from the input with transformer.set_mean. The multi_label and binary_mask outputs have values between 0 and 1 because they are softmax outputs.

daixiaogang commented 6 years ago

@SeokjuLee, did you use np.subtract(a, b)? When I use it, it fails, because the mean value from mean.npy has shape (3, 512, 672) while the input image is (3, 480, 640).

SeokjuLee commented 6 years ago

@daixiaogang Oh, just resize the mean array to 480x640. I used transformer.set_mean() for the subtraction.

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', MEAN) # subtract
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # RGB -> BGR
daixiaogang commented 6 years ago

@SeokjuLee, thanks for your patient explanation! I have tried to resize the mean, but it fails.

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
Mean = np.load('./mean.npy')
Mean = Mean.reshape(3,480,640)  # wrong
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2,1,0)) # RGB -> BGR

which raises: ValueError: total size of new array must be unchanged. I think this is because you resized your training data to (3, 512, 672) and the MEAN was calculated at that size.

SeokjuLee commented 6 years ago

@daixiaogang NumPy's reshape function doesn't change the total number of elements. Try scipy.misc.imresize or PIL's resize functions.
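
A minimal sketch of resizing the mean array channel by channel with PIL, assuming mean.npy has shape (3, 512, 672) and the transformer from the snippet above:

import numpy as np
from PIL import Image

mean = np.load('./mean.npy')                       # assumed shape (3, 512, 672)
resized = np.stack([
    np.array(Image.fromarray(c.astype(np.float32)).resize((640, 480), Image.BILINEAR))
    for c in mean
])                                                 # shape (3, 480, 640); PIL's resize takes (width, height)
transformer.set_mean('data', resized)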

daixiaogang commented 6 years ago

@SeokjuLee, I have done the IPM and obtained the output. Can you give me some advice on how to run DBSCAN on the output?

SeokjuLee commented 6 years ago

@daixiaogang Well, I recommend not using the original DBSCAN code because it is slow. The following description is from Section 4.4 of the original paper; please refer to it. "We sequentially decide the cluster by the pixel distance. After sorting the points by the vertical index, we stack the point in a bin if there is a close point among the top of the existing bins. Otherwise, we create a new bin for a new cluster. By doing this, we can reduce the time complexity of the clustering..."

ivo-gilles commented 6 years ago

Hi, I am new to Caffe and want to deploy the model in C++, not Python. At the moment I don't know how to get the mean value to subtract at the beginning of the feed-forward step. If possible, could you share your mean.npy file, or better, advise an approximate mean value that I can use directly in the code? My email is tin.duongtrung@gmail.com. Thank you.

HuifangZJU commented 6 years ago

Hi, the multi-label task has 64 output channels, so which one is the lane channel?

daixiaogang commented 6 years ago

@ln-scau, I have not done the post-processing yet. I think we can communicate through other tools, such as WeChat.

chengm15 commented 6 years ago

@daixiaogang Hi, have you solved the mean-value problem? I think the code below can help.

import numpy as np
import caffe

MEAN_PROTO_PATH = '/home/chengming/chengming/VPGNet/caffe/models/vpgnet-novp/driving_mean_train.binaryproto'
MEAN_NPY_PATH = '/home/chengming/chengming/VPGNet/caffe/models/vpgnet-novp/mean.npy'

# Parse the binaryproto mean file
blob = caffe.proto.caffe_pb2.BlobProto()
data = open(MEAN_PROTO_PATH, 'rb').read()
blob.ParseFromString(data)

# Convert it to a numpy array and save it as mean.npy
array = np.array(caffe.io.blobproto_to_array(blob))
mean = array[0]
np.save(MEAN_NPY_PATH, mean)
print(mean.shape)
mu = mean.mean(1).mean(1)   # per-channel mean values

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))

Besides, I want to ask how you implemented IPM. My email is chengm15@163.com. I think we can also communicate via WeChat.

daixiaogang commented 6 years ago

@chengm15, I downloaded the IPM code from this URL: http://blog.csdn.net/yeyang911/article/details/51912322. My QQ is 750930459.

chengm15 commented 6 years ago

@SeokjuLee Hello, I have finished training, and the following output.log seems to show a good accuracy result.

Test net output #0: bb-loss = 0.843927 (* 3 = 2.53178 loss)   
Test net output #1: pixel-acc = 0.982325                      
Test net output #2: pixel-loss = 0.0884257 (* 1 = 0.0884257 loss)
Test net output #3: type-acc = 0.973547                       
Test net output #4: type-loss = 0.183663 (* 1 = 0.183663 loss)

Thus, I chose f00280 from cordova2 (cordova1, washington1, and washington2 are the training data; cordova2 is the test data) to look at the model's result. However, the result is very bad. The first attached image is the label and the second is the model's result; different colors mean different types of lanes. The other pictures show similar results, so this is not a special case. There may be several reasons for the poor result:

  1. This model does not include the VP task, so the performance is poor. But I cannot understand why such a high accuracy produces such a poor result.
  2. We did not apply post-processing.
  3. I need to fine-tune the model. Can you explain this poor result or share more information about this work?

Looking forward to your reply!

SeokjuLee commented 6 years ago

@chengm15 Hi, I think the main reason is the deficiency of the training dataset, and it looks like the output is not well aligned with the original image. You need to shift it as below (depending on the offset and output size).

multi_label = net.blobs['multi-label'].data[0]
multi_label_shifted = np.zeros(multi_label.shape)
multi_label_shifted[:,2:,2:] = multi_label[:,:-2,:-2]  # shift by 2 cells to compensate for the padding offset
SeokjuLee commented 6 years ago

@chengm15 Furthermore, the accuracy you checked includes the background pixels; most pixels belong to the background class, and this inflates the accuracy.
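
A minimal sketch of measuring accuracy over the foreground cells only (a hypothetical helper; pred and label are assumed to be (H, W) class-index maps and background is assumed to be class 0):

import numpy as np

def foreground_accuracy(pred, label, bg_class=0):
    mask = label != bg_class                 # ignore background cells
    if mask.sum() == 0:
        return float('nan')                  # no foreground in this image
    return float((pred[mask] == label[mask]).mean())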

chengm15 commented 6 years ago

@SeokjuLee Thanks for replying! I modified two things:

  1. I used skimage.transform.resize to resize the mean from driving_mean_train.binaryproto, and the result is better than before. The attached image shows the result after using skimage.transform.resize; I can see the same number of lanes as in the label.

  2. Following your advice, I shifted the multi-label result and the result improved (see the attached image). Why does this shifting operation improve the result? Is shifting needed in general, and is the offset the same for different datasets? I have not read any details about the shifting operation, so I have many questions.

Besides, both of the two images have a problem: the rightmost lane is wrong. Can you give some advice on this problem? Thank you very much!

SeokjuLee commented 6 years ago

@chengm15 Good. The reason for the shifting is the padded value; you need to align the output according to the padding operation. In addition, I think the misclassification is caused by data imbalance: the number of dashed lines is much larger than the number of single lines, which biases the classifier. One solution could be class weighting, as SegNet does. Further, I recommend flipping the training dataset left and right (data augmentation); this can also mitigate the bias problem.
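
A minimal left-right flip sketch (hypothetical helper names; assumes an (H, W, 3) image array and lane points stored as (x, y) pixel coordinates):

import numpy as np

def flip_lr(image, points):
    h, w = image.shape[:2]
    flipped_img = image[:, ::-1, :].copy()                       # mirror the image horizontally
    flipped_pts = np.array([(w - 1 - x, y) for x, y in points])  # mirror the x coordinates
    return flipped_img, flipped_pts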

daixiaogang commented 6 years ago

@SeokjuLee, I want to train your network on my own dataset, which has no class information, only the line points. My images are 1280x1024; can I just change the input size in train_val.prototxt?

chengm15 commented 6 years ago

@SeokjuLee Thanks for your guidance! The result is now much better than before (see the attached image). I also want to confirm the post-processing procedure: first, apply IPM; second, run the clustering method; then, project the clusters back to the original image; finally, run polyfit. Am I right?

SeokjuLee commented 6 years ago

@daixiaogang I think the easiest way is to make splines as in the Caltech dataset format and set the loss_weight of the multi-label task to zero. You don't need to fix the input size because the network is fully convolutional.

SeokjuLee commented 6 years ago

@chengm15 Good, yeah that's right.
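
A minimal polynomial-fitting sketch for the last step (a hypothetical helper; assumes each cluster is an (N, 2) array of (x, y) points back in the original image coordinates, and the 2nd-order polynomial is an assumed choice):

import numpy as np

def fit_lane(cluster_pts, degree=2):
    x, y = cluster_pts[:, 0], cluster_pts[:, 1]
    coeffs = np.polyfit(y, x, degree)            # fit x as a function of y
    ys = np.linspace(y.min(), y.max(), 50)
    xs = np.polyval(coeffs, ys)
    return np.stack([xs, ys], axis=1)            # sampled points along the fitted lane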

daixiaogang commented 6 years ago

@SeokjuLee, I understand your idea, but the data fed to the network by convert-drivingdata is (640, 480), and the output is subsampled by 8 to (80, 60).

SeokjuLee commented 6 years ago

@daixiaogang Yes, you may need to change the input/converting layer.

chengm15 commented 6 years ago

@SeokjuLee Hi, the attached image shows the newest result after post-processing, but it is still far from the result in your video, and the right line is not correct. Can you give some advice on this result? I have also used data augmentation: I flip the image and the label. The type accuracy stays at only 95% and cannot be improved further, and it leads to all of the output being classified as background.

SeokjuLee commented 6 years ago

@chengm15 How did you sample the seed points? Did you shift the heat map along the y-axis?

chengm15 commented 6 years ago

@SeokjuLee I just randomly pick 8 points in the grid.

SeokjuLee commented 6 years ago

@chengm15 In my case, I used 'scipy.signal.fftconvolve' for each row and then extracted local maxima.
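
A hedged sketch of row-wise smoothing with scipy.signal.fftconvolve followed by local-maxima extraction (a hypothetical helper; the kernel width and threshold are assumed values, not from the paper):

import numpy as np
from scipy.signal import fftconvolve

def sample_seeds(heat_map, kernel_width=5, thresh=0.5):
    # heat_map: (H, W) lane probability map from the multi-label output
    kernel = np.ones(kernel_width) / kernel_width            # simple box filter
    seeds = []
    for y in range(heat_map.shape[0]):
        row = fftconvolve(heat_map[y], kernel, mode='same')  # smooth this row
        for x in range(1, len(row) - 1):
            if row[x] > thresh and row[x] >= row[x - 1] and row[x] >= row[x + 1]:
                seeds.append((x, y))                          # local maximum above the threshold
    return np.array(seeds)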

chengm15 commented 6 years ago

@SeokjuLee Thanks for replying. I am still a little confused about the clustering method. Do you only convert the seed points into the bird's-eye view, then sort them by the vertical index, then cluster the points, and finally go back to the original image and fit the lines? Am I right?

SeokjuLee commented 6 years ago

@chengm15 Yes that's right. IPM is only used to separate the seed points near the VP.

chengm15 commented 6 years ago

@SeokjuLee Thanks. Besides, I have a question about the data augmentation: after using it, the result becomes much worse than before, with every grid cell recognized as background. Could you share a .caffemodel file? It would make it more convenient to follow your work.

SeokjuLee commented 6 years ago

@chengm15 From my experience, most errors are caused by the data layer. Or, that result could be correct because the Caltech dataset is too small. I'm afraid I can't upload the model file because it needs additional permission from Samsung Research :-(

daixiaogang commented 6 years ago

@SeokjuLee, I want to know when you will release the dataset. When I train this network on my own dataset, it detects nothing. I think this may be because my labels are not very good: they do not contain enough grid boxes, since the source dataset has only the two end points of each line, which makes the lines very short.

daixiaogang commented 6 years ago

@SeokjuLee: the following code is copied from your data_layer.cpp:

const int grid_dim = param.label_resolution();     // seokju, 8
const int width = param.tiling_width();            // seokju, 20
const int height = param.tiling_height();          // seokju, 15
const int full_label_width = width * grid_dim;     // seokju, 160
const int full_label_height = height * grid_dim;   // seokju, 120

I want to know whether I can change these parameters. For example, if I set grid_dim=16 in the annotation code, the label looks better. The width and height of the un-gridded label also change with the picture resolution (480x640 vs 1024x1280). If I do not change the label code, I cannot detect lanes in pictures from my dataset. The following is the visualization of the labels (grid_dim=8 vs grid_dim=16).

(Attached: label visualizations for grid_dim=8 and grid_dim=16.) Because the grid label was first proposed by you, I want to know more details. Hoping for your reply!

daixiaogang commented 6 years ago

@SeokjuLee, as your paper says, the resolution of your images is 1288x728, but the network input is 640x480. I wonder how you changed convertdrivingdata.cpp (the original resize_weight and resize_height are 640+32 and 480+32); did you change resize_weight and resize_height? In train_val.prototxt, what does the scale parameter mean? When we use the Caltech dataset, the scale is 1, and the image resolution equals the input size in width and height. Hoping for your reply!

SeokjuLee commented 6 years ago

@daixiaogang We just resize the image to 640x480 in the data pre-processing. The 32 is the padding offset. The parameters I changed are declared in the prototxt file, in the drive_data_param structure. The scale parameter rescales the pixel intensity and doesn't need to be tuned in your case. As you mentioned previously, you need to change the parameters in the data layer, as in the grid_dim case.

chengm15 commented 6 years ago

@SeokjuLee Regarding your description, "In my case, I used 'scipy.signal.fftconvolve' for each row and then extracted local maxima": I looked up the documentation at https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.signal.fftconvolve.html. The signature is scipy.signal.fftconvolve(in1, in2, mode='full'), so the inputs are in1, in2, and mode. What do these three parameters correspond to in your usage? Looking forward to your reply.

SeokjuLee commented 6 years ago

@chengm15 I have an additional explanation about the subsampling with a visualization. Please see this issue.

billyzju commented 6 years ago

@chengm15 Could you please just share your post processing codes? Many thanks!

shanmugaraj1986 commented 6 years ago

@SeokjuLee and @chengm15, I trained the VPGNet model using the Caltech dataset, but I don't know how to do the post-processing. I need your assistance to get the post-processing output. Thanks.

OrkunYilmaz commented 5 years ago

Dear @chengm15 and @daixiaogang would you mind sharing your post-processing codes? I am working on a thesis and would much appreciate your help!

yurenwei commented 5 years ago

@chengm15 I trained the VPGNet model using the Caltech dataset, but I don't know how to do the post-processing. I need your assistance to get the post-processing output. Thanks.

yurenwei commented 5 years ago

@daixiaogang I trained the VPGNet model using the Caltech dataset, but I don't know how to do the post-processing. Would you mind sharing your post-processing code? My email is yurenwei2014@163.com. Thanks.

karanbehar commented 5 years ago

Dear Orkun

Were you able to figure out the post-processing and visualization with VPGNet? I am trying to test the network on unlabeled bad-weather data, but I still have not figured out how to perform testing with the images.

Kindly advise.

With best regards,

Karan

sandeepnmenon commented 3 years ago

+1. I also need help with the post-processing code; if it could be shared, it would be really helpful. My email is menonsandu@gmail.com