SeokjuLee / VPGNet

VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (ICCV 2017)
MIT License
487 stars 166 forks

Some questions about lane post-processing #8

Open vivounicorn opened 6 years ago

vivounicorn commented 6 years ago

@SeokjuLee Hi, in Section 4.4 of the paper, how should I understand the point-sampling process: "First, we subsample local peaks from the region where the probability of lane channels from the multi-label task is high."? Specifically: 1. How do you find "the region" — in one category's feature map (60×80) or in the original image (640×480)? 2. How should I understand the peaks? What do their horizontal and vertical coordinates stand for?

Thank you very much.

SeokjuLee commented 6 years ago

Hi, here is a visualized description. First, you can align the image and the prob map; in my case, I rescaled the prob map to the size of the original image. The second plot is a three-dimensional visualization with the probability values on the z-axis. Finally, you can extract the local extrema from each vertical index. In this step, you can choose any filtering algorithm (FFT, watershed, etc.).

[Image: lane_subsample]
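To make the alignment step concrete, here is a minimal sketch assuming a 60×80 lane channel and a 640×480 input as in the paper; the file name and resize call are illustrative, not the author's code:

```python
# Minimal sketch: align a 60x80 lane-channel prob map with the 640x480
# input image by rescaling (file name is a hypothetical placeholder).
import cv2
import numpy as np

prob_map = np.load('lane_channel.npy')        # hypothetical (60, 80) network output
prob_full = cv2.resize(prob_map, (640, 480),  # OpenCV dsize is (width, height)
                       interpolation=cv2.INTER_LINEAR)
# prob_full now has one probability per original-image pixel, so peak
# coordinates found here are directly image coordinates.
```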

chengm15 commented 6 years ago

@SeokjuLee I have some questions about these pics: (1) the probability map comes from rescaling the output of the multi-label task by 8×; (2) the 3rd pic looks like the result of choosing the local maximum value in every row as the seed point. Am I right?

SeokjuLee commented 6 years ago

@chengm15 Yes, that's correct.

vivounicorn commented 6 years ago

@SeokjuLee Thank you very much.

chengm15 commented 6 years ago

@SeokjuLee Can you share more detail about the local filter? How do you find the local maximum values? Thanks.

daixiaogang commented 6 years ago

@SeokjuLee As you said, we can sample points from the prob map. How about the classification?

SeokjuLee commented 6 years ago

@chengm15 Here is an example of how to use fftconvolve.
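(The example originally attached here is not preserved in this thread. Below is a minimal sketch of one way to use scipy.signal.fftconvolve for the per-row peak picking discussed above; the Gaussian kernel width and threshold are illustrative assumptions, not necessarily the author's settings.)

```python
# Sketch: smooth each row of a lane-probability map with a small Gaussian
# kernel via fftconvolve, then keep local maxima above a threshold.
import numpy as np
from scipy.signal import fftconvolve

def subsample_lane_peaks(prob_map, thresh=0.5, kernel_size=9):
    """Return (row, col) seed points from a 2D lane-probability map."""
    # 1D Gaussian smoothing kernel (sigma=2 is an illustrative choice)
    x = np.arange(kernel_size) - kernel_size // 2
    kernel = np.exp(-0.5 * (x / 2.0) ** 2)
    kernel /= kernel.sum()

    peaks = []
    for row_idx, row in enumerate(prob_map):
        smoothed = fftconvolve(row, kernel, mode='same')
        # A column is a local maximum if it exceeds both neighbors
        is_peak = np.r_[False, (smoothed[1:-1] > smoothed[:-2]) &
                               (smoothed[1:-1] > smoothed[2:]), False]
        for col_idx in np.where(is_peak & (smoothed > thresh))[0]:
            peaks.append((row_idx, col_idx))
    return peaks
```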

SeokjuLee commented 6 years ago

@daixiaogang Just the same as with the prob map: do it for each channel of the multi-label output.

vivounicorn commented 6 years ago

Hi @SeokjuLee, the probability map of lane channel 1 from the multi-label task looks strange; is there something wrong with it? (Channels 2–5 are OK.)

[Image: channel-1 probability map]

SeokjuLee commented 6 years ago

@vivounicorn The first channel represents the background class.

vivounicorn commented 6 years ago

@SeokjuLee Oh, thank you very much!

daixiaogang commented 6 years ago

@vivounicorn How do you get these results? Do you use MATLAB?

vivounicorn commented 6 years ago

@daixiaogang I use Python with Axes3D.
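For anyone wanting to reproduce this kind of plot, a minimal sketch using matplotlib's Axes3D; the file name is a hypothetical stand-in for your own prob-map array:

```python
# Sketch: 3D surface plot of a lane-probability channel, with the
# probability values on the z-axis as in the visualization above.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3d projection

prob_map = np.load('prob_map.npy')  # hypothetical (H, W) lane channel
ys, xs = np.mgrid[0:prob_map.shape[0], 0:prob_map.shape[1]]

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xs, ys, prob_map, cmap='jet')
ax.set_xlabel('x (image column)')
ax.set_ylabel('y (image row)')
ax.set_zlabel('probability')
plt.show()
```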

chengm15 commented 6 years ago

@SeokjuLee I have a question about the recognition step. The pic below is the multi-label result without post-processing. The detection is right, but the recognition of the middle line has some wrongly classified points. First, can post-processing filter out the wrong points or convert them into right ones? If yes, how? According to your description and the paper, we can cluster the points, but how do we decide the recognition result of each cluster? In other words, after clustering accurately, how do we determine the type of the middle line, whose points have two types?

[Image: multi-label result]

SeokjuLee commented 6 years ago

@chengm15 First, add all the lane channels into a single binary map. In the Caltech DB there is no road-marking class, so using the binary-mask output is okay. Then cluster the binary map; in your case, you would get three clusters (from your resulting image). BTW, the clustered seed points carry the class information from the multi-label task, so you can vote on the class type for each cluster. The type of a clustered lane is the class with the largest number of votes.
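A hedged sketch of this voting step (the function and its inputs are my own illustration, not the author's code): each seed point carries a class id from the multi-label task, and each cluster takes the most frequent class among its points.

```python
# Sketch: majority vote of the multi-label class inside each cluster.
from collections import Counter

def vote_cluster_classes(cluster_labels, point_classes):
    """cluster_labels[i] is the cluster id of seed point i (e.g. from
    DBSCAN); point_classes[i] is its multi-label class id."""
    votes = {}
    for cluster_id, cls in zip(cluster_labels, point_classes):
        if cluster_id == -1:  # skip DBSCAN noise points
            continue
        votes.setdefault(cluster_id, Counter())[cls] += 1
    # winning class per cluster = class with the largest number of votes
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}
```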

chengm15 commented 6 years ago

@SeokjuLee Thanks for your patient answer. You mean we need to choose seed points and run clustering on the binary-mask output? The image above is the multi-label result. So the specific post-processing steps are:
(1) choose seed points from the binary-mask output by running fftconvolve() on every row;
(2) convert the binary-mask output and the seed points into the IPM view;
(3) run DBSCAN in the IPM view;
(4) convert the DBSCAN result back to the original image (640×480);
(5) convert the multi-label result to the original image (640×480);
(6) vote to obtain the class of each cluster;
(7) fit a polyline.
Is there anything wrong in this process?

SeokjuLee commented 6 years ago

@chengm15 Yes, that's right. To solve the issue above, you need to cluster the lanes in the binary mask, which covers only the lane classes. The post-processing order you listed is correct.
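As an illustration of step (3) in the list above, a minimal sketch using scikit-learn's DBSCAN on the IPM-projected seed points; the eps and min_samples values are illustrative assumptions, not values from the paper:

```python
# Sketch: cluster IPM-view seed points into lanes with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

ipm_points = np.load('ipm_seeds.npy')  # hypothetical (N, 2) array of (x, y)
labels = DBSCAN(eps=15.0, min_samples=5).fit_predict(ipm_points)
# labels[i] is the lane-cluster id of point i; -1 marks noise points
```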

chengm15 commented 6 years ago

@SeokjuLee Thank you so much! Besides, can you share some details or code about the inverse perspective mapping? I think that is the last piece needed to reproduce your work.

MiaoDX commented 6 years ago

@chengm15 Hi, I'm stuck in the post-processing. Will you open-source your implementation? It seems non-trivial to reproduce even with the provided pipeline, and it would surely save tons of time for lots of people!

ansuman87 commented 6 years ago

@SeokjuLee @chengm15 Hi guys,

I have a few doubts regarding IPM. First, I want to confirm that the reason for performing IPM is to cluster the points near the vanishing point into the right lane groups, so the accuracy of the IPM is not of utmost importance: the IPM is converted back to the original perspective after the lane points are sorted into the right groups. I ask because most IPM algorithms assume a flat ground, which might not be an accurate assumption for all the data; but since accuracy is not that important here, it seems alright to employ IPM in post-processing.

Secondly, I want to know which IPM algorithms you use. The ones I have come across require four points in the image (forming a trapezium) to be mapped to a rectangle in the bird's-eye perspective: we identify these four points in the image, provide the coordinates of the four corresponding rectangle points in the bird's-eye frame, and compute the perspective-transform matrix; passing this matrix to the cv2 function warpPerspective yields the IPM image. The issue is that I have to make sure all the lane features are always enclosed within the four points forming the trapezium, since only the area enclosed by those points ends up inside the output rectangle. Finding such points for each image individually is not a possible option, and I found it very difficult to find four points reasonable enough to accommodate most of the road area in all the images such that the lanes are always enclosed. There might be other methods to choose these points, or a different method altogether for IPM. What algorithms did you use? Can you share some example code or point me in the right direction?
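A minimal sketch of this four-point approach with OpenCV (the source trapezium below is purely illustrative; in practice it would come from camera calibration or the vanishing point, not these hard-coded values):

```python
# Sketch: map an image trapezium to a bird's-eye rectangle (IPM).
import cv2
import numpy as np

img = cv2.imread('frame.png')              # hypothetical 640x480 frame
src = np.float32([[220, 300], [420, 300],  # top-left, top-right
                  [640, 480], [0, 480]])   # bottom-right, bottom-left
dst = np.float32([[0, 0], [640, 0],
                  [640, 480], [0, 480]])

H = cv2.getPerspectiveTransform(src, dst)
ipm = cv2.warpPerspective(img, H, (640, 480))  # bird's-eye view
# Seed points can be projected with the same H via cv2.perspectiveTransform,
# and mapped back to the original view with np.linalg.inv(H).
cv2.imwrite('ipm.png', ipm)
```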

Thank you!

derekwong66 commented 6 years ago

Hi @SeokjuLee, can you explain more about how to vote on the class type for each cluster? Thanks!

zanadu123 commented 5 years ago

@ansuman87 I use polar coordinates for clustering instead of IPM, which produces Cartesian coordinates. The origin of the polar coordinates is the VP, so I only need to determine the VP position of a test image. Since the net has no VP branch, I use the VP position from the mean image of the training dataset. If the VP position of a test image is almost the same as in the training dataset, the inference result looks good, but if it is far from the training dataset's, the result is bad.
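A minimal sketch of this polar-coordinate idea (my own illustration, not zanadu123's code): express each seed point relative to the VP and cluster on the angle, since points on the same lane share a similar angle toward the VP.

```python
# Sketch: convert seed points to polar coordinates centered at the VP.
import numpy as np

def to_polar(points, vp):
    """points: (N, 2) array of (x, y); vp: (vx, vy) vanishing point."""
    d = points - np.asarray(vp, dtype=np.float64)
    r = np.hypot(d[:, 0], d[:, 1])        # radial distance from the VP
    theta = np.arctan2(d[:, 1], d[:, 0])  # angle; lanes converge at the VP
    return np.stack([r, theta], axis=1)   # cluster mainly on theta
```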

peterlee909 commented 5 years ago

@chengm15 Hello, I was wondering if you could show us how to visualize the multi-label result. I don't even know how to get the lane classes. Thank you very much if you can help!

sandeepnmenon commented 3 years ago

+1. There are 18 classes in the VPGNet dataset, but the model outputs 64 channels. From this discussion I see that the first channel is the background channel. How are the 18 labels mapped to the 64 channels of the model?

SeokjuLee commented 3 years ago

@sandeepnmenon Sorry for the confusion. The number of output channels, 64, was designed to leave room for additional classes. You can set it to the number of your target classes, e.g. 17.