cguindel / eval_kitti

Tools to evaluate object detection results using the KITTI dataset.

Could you show us your build directory as a tree structure? #14

Closed hoangduyloc closed 3 years ago

hoangduyloc commented 3 years ago

I followed your instructions, but I got all-zero scores (screenshot attached). Do you know what the problem is? Thank you.

cguindel commented 3 years ago

Currently, the evaluator expects this structure:

|-- build
|   |-- data
|   |   |-- object
|   |   |   |-- label_2 [contains the KITTI labels]
|   |-- lists [contains train/val txt files]
|   |-- results
|   |   |-- <experiment_name>
|   |   |   |-- data [contains txt results in KITTI format]

So you should place the detection txt files in results/exp1/data. Please note that the KITTI evaluator requires every detected instance to have a 2D bounding box, even if you only want to perform BEV/3D evaluation, and 2D bounding boxes should be at least 25 pixels high so that the detection is not ignored.
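If it helps, here is a rough, untested sketch that scans the result files and flags boxes shorter than 25 px, which would otherwise be silently ignored. The results/exp1/data path is just the example from the tree above; adapt it to your experiment name.

```python
import glob

MIN_HEIGHT = 25.0  # pixels; shorter boxes are ignored by the evaluator

for path in sorted(glob.glob("results/exp1/data/*.txt")):
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue
            # KITTI row: type trunc occ alpha left top right bottom h w l x y z ry [score]
            top, bottom = float(fields[5]), float(fields[7])
            if bottom - top < MIN_HEIGHT:
                print(f"{path}:{i}: box height {bottom - top:.1f} px < {MIN_HEIGHT}")
```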

hoangduyloc commented 3 years ago

Thanks for your fast reply. My tree structure is the same as yours. First, I tested with the GT and DT files being identical, but I don't know why the code can only read the GT, not the DT (zero for everything). Here is my tree directory; test_10.txt is for testing 10 GT and 10 DT files. Even when I use a larger set, all DT results are still zero (see the image in the first question). So I wonder whether there is a problem with the DT code or format. My environment: Ubuntu 20.04 / Python 3.8 / CMake 3.16.3 (screenshot attached).

cguindel commented 3 years ago

Let me guess, are your files in results/exp1/data identical to the ones in object/label_2? That would be a problem because detections need an additional column containing the detection score, so the evaluator expects 16 fields per row instead of 15.
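If you only want to sanity-check the pipeline by reusing the ground truth as detections, something along these lines would add the missing score column. This is a quick, untested sketch; the label_2 and results/exp1/data paths are the ones from the tree above, so adjust them to your setup.

```python
import glob
import os

label_dir = "data/object/label_2"   # ground-truth labels (15 fields per row)
result_dir = "results/exp1/data"    # detections (must have 16 fields per row)
os.makedirs(result_dir, exist_ok=True)

for path in sorted(glob.glob(os.path.join(label_dir, "*.txt"))):
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    with open(os.path.join(result_dir, os.path.basename(path)), "w") as f:
        for row in rows:
            # 15 label fields + a constant confidence score = 16 fields
            f.write(" ".join(row + ["1.0"]) + "\n")
```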

hoangduyloc commented 3 years ago

> Let me guess, are your files in results/exp1/data identical to the ones in object/label_2? That would be a problem because detections need an additional column containing the detection score, so the evaluator expects 16 fields per row instead of 15.

Yeah, your guess is right. I missed the object confidence score in the 16th column. Anyway, thank you so much.

cguindel commented 3 years ago

No problem, I am aware that there is room for improvement, also in the documentation (contributions are welcome).

If you have solved the problem, I would appreciate it if you would consider closing the issue. I hope it can be a useful reference for future users, though.

hoangduyloc commented 3 years ago

@cguindel Hi, I have one more question. Have you ever submitted results to the official KITTI benchmark? When I evaluate using this source code, the accuracy is very high (>85%) on a 50-50 val split, but when I evaluate on the official benchmark the accuracy is around 65%, which is a very big gap. Do you think this source code has problems?

cguindel commented 3 years ago

We have also observed a significant decrease in AP when evaluating on the testing set vs. the usual validation sets (for instance, check this), although not as huge as the one you mention. Are you using a train/val split that ensures that training and validation images do not come from the same video sequences, such as the one by Chen et al.?

The backbone of the source code in this repository is the official evaluation script included in the KITTI detection devkit, which is reportedly the same one used in the official benchmark, so it is highly unlikely that the differences are due to problems in the evaluation code. Rather, I would blame this disparity on the differences in the composition of the validation and testing sets.
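As a rough way to check this, the sketch below (untested) prints the raw drives shared by both splits. It assumes the train_rand.txt / train_mapping.txt files from the KITTI object devkit and that lists/train.txt and lists/val.txt contain one image index per line; please double-check those formats against your copy of the devkit.

```python
def load_sequence_map(rand_path="train_rand.txt", mapping_path="train_mapping.txt"):
    # train_rand.txt: comma-separated, 1-based indices into train_mapping.txt (assumed)
    # train_mapping.txt: one "<date> <drive> <frame>" line per raw image (assumed)
    with open(rand_path) as f:
        rand = [int(x) for x in f.read().replace(",", " ").split()]
    with open(mapping_path) as f:
        drives = [line.split()[1] for line in f if line.strip()]
    # object image index i (000000.png, 000001.png, ...) -> raw drive name
    return {i: drives[r - 1] for i, r in enumerate(rand)}

def drives_in(list_path, seq_map):
    with open(list_path) as f:
        return {seq_map[int(line)] for line in f if line.strip()}

seq_map = load_sequence_map()
shared = drives_in("lists/train.txt", seq_map) & drives_in("lists/val.txt", seq_map)
print("Drives shared by train and val:", sorted(shared) if shared else "none")
```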

hoangduyloc commented 3 years ago

> We have also observed a significant decrease in AP when evaluating on the testing set vs. the usual validation sets (for instance, check this), although not as huge as the one you mention. Are you using a train/val split that ensures that training and validation images do not come from the same video sequences, such as the one by Chen et al.?
>
> The backbone of the source code in this repository is the official evaluation script included in the KITTI detection devkit, which is reportedly the same one used in the official benchmark, so it is highly unlikely that the differences are due to problems in the evaluation code. Rather, I would blame this disparity on the differences in the composition of the validation and testing sets.

Thanks for the useful information; it's really nice to talk with the BirdNet and BirdNet++ author. For my first experiment I split train/val 80/20 randomly, and when I checked, the accuracy was very high:

For the val set only (20%; it may contain overlapping sequences), via your GitHub source code:

car_detection AP (%): 96.10 / 94.90 / 95.28
car_orientation AOS (%): 96.09 / 94.88 / 95.15
car_detection_ground AP (%): 97.86 / 95.71 / 93.60
car_detection_3d AP (%): 96.29 / 92.09 / 92.41

pedestrian_detection AP (%): 72.41 / 74.55 / 73.18
pedestrian_orientation AOS (%): 61.21 / 65.38 / 65.04
pedestrian_detection_ground AP (%): 86.13 / 84.55 / 80.42
pedestrian_detection_3d AP (%): 80.91 / 80.04 / 76.20

cyclist_detection AP (%): 94.48 / 90.26 / 92.70
cyclist_orientation AOS (%): 93.57 / 89.56 / 91.99
cyclist_detection_ground AP (%): 83.49 / 83.83 / 84.19
cyclist_detection_3d AP (%): 83.49 / 83.83 / 84.19

For the official benchmark:

Car (Detection) | 86.89 % | 77.90 % | 72.05 %
Car (Orientation) | 86.60 % | 76.83 % | 70.95 %
Car (3D Detection) | 64.68 % | 52.16 % | 48.17 %
Car (Bird's Eye View) | 80.56 % | 70.69 % | 65.31 %

Pedestrian (Detection) | 41.80 % | 31.51 % | 29.76 %
Pedestrian (Orientation) | 22.87 % | 17.69 % | 16.56 %
Pedestrian (3D Detection) | 26.46 % | 21.03 % | 18.40 %
Pedestrian (Bird's Eye View) | 33.01 % | 25.44 % | 23.64 %

Cyclist (Detection) | 38.28 % | 28.05 % | 26.98 %
Cyclist (Orientation) | 31.91 % | 22.84 % | 21.78 %
Cyclist (3D Detection) | 24.83 % | 17.46 % | 16.73 %
Cyclist (Bird's Eye View) | 28.35 % | 20.21 % | 19.31 %

That's a big difference. I also tested the 50/50 split following Chen et al., and it also achieves high precision on the val set, around 85% (I don't have the exact numbers here). I will investigate my model further to see what's wrong. Thanks again for the useful information!