DSD-DBS / raillabel

A devkit for working with recorded and annotated train ride data from Deutsche Bahn.
Apache License 2.0

Question about dataset division. #49

Open yinheyanxian opened 4 months ago

yinheyanxian commented 4 months ago

Dear authors,

I would like to ask some questions about the OSDaR23 dataset division. Our goal is to train an object detection algorithm based solely on LiDAR data. After extracting the data, we obtained 1524 trainable samples (not 1534, since 19_vegetation_curve_19.1 contains no cuboid annotations). I then followed the dataset division outlined in the official article "OSDaR23: Open Sensor Data for Rail 2023" (DOI: 10.1109/ICRAE59816.2023.10458449) and generated training, validation, and test sets. My detection categories are person, bicycle, train, road_vehicle, animal, and crowd.

However, when I train classic algorithms such as CenterPoint, VoxelNeXt, and PointRCNN on this data, the results are not good. Specifically, the detection accuracy for person and road_vehicle on the test set is acceptable, while the detection accuracy for bicycle, animal, and train on the test set is almost 0 (there is no crowd category in the test set). Similarly, the detection accuracy for person and road_vehicle on the validation set is still good, with an accuracy of 10%+ for train and 0 for crowd (there are no bicycle or animal categories in the validation set). This is a very strange phenomenon, because the numbers of training samples for animal, train, and road_vehicle are very close, yet the final detection accuracies differ greatly. There seems to be serious underfitting.

I have tried many deep learning optimization methods, such as increasing the number of training epochs, using data augmentation, and increasing network depth, but the final results have not changed significantly. The detection accuracy for bicycle, train, and animal on the test set is still almost 0.

Afterwards, I adopted a custom dataset division: I pooled all 1524 trainable samples and sampled them uniformly at random, using 70% for training and 15% each for validation and testing (sketched in the snippet at the end of this comment). Under this custom division, the trained algorithms perform better, with detection accuracy exceeding 10% for every category. However, I noticed that the original 1524 samples contain many train-stopping scenes with high repeatability, and the custom division placed some of these near-duplicate scenes in the training set, which probably explains the better results.

In my opinion, the LiDAR scenes provided by the OSDaR23 dataset are rich, but the number of samples is too small. It seems that training a model according to the dataset partitioning in the official article cannot achieve sufficient feature learning.

Could the authors give me some advice? I am eager to understand why the models I have trained perform so poorly.
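For reference, here is a minimal sketch of the frame-level uniform random split described above, assuming the 1524 extracted samples can be addressed by some list of IDs (`all_sample_ids` and the 70/15/15 ratios are my own setup, not part of the OSDaR23 release or the raillabel API):

```python
# Minimal sketch of the custom split described above (not the official OSDaR23 split):
# pool all trainable frame IDs and draw a uniform random 70/15/15 split.
# `all_sample_ids` is a hypothetical list of the 1524 LiDAR frames with cuboid
# annotations; replace it with however you index your extracted samples.
import random

def random_split(all_sample_ids, train=0.70, val=0.15, seed=42):
    ids = list(all_sample_ids)
    random.Random(seed).shuffle(ids)       # fixed seed for reproducibility
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return (
        ids[:n_train],                     # training set (~70 %)
        ids[n_train:n_train + n_val],      # validation set (~15 %)
        ids[n_train + n_val:],             # test set (remaining ~15 %)
    )

# Example: 1524 placeholder IDs, as in the question.
train_ids, val_ids, test_ids = random_split(range(1524))
print(len(train_ids), len(val_ids), len(test_ids))  # 1066 228 230
```

Note that because the OSDaR23 frames come from continuous sequences, a frame-level random split like this can put near-identical frames from the same scene into both the training and test sets, which may account for part of the improvement I observed.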

yinheyanxian commented 4 months ago

I did some more experiments and found that the train class always has some nonzero precision on the validation set, while on the test set its precision is always 0.
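One thing that might be worth checking is how many cuboids of each class actually end up in each split. A rough sketch, assuming a hypothetical `labels_for_sample(sample_id)` helper that returns the class names of the cuboids in one frame (this is not a raillabel function, just a placeholder for however the extracted labels are read):

```python
# Rough sketch for diagnosing the per-class distribution of a split.
# `labels_for_sample` is a hypothetical helper returning the class names of all
# cuboids in one frame; wire it up to however you extracted the annotations.
from collections import Counter

def class_counts(sample_ids, labels_for_sample):
    counts = Counter()
    for sample_id in sample_ids:
        counts.update(labels_for_sample(sample_id))
    return counts

# e.g. compare the three splits:
# print(class_counts(train_ids, labels_for_sample))
# print(class_counts(val_ids, labels_for_sample))
# print(class_counts(test_ids, labels_for_sample))
```

If the test split turns out to contain very few train cuboids, or only distant/truncated ones, that could explain why the precision for this class stays at 0 there while it is nonzero on the validation set.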