felipecode / coiltraine

Training framework for conditional imitation learning

Select a dataset. #40

Closed newday233 closed 3 years ago

newday233 commented 3 years ago

Hello,

I collected a dataset using the data-collector to train a model, and I got a poor result:

                                                          Average Between Weathers
                                                            Task  0  ->  0.42
                                                            Task  1  ->  0.18
                                                            Task  2  ->  0.04

I also trained a model using this dataset (http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_14.zip), and the result was better:

                                                          Average Between Weathers
                                                            Task  0  ->  0.86
                                                            Task  1  ->  0.48
                                                            Task  2  ->  0.06

However, it is still far from the results reported in the paper.

So I think the dataset is also key to reproducing the models. Which of the datasets linked by this repository worked best? (The full list is below, followed by a sketch of how I download them.)

                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_01.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_02.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_03.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_04.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_05.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_06.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_07.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_08.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_09.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_10.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_11.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_12.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_13.zip
                                          http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_14.zip
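
For reference, this is roughly how I download and unpack these archives; a minimal standard-library sketch, with the output directory being just a placeholder for my local setup:

```python
# Minimal sketch (not part of the repo): download and unpack the archives
# listed above using only the standard library. OUT_DIR is a placeholder.
import os
import zipfile
import urllib.request

BASE_URL = "http://datasets.cvc.uab.es/CVPR2019-CARLA100/CVPR2019-CARLA100_{:02d}.zip"
OUT_DIR = "CARLA100"  # placeholder for my local dataset directory

os.makedirs(OUT_DIR, exist_ok=True)
for part in range(1, 15):  # archives 01 .. 14
    url = BASE_URL.format(part)
    local_zip = os.path.join(OUT_DIR, os.path.basename(url))
    if not os.path.exists(local_zip):
        urllib.request.urlretrieve(url, local_zip)  # plain HTTP download
    with zipfile.ZipFile(local_zip) as archive:
        archive.extractall(OUT_DIR)  # unpack the episodes next to the zip
```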

Thanks for your help!

felipecode commented 3 years ago

Hey newday233! Thanks for the interest.

Unfortunately, it is very hard to reproduce; the variance is huge. Just try training with another random seed, and you will probably get very different results. I think I used the first 10 hours from this dataset (zips 1 and 2). Collecting data yourself can create problems, since it needs to be exactly the same agent and version. You also have to make sure the agent you collected the data with is not making any mistakes, which can happen especially if you add noise to the agent.
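
Roughly what I mean by another random seed, as a generic PyTorch-style sketch (coiltraine's configuration may expose this differently, so the helper below is only illustrative):

```python
# Illustrative only: fix every RNG before a training run, then repeat the
# whole run with a different value to see how much the benchmark moves.
import random
import numpy as np
import torch

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # GPU RNGs, if CUDA is used

set_seed(1)  # retrain with set_seed(2), set_seed(3), ... and compare results
```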

newday233 commented 3 years ago

Thanks for your reply!!!

I also downloaded the best model and ran the "single drive process" five times.

Here are the results:

resnet34imnet10S1:

                                                task1 | 0.74 | 0.70 | 0.65 | 0.68 | 0.64     

                                                task2 | 0.58 | 0.60 | 0.62 | 0.52 | 0.56

                                                task3 | 0.26 | 0.28 | 0.22 | 0.26 | 0.22

resnet34imnet10S2:

                                                task1 | 0.88 | 0.88 | 0.90 | 0.88 | 0.92

                                                task2 | 0.38 | 0.40 | 0.42 | 0.40 | 0.38

                                                task3 | 0.04 | 0.06 | 0.04 | 0.10 | 0.06

The only difference between "resnet34imnet10S1" and "resnet34imnet10S2" is the seed (as stated in the paper).

The best result in the paper is:

                                                task1:  0.90±0.02

                                                task2:  0.56±0.02

                                                task3:  0.24±0.08

Are my results normal?

I was wondering if I had made a mistake.
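
For reference, this is how I summarized the five runs into a mean and standard deviation per task, so they can be compared directly with the numbers reported in the paper (values copied from the tables above):

```python
# Summarize my five "single drive process" runs per task (numbers from above).
from statistics import mean, stdev

runs = {
    "resnet34imnet10S1": {
        "task1": [0.74, 0.70, 0.65, 0.68, 0.64],
        "task2": [0.58, 0.60, 0.62, 0.52, 0.56],
        "task3": [0.26, 0.28, 0.22, 0.26, 0.22],
    },
    "resnet34imnet10S2": {
        "task1": [0.88, 0.88, 0.90, 0.88, 0.92],
        "task2": [0.38, 0.40, 0.42, 0.40, 0.38],
        "task3": [0.04, 0.06, 0.04, 0.10, 0.06],
    },
}

for model, tasks in runs.items():
    for task, scores in tasks.items():
        print(f"{model} {task}: {mean(scores):.2f} +/- {stdev(scores):.2f}")
```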

felipecode commented 3 years ago

Sorry, I missed this. There is no mistake; the results reported used the best seed for each task. Since it is the same model configuration, just a different seed, we decided to report it that way at the time, since no one was doing variance analysis anyway. But ideally we would want to report a single model. Some people have already compared against us using a single model, which would be the seed 1 results.
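
To make the difference concrete, here is a small sketch using the per-task means of your five runs above (rounded to three decimals): picking the best seed per task, as the reported table did, versus reporting one seed across all tasks.

```python
# Per-task means of the five runs reported above (seed 1 = S1, seed 2 = S2).
seed_means = {
    "S1": {"task1": 0.682, "task2": 0.576, "task3": 0.248},
    "S2": {"task1": 0.892, "task2": 0.396, "task3": 0.060},
}

# What the reported table did: the best seed picked independently per task.
best_per_task = {task: max(means[task] for means in seed_means.values())
                 for task in ("task1", "task2", "task3")}

# What a single-model comparison would use: one seed across all tasks.
single_seed = seed_means["S1"]

print("best seed per task:", best_per_task)  # roughly the 0.90 / 0.56 / 0.24 in the paper
print("single seed (S1):  ", single_seed)
```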