bradyz / cross_view_transformers

Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022 Oral)
MIT License

Reproducing results of paper #2

F-Barto closed this issue 2 years ago

F-Barto commented 2 years ago

Hello, many thanks for sharing the code of this awesome work!

I am trying to reproduce your results, but the config file cvt_nuscenes_vehicle.yaml differs from what is described in the paper and the training/evaluation setup of Lift Splat Shoot.

In particular:

  1. You use the center loss instead of the focal loss
  2. You use a learning rate of 4E-3 instead of 1E-2
  3. You use the visibility token from the nuScenes annotations to filter out objects whose visibility level is strictly lower than 2
  4. You use label_indices: [[4, 5, 6, 7, 8, 10, 11]] (7 classes), whereas the DYNAMIC class list contains 8 classes
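To make item 4 concrete, here is a minimal sketch of how a `label_indices` group can collapse several class channels into one binary target. It assumes a hypothetical layout in which channels 4 to 11 hold the eight DYNAMIC classes, so index 9 is the one left out (pedestrian, per the reply below). The helper `merge_channels` is hypothetical and is not the repository's code.

```python
# Hypothetical channel layout, assumed for illustration only:
# channels 4..11 hold the 8 DYNAMIC classes; index 9 (pedestrian) is not in the group.
LABEL_INDICES = [[4, 5, 6, 7, 8, 10, 11]]


def merge_channels(label, groups):
    """Collapse class channels into one binary mask per group.

    label:  list of C binary masks, each an H x W nested list
    groups: list of channel-index lists
    A pixel is 1 in a group's mask if any channel in that group is 1 there.
    """
    height, width = len(label[0]), len(label[0][0])
    merged = []
    for group in groups:
        mask = [[int(any(label[c][y][x] for c in group)) for x in range(width)]
                for y in range(height)]
        merged.append(mask)
    return merged


label = [[[0, 0]] for _ in range(12)]  # 12 channels, each a 1 x 2 mask
label[4][0][0] = 1                     # a pixel of a class inside the group
label[9][0][1] = 1                     # a pedestrian pixel (channel 9, excluded)
print(merge_channels(label, LABEL_INDICES))  # [[[1, 0]]]
```

Taking the union over channels means the seven selected classes all count as a single "vehicle" target, while pedestrians stay in the stored labels for other tasks.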

Do you know how these factors influence your results?

Can you share the exact config you used to get the results in Table 1 of your paper?

bradyz commented 2 years ago
  1. This is an auxiliary loss (focal loss on vehicle centers) used by FIERY that I tried and left in. It works fine without it.
  2. After a major cleanup of the code for release, and while trying to train with mixed precision, this learning rate seemed to be more stable.
  3. These objects are not visible to the ego-vehicle and need to be filtered out for correctness.
  4. If you take a look at the dataset generation code, the missing index is pedestrian. We included it in the label set since it could be useful for future tasks.
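For item 3, a minimal sketch of the filtering rule follows. In nuScenes, each `sample_annotation` record carries a `visibility_token` of `'1'` to `'4'`, corresponding to 0-40%, 40-60%, 60-80% and 80-100% visibility. "Strictly lower than 2" therefore drops objects that are less than 40% visible. The dictionaries below are toy stand-ins for annotation records, and `filter_visible` is a hypothetical helper rather than the repository's dataset code.

```python
# nuScenes visibility levels: '1' = 0-40%, '2' = 40-60%, '3' = 60-80%, '4' = 80-100%.
MIN_VISIBILITY = 2  # drop objects whose level is strictly below 2


def filter_visible(annotations, min_visibility=MIN_VISIBILITY):
    """Keep only annotations whose visibility level is >= min_visibility."""
    return [ann for ann in annotations
            if int(ann["visibility_token"]) >= min_visibility]


# Toy stand-ins for sample_annotation records (only the fields used here).
anns = [
    {"token": "a", "visibility_token": "1"},  # 0-40% visible: dropped
    {"token": "b", "visibility_token": "2"},  # 40-60% visible: kept
    {"token": "c", "visibility_token": "4"},  # 80-100% visible: kept
]
print([a["token"] for a in filter_visible(anns)])  # ['b', 'c']
```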

The config is accurate. I will check to make sure nothing diverged in the release.

Thanks for the questions!

F-Barto commented 2 years ago

Hello bradyz! Many thanks for your response!

  1. OK, so this comes from further experiments, and the results in your paper were obtained using the focal loss.
  2. I see. Being able to train with mixed precision is indeed a nice feature.
  3. I understand that; let me clarify. My question is about how you evaluate. In your paper, to obtain the scores in Table 1, do you evaluate on all vehicles or only on visible vehicles?
  4. OK.
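The thread states that the paper's results used a focal loss but gives no hyperparameters. The sketch below is the standard binary focal loss of Lin et al. applied per pixel, FL = -alpha_t (1 - p_t)^gamma log(p_t). The defaults alpha=0.25 and gamma=2 are the common RetinaNet values and are assumptions here, not values confirmed for this paper. The function name `sigmoid_focal_loss` is only illustrative.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean binary focal loss over a flattened map-view mask.

    logits:  raw per-pixel scores (floats)
    targets: per-pixel labels (0 or 1)
    alpha, gamma: assumed RetinaNet defaults, not taken from the paper.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = _sigmoid(z)                      # predicted probability of "vehicle"
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true label
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma down-weights pixels that are already well classified.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(logits)
```

With gamma=0 and alpha=0.5, the loss reduces to half the ordinary binary cross-entropy. This makes it easy to compare against a plain BCE baseline.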

bradyz commented 2 years ago

Table 1 is on visible vehicles only. Unfortunately, almost all methods for this task evaluate slightly differently, but I would argue this is the right way to do it.
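The sketch below shows why the choice of evaluation set changes the number. It computes map-view IoU on flat binary masks, with an optional mask of pixels to ignore. It assumes per-pixel visibility levels, with 0 meaning "no vehicle", and it assumes that pixels of low-visibility vehicles are excluded from scoring rather than counted as background. The thread does not say which convention the repository uses, so treat both assumptions as hypothetical.

```python
def iou(pred, target, valid=None):
    """IoU of two flat binary masks; pixels where valid is False are ignored."""
    if valid is None:
        valid = [True] * len(pred)
    inter = union = 0
    for p, t, v in zip(pred, target, valid):
        if not v:
            continue
        inter += int(bool(p) and bool(t))
        union += int(bool(p) or bool(t))
    return inter / union if union else 0.0


# Hypothetical per-pixel visibility level (0 = no vehicle at that pixel).
vis = [4, 0, 1, 0]
gt = [int(v > 0) for v in vis]           # [1, 0, 1, 0]: pixel 2 is an occluded vehicle
pred = [1, 1, 0, 0]                      # the model misses the occluded vehicle
valid = [v == 0 or v >= 2 for v in vis]  # ignore pixels of vehicles below level 2

print(iou(pred, gt))         # all vehicles: 1/3
print(iou(pred, gt, valid))  # visible vehicles only: 0.5
```

Under the visible-only protocol, a model is not penalized for missing a vehicle that the cameras could barely see. This is one reason scores from methods that evaluate differently are hard to compare directly.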

F-Barto commented 2 years ago

OK, I understand better now. Many thanks.