mrabiabrn opened this issue 1 week ago (status: Open)
33.66 was measured on 224x400 inputs, which are first downsampled and then upsampled so that they go through the same pipeline we use to evaluate the generated views.
36.00 comes from the original CVT settings, using the raw validation-set data. Our reproduction matched this result.
So the evaluation result of 36.00 corresponds to the 224x448 resolution (as in the original paper), using the raw nuScenes validation set. The result of 33.66 was obtained by evaluating at 224x400 (with the model trained at 224x448). For Table 3, the model was trained at 224x448 on a mixed dataset (real + generated).
If the original CVT setting is 224x448, then yes (I don't remember the exact details of CVT).
We did not change CVT's original data-processing pipeline, which loads 900x1600 images. Therefore, to use the generated views, we upsample and pad them to 900x1600 and then feed them through CVT. The oracle is obtained with 900x1600 -> 224x400 -> 900x1600 -> CVT.
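For reference, the resolution path for the oracle could be sketched in NumPy roughly as follows. This is only a minimal sketch under stated assumptions: the thread does not say which interpolation or padding scheme is used, so nearest-neighbor downsampling, integer 4x upsampling (224x400 -> 896x1600) and zero-padding of the remaining 4 rows at the bottom are all assumptions, and the function names are hypothetical rather than taken from the authors' code or from CVT.

```python
import numpy as np


def nearest_resize(img, out_h, out_w):
    """Nearest-neighbor resize of an (H, W, C) array (hypothetical helper)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def to_cvt_input(view, scale=4, target_hw=(900, 1600)):
    """Upsample a 224x400 view by an integer factor, then zero-pad to 900x1600.

    ASSUMPTION: pixel-repeat upsampling and bottom/right zero-padding.
    """
    up = np.repeat(np.repeat(view, scale, axis=0), scale, axis=1)  # 224x400 -> 896x1600
    pad_h = target_hw[0] - up.shape[0]  # 900 - 896 = 4 rows
    pad_w = target_hw[1] - up.shape[1]  # 1600 - 1600 = 0 columns
    assert pad_h >= 0 and pad_w >= 0, "view is larger than the target size"
    return np.pad(up, ((0, pad_h), (0, pad_w), (0, 0)))


def oracle_input(raw):
    """900x1600 -> 224x400 -> 900x1600, the same path used for generated views."""
    small = nearest_resize(raw, 224, 400)
    return to_cvt_input(small)


raw = np.random.randint(0, 256, size=(900, 1600, 3), dtype=np.uint8)
x = oracle_input(raw)
print(x.shape)  # (900, 1600, 3)
```

The point of routing real images through the same down/up path is that the oracle then suffers the same resolution loss as the generated views, which is why it scores below the 36.00 obtained on raw data.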
Thank you for the clarification 👍🏻
This issue is stale because it has been open for 7 days with no activity. If you do not have any follow-ups, the issue will be closed soon.
Hi, when I checked the CVT results in Table 1 and Table 3, I noticed a discrepancy. In Table 1, the oracle performance for vehicle segmentation is reported as 33.66, which I understood to be your reproduction of CVT. However, in Table 3, the performance is listed as 36.0, consistent with the original paper. Additionally, it seems that the augmentation gains in Table 3 are added on top of the originally reported results.
Is the improvement measured relative to 33.66 or to 36.0? Could you clarify this?
Thank you