EdwardLeeLPZ / PowerBEV

PowerBEV is a novel and elegant vision-based end-to-end framework that consists only of 2D convolutional layers to perform perception and forecasting of multiple objects in BEV.

Evaluation range #8

Closed · mingyuShin closed this 7 months ago

mingyuShin commented 7 months ago

Thank you for your great work!

I'm a beginner in this field. When measuring the evaluation ranges (i.e., short and long), shouldn't both be measured with one model and reported in the paper? Did FIERY, for example, train two models with different resolutions, one for each range, and measure their performance?

EdwardLeeLPZ commented 7 months ago

> Thank you for your great work!
>
> I'm a beginner in this field. When measuring the evaluation ranges (i.e., short and long), shouldn't both be measured with one model and reported in the paper? Did FIERY, for example, train two models with different resolutions, one for each range, and measure their performance?

Hi,

Since the resolutions are different, the model cannot handle the two range settings with the same weights. So even with the same grid-map size, it needs to be trained separately. In this respect, FIERY and PowerBEV are the same.
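
For illustration, here is a minimal sketch of the two range settings; the exact bounds and resolutions are assumptions based on FIERY-style defaults, not values taken from this repository. Both settings produce a 200 x 200 grid map, but each cell covers a different metric extent, which is why the weights are not interchangeable.

```python
# Hypothetical grid configurations for the two evaluation settings.
# Values are assumptions (FIERY-style defaults), not read from this repo.

LONG_RANGE = {
    "x_bounds": (-50.0, 50.0, 0.50),   # 100 m x 100 m at 0.50 m -> 200 x 200 cells
    "y_bounds": (-50.0, 50.0, 0.50),
}

SHORT_RANGE = {
    "x_bounds": (-15.0, 15.0, 0.15),   # 30 m x 30 m at 0.15 m -> 200 x 200 cells
    "y_bounds": (-15.0, 15.0, 0.15),
}

def grid_size(bounds):
    """Number of BEV cells along one axis."""
    lower, upper, resolution = bounds
    return int(round((upper - lower) / resolution))

# Same grid-map size, different metric extent per cell, hence separate weights.
assert grid_size(LONG_RANGE["x_bounds"]) == grid_size(SHORT_RANGE["x_bounds"]) == 200
```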

mingyuShin commented 7 months ago

Thank you for your fast reply!

mingyuShin commented 7 months ago

When I evaluated the pretrained checkpoint of FIERY directly, I ran inference for 1 epoch (5119 iterations), and both the short- and long-range values were output simultaneously, matching the numbers in the paper. Upon inspecting FIERY's evaluation.py code, it appears that a single model infers the 100 m x 100 m region and then evaluates it over both the full 100 m x 100 m area and the central 30 m x 30 m area. This differs slightly from the reporting method of PowerBEV. I'm just seeking clarification: the evaluation is a little different, right?
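
For reference, here is a minimal sketch of that crop-then-evaluate idea, assuming a 200 x 200 BEV grid at 0.5 m resolution and a simple binary IoU metric. The helper names are illustrative and are not taken from FIERY's actual evaluation.py.

```python
import torch

def center_crop_bev(bev, crop_m=30.0, resolution_m=0.5):
    """Crop the central crop_m x crop_m window out of a (..., H, W) BEV tensor."""
    h, w = bev.shape[-2:]
    cells = int(round(crop_m / resolution_m))   # e.g. 30 m / 0.5 m = 60 cells
    top, left = (h - cells) // 2, (w - cells) // 2
    return bev[..., top:top + cells, left:left + cells]

def binary_iou(pred, target):
    """IoU of two boolean BEV masks."""
    intersection = (pred & target).sum().float()
    union = (pred | target).sum().float()
    return (intersection / union.clamp(min=1)).item()

# One model predicts the full 100 m x 100 m grid (dummy masks for illustration)...
pred = torch.rand(200, 200) > 0.5
target = torch.rand(200, 200) > 0.5

# ...and both ranges are scored from that single prediction.
iou_long = binary_iou(pred, target)
iou_short = binary_iou(center_crop_bev(pred), center_crop_bev(target))
```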

mingyuShin commented 7 months ago

I measured the performance of a pretrained model (grid resolution 0.5 m over a 100 m x 100 m area) on the smaller 30 m x 30 m area. The results are as follows:


Testing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 5119/5119 [1:00:16<00:00,  1.42it/s]
========================== Metrics ==========================
val_iou_background: 0.979128360748291
val_iou_dynamic: 0.618949830532074
val_pq_dynamic: 0.5224682688713074
val_sq_dynamic: 0.7617481350898743
val_rq_dynamic: 0.6858805418014526
val_denominator_dynamic: 85233.5
========================== Runtime ==========================
perception_time: 0.6085988915869898
prediction_time: 0.030737586728375215
postprocessing_time: 0.021788783454368073
total_time: 0.6611252617697331
=============================================================
Testing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 5119/5119 [1:00:17<00:00,  1.41it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_loss/flow_uncertainty': -1.498396635055542,
 'test_loss/instance_flow': 2.249345541000366,
 'test_loss/segmentation': 2.8715157508850098,
 'test_loss/segmentation_uncertainty': 1.2302122116088867,
 'vpq': 0.5224682688713074}
--------------------------------------------------------------------------------
EdwardLeeLPZ commented 7 months ago

> When I evaluated the pretrained checkpoint of FIERY directly, I ran inference for 1 epoch (5119 iterations), and both the short- and long-range values were output simultaneously, matching the numbers in the paper. Upon inspecting FIERY's evaluation.py code, it appears that a single model infers the 100 m x 100 m region and then evaluates it over both the full 100 m x 100 m area and the central 30 m x 30 m area. This differs slightly from the reporting method of PowerBEV. I'm just seeking clarification: the evaluation is a little different, right?

Thanks for the correction. Yes, you are right. I re-compared our evaluation with FIERY's. The original FIERY uses a single-range prediction and then trims the results, whereas the FIERY‡ (repr.) that we reimplemented in our paper uses the same evaluation strategy as PowerBEV. Although I don't think this makes an essential difference, for a fair comparison I would recommend following FIERY's original evaluation strategy for PowerBEV as well.

mingyuShin commented 7 months ago

Thank you for sharing your opinion. Since FIERY‡ (repr.) is noted in the implementation section of your paper, I also think there should be no problem. Thank you for the quick response!