EdwardLeeLPZ / PowerBEV

POWERBEV, a novel and elegant vision-based end-to-end framework that only consists of 2D convolutional layers to perform perception and forecasting of multiple objects in BEVs.

predict_instance_segmentation #1

Closed BIT-MJY closed 1 year ago

BIT-MJY commented 1 year ago

Hello @EdwardLeeLPZ ,

Thanks for your great work! I have one question about the predict_instance_segmentation function in instance.py. Could you please tell me why you use output['instance_flow'][b, 1:2].detach(), rather than the first predicted instance flow [b, 0:1] to generate the instance in get_instance_segmentation_and_centers?

EdwardLeeLPZ commented 1 year ago

Hello @BIT-MJY, that is an important question. To better understand the different outputs used for post-processing, please refer to this figure: [image]

The get_instance_segmentation_and_centers function gives an initial instance segmentation of the first valid prediction frame (t=0). It is implemented as follows:

  1. Extract local maxima from torch.softmax(output['segmentation'], dim=2)[b, 0:1, vehicles_id].detach() as the centers of the instances at the past moment (t=-1) (Lines 44-55);
  2. Use the backward centripetal flow output['instance_flow'][b, 1:2].detach() to find the position in the past frame (t=-1) to which each pixel at the moment t=0 is pointing (Lines 59-72);
  3. Compare these pointed-to positions (t=-1) with the predicted centers of the past frame (t=-1), and assign the nearest center to each present pixel (t=0), i.e., the identification (Lines 75-77);
  4. Filter with foreground_masks[b, 1:2].detach() at moment t=0 to avoid background pixels being assigned to instances (Line 103).

It is worth noting that this step only assigns IDs to the current instances (t=0), and the centers of the past frame (t=-1) are only used as assistance, not part of the final valid prediction outputs (t=0, 1, 2, ...).
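For readers who want to see the four steps in miniature, here is a rough plain-Python sketch on a tiny grid. It is not the code from instance.py: the helper names find_centers and assign_ids, the 3x3 local-maximum rule, the threshold value, and the convention that the flow is a (dy, dx) offset added to the pixel position are all assumptions made for illustration only.

```python
import math


def find_centers(heatmap, threshold=0.1):
    """Step 1 (assumed rule): 3x3 local maxima above a threshold, as (y, x)."""
    h, w = len(heatmap), len(heatmap[0])
    centers = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v > n for n in neighbours):
                centers.append((y, x))
    return centers


def assign_ids(centers, flow, foreground):
    """Steps 2-4: follow the backward flow, pick the nearest center, mask."""
    h, w = len(foreground), len(foreground[0])
    ids = [[0] * w for _ in range(h)]  # 0 means background
    if not centers:
        return ids
    for y in range(h):
        for x in range(w):
            if not foreground[y][x]:
                continue  # step 4: background pixels never get an ID
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx  # step 2: position pointed to at t=-1
            dists = [math.hypot(ty - cy, tx - cx) for cy, cx in centers]
            ids[y][x] = dists.index(min(dists)) + 1  # step 3: nearest center
    return ids


# Toy scene: two vehicles at t=-1, each moved one cell to the right at t=0.
heatmap_prev = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
foreground_now = [[False] * 6 for _ in range(3)]
flow_now = [[(0, 0)] * 6 for _ in range(3)]
for y, x in [(1, 2), (1, 5)]:
    foreground_now[y][x] = True
    flow_now[y][x] = (0, -1)  # points one cell back to the left

centers = find_centers(heatmap_prev)
ids = assign_ids(centers, flow_now, foreground_now)
print(centers)  # [(1, 1), (1, 4)]
print(ids[1])   # [0, 0, 1, 0, 0, 2]
```

The IDs end up on the t=0 pixels, while the t=-1 centers are used only for the lookup, which matches the remark above.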

I hope this answers your question clearly. If not, feel free to ask further.

BIT-MJY commented 1 year ago

@EdwardLeeLPZ Thank you so much for the detailed reply! I now fully understand the pipeline after your explanation. I also noticed that the predicted results contain two additional frames, as defined by

self.num_waypoints = self.cfg.N_FUTURE_FRAMES + 2

in stconv.py. Thanks again and I will close this issue since it is solved.
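As a small illustration of how the extra frames line up with the indices discussed in this thread (index 0 for t=-1, index 1 for t=0), the mapping might be sketched as below. The value of N_FUTURE_FRAMES is hypothetical, and the index-to-time mapping is inferred from the explanation above rather than taken from stconv.py.

```python
N_FUTURE_FRAMES = 4  # hypothetical value, not taken from the repository config

num_waypoints = N_FUTURE_FRAMES + 2

# Assumed mapping inferred from the thread: output index i corresponds to t = i - 1.
time_steps = [i - 1 for i in range(num_waypoints)]
print(time_steps)  # [-1, 0, 1, 2, 3, 4]
```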