For training, you can try to use torch.distributed.all_reduce to keep a single global map synchronized across GPUs. However, this may not bring better performance. For the best results you still need to test with a single GPU so that all past information can be utilized.
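A minimal sketch of what such a synchronization could look like, assuming the global map is kept as a dense (H, W) score tensor on every rank; the element-wise MAX merge rule is an assumption for illustration, not the repository's actual update:

```python
import torch
import torch.distributed as dist

def sync_global_map(global_map: torch.Tensor) -> torch.Tensor:
    """Merge the per-rank global maps so every GPU sees the same map."""
    if not (dist.is_available() and dist.is_initialized()):
        return global_map  # single-GPU fallback: nothing to merge
    merged = global_map.clone()
    # Element-wise MAX keeps the strongest evidence seen by any rank.
    dist.all_reduce(merged, op=dist.ReduceOp.MAX)
    return merged
```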
To enable testing with multiple GPUs, the sequences need to be re-split and assigned to different GPUs so that no testing sequence overlaps between GPUs, just like in NMP. A sketch of this kind of split is shown below.
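A rough sketch of assigning whole sequences to ranks so no sequence is shared across GPUs; `scene_to_samples` (a mapping from scene token to ordered frame indices) is a hypothetical structure introduced only for illustration:

```python
from typing import Dict, List

def split_sequences_by_rank(scene_to_samples: Dict[str, List[int]],
                            rank: int, world_size: int) -> List[int]:
    """Return the frame indices this rank should evaluate, scene by scene."""
    indices = []
    for i, scene in enumerate(sorted(scene_to_samples)):
        if i % world_size == rank:  # round-robin whole scenes to ranks
            # keep temporal order inside each scene so the map can be maintained
            indices.extend(scene_to_samples[scene])
    return indices
```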
In the following image (gt, pred, local_map, update_map), the obtained local map is far from a slim binary mask compared with the rasterized prediction mask. Why is that? Have you tried visualizing local_map during training? In my custom dataset we may pass through the same road multiple times; could pose differences be the reason?
Are these results from during training? For the first few epochs the maintained map is full of noise because it merges the results from multiple frames. Taking your images as an example, although the prediction of this frame looks good, other frames may contribute false predictions. Our current map-updating method is very simple and needs to be improved for practical usage. You also need to check the effect of pose noise.
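To illustrate how a single frame's false predictions end up in the maintained map, here is a minimal sketch of a simple frame-to-map merge; the exponential-moving-average rule, `alpha`, and the argument names are assumptions, not the repository's exact update:

```python
import torch

def update_map(global_map: torch.Tensor,
               frame_pred: torch.Tensor,
               frame_mask: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Blend the current frame's rasterized prediction into the global map.

    global_map: (H, W) running map in global coordinates
    frame_pred: (H, W) current frame's prediction warped into the same grid
    frame_mask: (H, W) bool mask of cells observed by the current frame
    """
    blended = alpha * frame_pred + (1.0 - alpha) * global_map
    # Cells not observed in this frame keep their old values, so any false
    # positive that was merged earlier persists until it is overwritten.
    return torch.where(frame_mask, blended, global_map)
```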
Yes, it was the result after 60 epochs of training. The accumulated global map was not slim, and the results were worst while the car was turning left or right, regardless of the poses. So maybe we need to update the global map with instance masks, and decrease the weight of old observations as time T passes.
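A rough sketch of that time-decay idea, assuming a per-cell `last_seen` timestamp buffer and a `decay` factor; both are hypothetical names introduced here for illustration:

```python
import torch

def update_map_with_decay(global_map: torch.Tensor,
                          last_seen: torch.Tensor,
                          frame_pred: torch.Tensor,
                          frame_mask: torch.Tensor,
                          t: int,
                          decay: float = 0.9):
    """Down-weight old map cells before merging the new observation at time t."""
    # Old evidence is weighted by decay ** (frames since the cell was last observed),
    # so stale cells contribute less than fresh ones.
    age = (t - last_seen).clamp(min=0).float()
    w_old = decay ** age
    merged = torch.where(frame_mask,
                         (w_old * global_map + frame_pred) / (w_old + 1.0),
                         global_map)
    last_seen = torch.where(frame_mask, torch.full_like(last_seen, t), last_seen)
    return merged, last_seen
```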
How is one global map updated in multi-GPU training mode? Currently each process creates its own global map and updates it independently. Is that why we need to use only 1 GPU for inference?