For training, you can try to use torch.distributed.all_reduce to keep a single global map synchronized across GPUs. However, this may not bring better performance. For the best results you still need to test with a single GPU so that all past information can be utilized.
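A minimal sketch of what such a synchronization could look like, assuming the global map is kept as a dense (H, W) score tensor on every rank; the element-wise MAX merge rule is an assumption for illustration, not the repository's actual update:

```python
import torch
import torch.distributed as dist

def sync_global_map(global_map: torch.Tensor) -> torch.Tensor:
    """Merge the per-rank global maps so every GPU sees the same map."""
    if not (dist.is_available() and dist.is_initialized()):
        return global_map  # single-GPU fallback: nothing to merge
    merged = global_map.clone()
    # Element-wise MAX keeps the strongest evidence seen by any rank.
    dist.all_reduce(merged, op=dist.ReduceOp.MAX)
    return merged
```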
To enable testing with multiple GPUs, the sequences need to be re-split and assigned to different GPUs so that no testing sequence overlaps between GPUs, just like in NMP. A sketch of this kind of split is shown below.
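A rough sketch of assigning whole sequences to ranks so no sequence is shared across GPUs; `scene_to_samples` (a mapping from scene token to ordered frame indices) is a hypothetical structure introduced only for illustration:

```python
from typing import Dict, List

def split_sequences_by_rank(scene_to_samples: Dict[str, List[int]],
                            rank: int, world_size: int) -> List[int]:
    """Return the frame indices this rank should evaluate, scene by scene."""
    indices = []
    for i, scene in enumerate(sorted(scene_to_samples)):
        if i % world_size == rank:  # round-robin whole scenes to ranks
            # keep temporal order inside each scene so the map can be maintained
            indices.extend(scene_to_samples[scene])
    return indices
```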
In the following image (gt, pred, local_map, update_map), the obtained local map is far from a slim binary mask compared with the rasterized prediction mask. Why is that? Have you tried visualizing local_map during training? In my custom dataset we may pass through the same road multiple times; could pose differences be the reason?
Are these results from during training? For the first few epochs the maintained map is full of noise because it merges the results from multiple frames. Taking your images as an example, although the prediction of this frame looks good, other frames may contribute false predictions. Our current map-updating method is very simple and needs to be improved for practical usage. You also need to check the effect of pose noise.
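To illustrate how a single frame's false predictions end up in the maintained map, here is a minimal sketch of a simple frame-to-map merge; the exponential-moving-average rule, `alpha`, and the argument names are assumptions, not the repository's exact update:

```python
import torch

def update_map(global_map: torch.Tensor,
               frame_pred: torch.Tensor,
               frame_mask: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """Blend the current frame's rasterized prediction into the global map.

    global_map: (H, W) running map in global coordinates
    frame_pred: (H, W) current frame's prediction warped into the same grid
    frame_mask: (H, W) bool mask of cells observed by the current frame
    """
    blended = alpha * frame_pred + (1.0 - alpha) * global_map
    # Cells not observed in this frame keep their old values, so any false
    # positive that was merged earlier persists until it is overwritten.
    return torch.where(frame_mask, blended, global_map)
```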
Yes, it was the result after 60 epochs of training. The accumulated global map was not slim, and the results were worst while the car was turning left or right, regardless of the poses. So maybe we need to update the global map with instance masks, and decrease the weight of old observations as time T passes.
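A rough sketch of that time-decay idea, assuming a per-cell `last_seen` timestamp buffer and a `decay` factor; both are hypothetical names introduced here for illustration:

```python
import torch

def update_map_with_decay(global_map: torch.Tensor,
                          last_seen: torch.Tensor,
                          frame_pred: torch.Tensor,
                          frame_mask: torch.Tensor,
                          t: int,
                          decay: float = 0.9):
    """Down-weight old map cells before merging the new observation at time t."""
    # Old evidence is weighted by decay ** (frames since the cell was last observed),
    # so stale cells contribute less than fresh ones.
    age = (t - last_seen).clamp(min=0).float()
    w_old = decay ** age
    merged = torch.where(frame_mask,
                         (w_old * global_map + frame_pred) / (w_old + 1.0),
                         global_map)
    last_seen = torch.where(frame_mask, torch.full_like(last_seen, t), last_seen)
    return merged, last_seen
```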
How is one global map updated in multi-GPU training mode? Currently each process creates its own global map and updates it independently. Is that why we need to use only 1 GPU for inference?