fundamentalvision / BEVFormer

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
https://arxiv.org/abs/2203.17270
Apache License 2.0

NCCL Error on WSL2 #230

Open samueleruffino99 opened 8 months ago

samueleruffino99 commented 8 months ago

When I run either training or testing of the model on a single GPU (./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 1), I get this error:

RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 439853) of binary

Do you know how to fix it? P.S. I am running it on WSL2.

lix19937 commented 6 months ago

Disable distributed training.

hyygostudy commented 3 weeks ago

I have the same problem.

lix19937 commented 3 weeks ago

Also, use BN, not SyncBN.
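The BN/SyncBN advice matters because SyncBN synchronizes batch statistics across processes and typically expects an initialized distributed process group. Once distributed training is turned off, that group is no longer set up. Below is a minimal sketch of that switch, assuming the config follows the usual mmcv convention where a norm layer is declared as norm_cfg=dict(type='SyncBN', ...). The helper name replace_syncbn_with_bn and the sample config are hypothetical and are not taken from BEVFormer. Check the actual config file to see where, or whether, SyncBN appears.

```python
import copy
from typing import Any


def replace_syncbn_with_bn(cfg: Any) -> Any:
    """Return a deep copy of a nested config with 'SyncBN' norm layers set to 'BN'.

    Hypothetical helper (not part of BEVFormer or mmcv): it walks nested
    dicts, lists and tuples and rewrites only dicts whose 'type' is exactly
    'SyncBN'. The input config is left unchanged.
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node: Any) -> None:
        if isinstance(node, dict):
            # mmcv registers both 'SyncBN' and 'BN' as norm layer types.
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


if __name__ == "__main__":
    # Made-up config fragment for illustration only.
    model = dict(
        img_backbone=dict(type="ResNet", norm_cfg=dict(type="SyncBN", requires_grad=True)),
        img_neck=dict(type="FPN", norm_cfg=None),
    )
    patched = replace_syncbn_with_bn(model)
    print(patched["img_backbone"]["norm_cfg"])  # {'type': 'BN', 'requires_grad': True}
```

The helper copies the config before editing it, so the original settings remain available if you later go back to multi-GPU distributed training.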