Hi, when I use 2 GPUs on one node to train with train_cmkd.py, I get the following error:
2022-11-25 09:49:08,244 INFO **Start training xxx/project_3D/CMKD-main/tools/cfgs/kitti_models/CMKD/cmkd_caic_R50_scd_V2(default)**
epochs: 0%| | 0/10 [00:00<?, ?it/s]
epochs: 0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/xxx/project_3D/CMKD-main/tools/train_cmkd.py", line 226, in <module>
main()
File "/xxx/project_3D/CMKD-main/tools/train_cmkd.py", line 198, in main
merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch,
File "/xxx/project_3D/CMKD-main/tools/train_utils/train_utils.py", line 245, in train_model_cmkd
dataloader_iter = iter(train_loader)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
return self._get_iterator()
File "/root/anaconda3/envs/CMKDK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
w.start()
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/root/anaconda3/envs/CMKDK/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle spconv.core_cc.csrc.sparse.all.ops_cpu3d.Point2VoxelCPU objects
I start the code with:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_cmkd.py --launcher pytorch --cfg ${CONFIG_FILE} --tcp_port 16677 --pretrained_lidar_model ${TEACHER_MODEL_PATH}
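The last frames of the traceback show where it fails: the worker is started through `popen_spawn_posix.py`, i.e. with the `spawn` start method, which pickles the worker's arguments (including the dataset) via `ForkingPickler`, a `pickle.Pickler` subclass. If the dataset holds spconv's `Point2VoxelCPU`, that pickling step raises the `TypeError`. The sketch below reproduces the same class of error without spconv; `FakeVoxelGenerator` and `FakeDataset` are hypothetical stand-ins, and a `threading.Lock` plays the role of the unpicklable native object.

```python
import pickle
import threading


class FakeVoxelGenerator:
    """Hypothetical stand-in for spconv's Point2VoxelCPU: it owns a native
    handle (here a lock) that pickle cannot serialize."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self._native_handle = threading.Lock()  # not picklable


class FakeDataset:
    """Hypothetical dataset that builds its voxel generator eagerly in __init__."""

    def __init__(self):
        self.voxel_generator = FakeVoxelGenerator(voxel_size=(0.05, 0.05, 0.1))


def try_pickle(obj):
    """Pickle obj the way the spawn start method would.

    Returns None on success, or the TypeError message on failure.
    """
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    # Prints a TypeError message about a _thread.lock object
    # (exact wording differs between Python versions).
    print(try_pickle(FakeDataset()))
```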
I use torch==1.8.1-cu111 with spconv-cu111==2.1.21, and that combination works well for me.
You may also take a look at this link.
If these do not work for you, try spconv 1.x instead of 2.x.
When I train with 1 GPU, it works fine. Why is that?
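A likely explanation: on Linux, Python's default start method is `fork`, so DataLoader workers inherit the dataset from the parent process and nothing is pickled. The 2-GPU traceback goes through `popen_spawn_posix.py`, so there the workers are started with `spawn`. In OpenPCDet-based code such as CMKD, the `--launcher pytorch` path usually calls `mp.set_start_method('spawn')` during distributed init (check `init_dist_pytorch` in your copy). Under `spawn` the dataset, including its `Point2VoxelCPU` voxel generator, must be pickled, and that fails. One workaround is to keep the generator out of the pickled state and build it lazily inside each worker. The sketch below is not the CMKD code: `LazyVoxelDataset` and `FakeVoxelGenerator` are hypothetical names, and a `threading.Lock` stands in for the spconv object.

```python
import pickle
import threading


class FakeVoxelGenerator:
    """Hypothetical stand-in for spconv's Point2VoxelCPU (owns an unpicklable handle)."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self._native_handle = threading.Lock()

    def __call__(self, points):
        # Placeholder for real voxelization: just reports the number of points.
        with self._native_handle:
            return len(points)


class LazyVoxelDataset:
    """Hypothetical dataset that creates its voxel generator on first use,
    inside whichever process (main or DataLoader worker) calls it."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size  # plain, picklable config
        self._voxel_generator = None  # built lazily, never pickled

    @property
    def voxel_generator(self):
        if self._voxel_generator is None:
            self._voxel_generator = FakeVoxelGenerator(self.voxel_size)
        return self._voxel_generator

    def __getstate__(self):
        # Drop the native object before pickling; a spawned worker rebuilds
        # it on its first __getitem__ call.
        state = self.__dict__.copy()
        state["_voxel_generator"] = None
        return state

    def __getitem__(self, index):
        points = [(0.0, 0.0, 0.0)] * (index + 1)  # dummy point cloud
        return self.voxel_generator(points)


if __name__ == "__main__":
    ds = LazyVoxelDataset((0.05, 0.05, 0.1))
    print(ds[2])  # builds the generator in this process
    clone = pickle.loads(pickle.dumps(ds))  # succeeds despite the lock
    print(clone[0])  # the clone builds its own generator
```

If changing the dataset is not an option, `num_workers=0` avoids worker processes entirely at the cost of loading speed, and PyTorch's `DataLoader` also accepts a `multiprocessing_context` argument (for example `"fork"`); whether forking is safe alongside the distributed setup here has not been verified.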