yichaoshen-MS opened this issue 2 years ago (status: Open)
Same problem.
Same.
Has anyone solved this?
Same problem.
For Minkowski Engine version 0.5.4, I tried to modify MinkowskiConvolution.py following https://github.com/NVIDIA/MinkowskiEngine/pull/139, although the code in this version is slightly different.
Specifically, I deleted lines 281 to 285 (self.conv in the __init__() function of class MinkowskiConvolutionBase) and lines 314 to 322 (outfeat = self.conv.apply(.....) in the forward() function of class MinkowskiConvolutionBase).
Then I added the following code after line 322:
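# Create the autograd function locally instead of storing it as self.conv,
# so the module no longer holds an object that cannot be pickled.
if self.is_transpose:
    conv = MinkowskiConvolutionTransposeFunction()
else:
    conv = MinkowskiConvolutionFunction()
outfeat = conv.apply(
    input.F,
    self.kernel,
    self.kernel_generator,
    self.convolution_mode,
    input.coordinate_map_key,
    out_coordinate_map_key,
    input._manager,
)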
With this change, the "TypeError: cannot pickle 'MinkowskiConvolutionFunction' object" error seems to be fixed.
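A stdlib-only sketch of why moving the object from __init__() into forward() can help: with the spawn start method (visible in the tracebacks in this thread), the objects handed to each worker process, including the model, are pickled, so a single unpicklable attribute breaks the launch. UnpicklableFunction, StoresFunction and CreatesFunctionInForward are hypothetical stand-ins, not MinkowskiEngine classes, and a threading.Lock is assumed to play the role of whatever makes the real MinkowskiConvolutionFunction instance unpicklable.

```python
import pickle
import threading


class UnpicklableFunction:
    """Hypothetical stand-in for an object pickle cannot serialize."""

    def __init__(self):
        # A lock cannot be pickled, so neither can any object holding one.
        self._lock = threading.Lock()

    def apply(self, x):
        return x * 2


class StoresFunction:
    """Original pattern: the function object is kept as an attribute (like self.conv)."""

    def __init__(self):
        self.conv = UnpicklableFunction()

    def forward(self, x):
        return self.conv.apply(x)


class CreatesFunctionInForward:
    """Patched pattern: only plain data is stored; the function object is
    created inside forward(), so it is never part of the pickled state."""

    def __init__(self, is_transpose=False):
        self.is_transpose = is_transpose

    def forward(self, x):
        conv = UnpicklableFunction()
        return conv.apply(x)


def is_picklable(obj):
    """Return True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


print(is_picklable(StoresFunction()))            # False
print(is_picklable(CreatesFunctionInForward()))  # True
```

Creating the function object on every forward call is cheap here; the point is only that nothing unpicklable lives on the instance between calls.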
However, another error appears:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1151, in __setstate__
    self.__dict__.update(state)
ValueError: dictionary update sequence element #0 has length 12; 2 is required
Traceback (most recent call last):
  File "/home/gaolinyao/sparsepcgc/examples/multigpu_lightning.py", line 208, in <module>
    trainer.fit(pl_module)
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
    mp.start_processes(
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/gaolinyao/anaconda3/envs/py91/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 139, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with exit code 1
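What the ValueError means: the traceback shows torch.nn.Module.__setstate__ calling self.__dict__.update(state), and dict.update only accepts a mapping or an iterable of key/value pairs. The message says the unpickled state was instead a sequence whose first element has 12 items. The thread does not show what produced that state; the 12-character string below is purely illustrative, and Dummy and restore_state are hypothetical names.

```python
class Dummy:
    """Hypothetical object standing in for the module being unpickled."""


def restore_state(obj, state):
    # Same call torch.nn.Module.__setstate__ makes in the traceback above.
    obj.__dict__.update(state)


# A dict state (or an iterable of key/value pairs) restores cleanly.
ok = Dummy()
restore_state(ok, {"kernel_size": 3})
print(ok.kernel_size)  # 3

# A tuple whose first element is a 12-character string is not a sequence of
# pairs, so dict.update rejects it with the same message as in the log.
message = ""
try:
    restore_state(Dummy(), ("abcdefghijkl",))
except ValueError as err:
    message = str(err)
print(message)  # dictionary update sequence element #0 has length 12; 2 is required
```

So a likely place to look is any __getstate__ or __reduce__ on the pickled objects that returns something other than a dict for a torch.nn.Module subclass.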
Has anyone solved this?
Describe the bug
When using multiple GPUs to train a network with PyTorch Lightning, training fails, possibly because the "MinkowskiConvolutionFunction" object cannot be pickled ("TypeError: cannot pickle 'MinkowskiConvolutionFunction' object").
In addition, I found PR https://github.com/NVIDIA/MinkowskiEngine/pull/139, which claims to fix this bug and has been merged, but it does not seem to work, and I cannot find its change in the latest code (v0.5.4).
  File "/root/anaconda3/envs/mask3d/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 103, in launch
    mp.start_processes(
  File "/root/anaconda3/envs/mask3d/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
    process.start()
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/root/anaconda3/envs/mask3d/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'MinkowskiConvolutionFunction' object
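Another general Python option, sketched here as an assumption and not as the change from PR #139, is to keep the attribute but leave it out of the pickled state with __getstate__/__setstate__, rebuilding it in the receiving process. ConvLike and UnpicklableFunction are hypothetical stand-ins (a threading.Lock again makes the stand-in unpicklable). __getstate__ returns a dict on purpose, since torch.nn.Module.__setstate__ feeds the state to self.__dict__.update(). On a real torch.nn.Module, calling super().__setstate__(state) instead of updating __dict__ directly is probably the safer choice; that part is not exercised here.

```python
import pickle
import threading


class UnpicklableFunction:
    """Hypothetical stand-in for an unpicklable function object."""

    def __init__(self):
        self._lock = threading.Lock()  # locks cannot be pickled

    def apply(self, x):
        return x * 2


class ConvLike:
    """Keeps self.conv as an attribute but excludes it from the pickled state."""

    def __init__(self, is_transpose=False):
        self.is_transpose = is_transpose
        self.conv = UnpicklableFunction()

    def __getstate__(self):
        # Return a dict so that a __dict__.update(state) restore works.
        state = self.__dict__.copy()
        del state["conv"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the unpicklable member after unpickling.
        self.conv = UnpicklableFunction()

    def forward(self, x):
        return self.conv.apply(x)


restored = pickle.loads(pickle.dumps(ConvLike(is_transpose=True)))
print(restored.is_transpose, restored.forward(3))  # True 6
```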
Desktop (please complete the following information):