mortonjt opened this issue 4 years ago
Here is an example of the error readout for this bug:
```
Traceback (most recent call last):
  File "/mnt/home/jmorton/miniconda3/envs/catvae/bin/linear-vae-train.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/mnt/home/jmorton/research/catvae/scripts/linear-vae-train.py", line 56, in <module>
    main(args)
  File "/mnt/home/jmorton/research/catvae/scripts/linear-vae-train.py", line 38, in main
    trainer.fit(model)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in fit
    self.accelerator_backend.train(model, nprocs=self.num_processes)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_spawn_backend.py", line 43, in train
    mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 149, in start_processes
    process.start()
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 131, in reduce_tensor
    storage = tensor.storage()
RuntimeError: sparse tensors do not have storage
```
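For context, the failure happens before any worker runs: ddp_spawn pickles the model to send it to the spawned processes, and torch's `ForkingPickler` reduction (`reduce_tensor`) calls `tensor.storage()`, which sparse tensors do not support. A minimal sketch (not from the original report; the `worker` and `adjacency` names are illustrative, and it assumes a PyTorch version affected by pytorch/pytorch#20248) that hits the same code path:

```python
import torch
import torch.multiprocessing as mp


def worker(rank, adjacency):
    # Hypothetical worker; on affected PyTorch versions it is never reached
    # because pickling the sparse tensor fails before the process starts.
    print(rank, adjacency.shape)


if __name__ == "__main__":
    adjacency = torch.eye(4).to_sparse()  # a sparse COO tensor
    # mp.spawn pickles args via torch's ForkingPickler reductions; reduce_tensor
    # calls tensor.storage(), raising
    # "RuntimeError: sparse tensors do not have storage".
    mp.spawn(worker, args=(adjacency,), nprocs=2)
```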
At the moment, this can't really be done. See
https://github.com/pytorch/pytorch/issues/20248
We may have to switch to https://github.com/rusty1s/pytorch_sparse in order to really scale the sparse ops.
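One possible interim workaround, sketched below under the assumption that the sparse tensor can be rebuilt inside each spawned process: keep only picklable dense COO components (indices, values, size) on the module, and materialize the sparse tensor after the spawn (e.g. in `LightningModule.setup()`). `SparseHolder` and `materialize` are hypothetical names, not part of catvae, PyTorch Lightning, or pytorch_sparse.

```python
import torch


class SparseHolder:
    """Hypothetical helper: store a sparse tensor as picklable dense COO parts."""

    def __init__(self, sparse_t):
        coo = sparse_t.coalesce()
        self.indices = coo.indices()   # dense LongTensor, pickles fine
        self.values = coo.values()     # dense tensor, pickles fine
        self.size = tuple(coo.size())

    def materialize(self, device=None):
        # Rebuild the sparse tensor inside each spawned process,
        # after ddp_spawn has already sent the (dense) components across.
        t = torch.sparse_coo_tensor(self.indices, self.values, self.size)
        return t.to(device) if device is not None else t
```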