flatironinstitute / catvae

Categorical Variational Autoencoders
BSD 3-Clause "New" or "Revised" License

Multiprocessing with sparse tensors #19

Open mortonjt opened 4 years ago

mortonjt commented 4 years ago

At the moment, multiprocessing with sparse tensors can't really be done. See

https://github.com/pytorch/pytorch/issues/20248

We may have to switch to https://github.com/rusty1s/pytorch_sparse in order to really scale the sparse ops.

mortonjt commented 3 years ago

Here is an example traceback for this bug:

Traceback (most recent call last):
  File "/mnt/home/jmorton/miniconda3/envs/catvae/bin/linear-vae-train.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/mnt/home/jmorton/research/catvae/scripts/linear-vae-train.py", line 56, in <module>
    main(args)
  File "/mnt/home/jmorton/research/catvae/scripts/linear-vae-train.py", line 38, in main
    trainer.fit(model)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in fit
    self.accelerator_backend.train(model, nprocs=self.num_processes)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_spawn_backend.py", line 43, in train
    mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 149, in start_processes
    process.start()
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/mnt/home/jmorton/miniconda3/envs/catvae/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 131, in reduce_tensor
    storage = tensor.storage()
RuntimeError: sparse tensors do not have storage
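
The traceback points at the `spawn` start method pickling the process arguments: `reduce_tensor` calls `tensor.storage()`, which sparse tensors do not provide. One possible workaround, sketched here and not something catvae currently does, is to split a sparse COO tensor into its dense index and value tensors before it crosses the process boundary and rebuild it on the other side. The helper names `pack_sparse` and `unpack_sparse` are hypothetical, and the example uses plain `pickle` as a stand-in for the multiprocessing pickler.

```python
import pickle

import torch


def pack_sparse(t):
    """Split a sparse COO tensor into dense pieces (hypothetical helper)."""
    t = t.coalesce()  # indices() and values() require a coalesced tensor
    return t.indices(), t.values(), tuple(t.size())


def unpack_sparse(indices, values, size):
    """Rebuild the sparse COO tensor from its dense pieces (hypothetical helper)."""
    return torch.sparse_coo_tensor(indices, values, size)


# A small 3x3 sparse tensor with three nonzero entries.
indices = torch.tensor([[0, 1, 2], [2, 0, 1]])
values = torch.tensor([1.0, 2.0, 3.0])
sparse = torch.sparse_coo_tensor(indices, values, (3, 3))

# Only dense tensors and a tuple are serialized, so no sparse storage is needed.
payload = pickle.dumps(pack_sparse(sparse))
restored = unpack_sparse(*pickle.loads(payload))

assert torch.equal(restored.to_dense(), sparse.to_dense())
```

In a training script this would mean keeping only the dense pieces as attributes of whatever object is handed to `mp.spawn`, and constructing the sparse tensor inside each worker.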