deepseek-ai / DreamCraft3D

[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
https://mrtornado24.github.io/DreamCraft3D/
MIT License

RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient. #47

Closed boltron1 closed 4 months ago

boltron1 commented 4 months ago

I ran this command for stage 2:

python launch.py --config custom/threestudio-dreamcraft3D/configs/dreamcraft3d-geometry.yaml --train system.prompt_processor.prompt="a cartoon boy king in robotic knight armor" data.image_path="./load/images/rey_rgba.png" system.geometry_convert_from="./outputs/dreamcraft3d-coarse-neus/a_cartoon_boy_king_in_robotic_knight_armor@20240302-113207/ckpts/last.ckpt"

I get the following error and don't understand why. Any help resolving this?

Traceback (most recent call last):
  File "/home/boltron/threestudio/launch.py", line 309, in <module>
    main(args, extras)
  File "/home/boltron/threestudio/launch.py", line 252, in main
    trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 543, in fit
    call._call_and_handle_interrupt(
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 579, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in _run
    self.strategy.setup(self)
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 171, in setup
    self.configure_ddp()
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 283, in configure_ddp
    self.model = self._setup_model(self.model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/pytorch_lightning/strategies/ddp.py", line 195, in _setup_model
    return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 678, in __init__
    self._log_and_throw(
  File "/home/boltron/anaconda3/envs/threestudio/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
    raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
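For context on the error itself: torch.nn.parallel.DistributedDataParallel raises this RuntimeError when the module it is asked to wrap has no parameter with requires_grad=True. A minimal sketch of a sanity check is below; it assumes a standard PyTorch module object (the name `system` is a placeholder for the LightningModule built by launch.py, not something taken from this issue):

```python
import torch.nn as nn

def count_trainable(module: nn.Module) -> int:
    """Count parameters that still require gradients.

    DDP refuses to wrap a module for which this count is zero,
    which is exactly the RuntimeError seen in the traceback above.
    """
    return sum(1 for p in module.parameters() if p.requires_grad)

# Hypothetical usage: `system` stands in for the training system object.
# print(count_trainable(system))
```

If that count is zero for the stage-2 system, the failure happens during DDP setup before any training step runs, which matches the traceback above.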

boltron1 commented 4 months ago

It seems the issue was something in the threestudio extension version. I installed DreamCraft3D in its own environment from this repo, and stage 2 was then able to train.