3dlg-hcvc / omages

We present Object Images (Omages): An homage to the classic Geometry Images.
MIT License

Encountering problems when training geo2mat #11

Open ET823828 opened 3 weeks ago

ET823828 commented 3 weeks ago

Hello! I really appreciate your outstanding work!
However, when I try to retrain geo2mat, I run into the following error:

Time stamp: #5 save blend and glbs
0.06638479232788086 0.09188485145568848
[0.01105189 0.00168228 0.00738311 0.00538278 0.06638479]
5 6.6385e-02 save blend and glbs
1 1.1052e-02 loading preset blend
3 7.3831e-03 create mesh in blender
4 5.3828e-03 create material
2 1.6823e-03 meshing omage
output_path /pfs/mt-1oY5F7/.temp/xgutils/K5UFT52DEJ.png
default samples 4096
File format: PNG
Render complete. Image saved to: /pfs/mt-1oY5F7/.temp/xgutils/K5UFT52DEJ.png
(512, 512, 4) (512, 768, 4)
img_per_row 5
Epoch 48:   3%|   | 1/32 [00:00<00:29,  1.04it/s, v_num=c3p9, train/loss=0.00358, val/loss=0.00404]
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/pfs/mt-1oY5F7/omages/src/trainer.py", line 385, in <module>
    func()
  File "/pfs/mt-1oY5F7/omages/src/trainer.py", line 252, in train
    self.trainer.fit(self.model, datamodule=self.data_module, ckpt_path = resume_from)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 250, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 190, in run
    self._optimizer_step(batch_idx, closure)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 268, in _optimizer_step
    call._call_lightning_module_hook(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1303, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 152, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 270, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 239, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/adam.py", line 121, in step
    loss = closure()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
    closure_result = closure()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 144, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 138, in closure
    self._backward_fn(step_output.closure_loss)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 239, in backward_fn
    call._call_strategy_hook(self.trainer, "backward", loss, optimizer)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 213, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 72, in backward
    model.backward(tensor, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1090, in backward
    loss.backward(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
    return _run_code(code, main_globals, None,
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/pfs/mt-1oY5F7/omages/src/trainer.py", line 385, in <module>
    func()
  File "/pfs/mt-1oY5F7/omages/src/trainer.py", line 252, in train
    self.trainer.fit(self.model, datamodule=self.data_module, ckpt_path = resume_from)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
    self.advance()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
    self.advance(data_fetcher)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 250, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 190, in run
    self._optimizer_step(batch_idx, closure)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 268, in _optimizer_step
    call._call_lightning_module_hook(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1303, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 152, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/ddp.py", line 270, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 239, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 122, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/optim/adam.py", line 121, in step
    loss = closure()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 108, in _wrap_closure
    closure_result = closure()
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 144, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 138, in closure
    self._backward_fn(step_output.closure_loss)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 239, in backward_fn
    call._call_strategy_hook(self.trainer, "backward", loss, optimizer)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 213, in backward
    self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/precision.py", line 72, in backward
    model.backward(tensor, *args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1090, in backward
    loss.backward(*args, **kwargs)
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/pfs/mt-1oY5F7/mambaforge/envs/dlt/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

It seems that something goes wrong with the gradients during backpropagation. Do you have any idea what might cause this?
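For context, PyTorch raises this exact error when `backward()` is called on a tensor that is not attached to the autograd graph, for example a loss computed while gradient tracking was disabled. The minimal sketch below reproduces it with a stand-in `torch.nn.Linear` model; the model and data are assumptions for illustration only and are not the geo2mat network.

```python
import torch

# Hypothetical stand-in for the real network; not the omages/geo2mat model.
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

# Normal case: the loss is attached to the autograd graph.
loss = model(x).pow(2).mean()
print(loss.requires_grad, loss.grad_fn is not None)  # True True
loss.backward()

# Failure case: the forward pass runs while grad tracking is off,
# so the loss has no grad_fn and backward() raises the error from the log.
with torch.no_grad():
    detached_loss = model(x).pow(2).mean()

message = None
try:
    detached_loss.backward()
except RuntimeError as err:
    message = str(err)
    print(message)
```

In other words, the traceback suggests the loss returned at that step was built without autograd tracking, rather than that the gradients themselves became invalid.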

QhelDIV commented 3 weeks ago

It seems this exception does not occur at the beginning, since it only appears at epoch 48. I have never encountered this problem before. How about disabling the visualization callback and running again? Since the exception pops up right after the visualization, you may want to start debugging from there.
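One way to follow up on this suggestion, under an assumption that has not been confirmed for the omages code: if the visualization step switches autograd off globally (e.g. `torch.set_grad_enabled(False)`) and never restores it, the next training step would build a loss with no `grad_fn`. The sketch below shows that failure mode and a wrapper that snapshots and restores the grad mode around a visualization function. `leaky_visualize` and `preserve_grad_mode` are hypothetical names, not functions from the repository.

```python
import functools

import torch


def preserve_grad_mode(fn):
    """Run fn, then restore the global autograd mode to what it was before,
    even if fn switched it off and never switched it back on."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        previous = torch.is_grad_enabled()
        try:
            return fn(*args, **kwargs)
        finally:
            torch.set_grad_enabled(previous)
    return wrapper


def leaky_visualize(model, x):
    # Hypothetical buggy visualization: disables grad globally and leaves it off.
    torch.set_grad_enabled(False)
    return model(x)


model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

leaky_visualize(model, x)
leaked = not torch.is_grad_enabled()  # grad mode is now off for the next step
torch.set_grad_enabled(True)          # reset for the rest of the demo

safe_visualize = preserve_grad_mode(leaky_visualize)
safe_visualize(model, x)
restored = torch.is_grad_enabled()    # wrapper put grad mode back

# The next "training step" can backpropagate again.
loss = model(x).pow(2).mean()
loss.backward()
```

A cheap diagnostic in the same spirit is to log `torch.is_grad_enabled()` at the start of the training step right after a visualization run; if it prints `False`, the visualization path is the likely culprit.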