Jerry-Master opened this issue 12 months ago.
I think https://github.com/MoyGcc/vid2avatar/issues/7 might solve your problem.
That did not work; it now gives this error:
Error executing job with overrides: []
Traceback (most recent call last):
  File "test.py", line 40, in main
    trainer.test(model, testset, ckpt_path=checkpoint)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 907, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 950, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1190, in _run
    self.checkpoint_connector.restore_training_state()
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 188, in restore_training_state
    self.restore_loops()
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 221, in restore_loops
    self.trainer.test_loop.load_state_dict(state_dict["test_loop"])
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 252, in load_state_dict
    self._load_from_state_dict(state_dict.copy(), prefix, metrics)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 281, in _load_from_state_dict
    state_dict[prefix + k], metrics=metric_attributes, sync_fn=self.trainer.training_type_plugin.reduce
KeyError: '_results'
If you want a more detailed traceback, I managed to trace the error to this call in v2a.py (line 258):
grad = torch.autograd.grad(
    outputs=pnts_d,
    inputs=pnts_c,
    grad_outputs=d_out,
    create_graph=create_graph,
    retain_graph=True if i < num_dim - 1 else retain_graph,
    only_inputs=True)[0]
This is the problematic line.
This might be because you trained your model with another version of pytorch-lightning. Maybe you could quickly try training with your own data using the "suggested" pytorch-lightning version (no need for full convergence) and then run test.py.
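One way to check the version-mismatch explanation before retraining: a Lightning checkpoint records the version that wrote it under the "pytorch-lightning_version" key, and the tracebacks show restore_loops reading per-loop state (state_dict["test_loop"], state_dict["fit_loop"]) and failing on the '_results' / 'epoch_loop._results' entries. Below is a minimal diagnostic sketch, assuming the checkpoint has already been loaded into a plain dict and that loop states sit under a "loops" key (as in the pytorch-lightning 1.5.x code paths in the traceback). The function name is hypothetical, and the "_results" check is only a heuristic.

```python
from typing import Any, Dict, List


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the parts of a Lightning checkpoint involved in the KeyError.

    Hypothetical helper. Assumes `ckpt` is the dict returned by torch.load and
    that loop states live under ckpt["loops"], keyed by loop name
    ("fit_loop", "test_loop", ...), each a flat dict of prefixed keys.
    """
    loops = ckpt.get("loops") or {}
    loops_without_results: List[str] = []
    for loop_name, loop_state in loops.items():
        # The failing lookups were '_results' (test_loop) and
        # 'epoch_loop._results' (fit_loop); flag loops with no such key.
        if not any(key.endswith("_results") for key in loop_state):
            loops_without_results.append(loop_name)
    return {
        # Version of pytorch-lightning that wrote the checkpoint (None if absent).
        "saved_version": ckpt.get("pytorch-lightning_version"),
        "loop_names": sorted(loops),
        "loops_without_results": sorted(loops_without_results),
    }
```

Load the checkpoint with `torch.load(path, map_location="cpu")` and pass the resulting dict. If `saved_version` differs from the installed `pytorch_lightning.__version__`, that supports the version-mismatch explanation.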
I might need more time to debug this due to the pytorch-lightning version update. For now, you could try training again to work around this issue.
When rerunning training I get:
Error executing job with overrides: []
Traceback (most recent call last):
  File "train.py", line 39, in main
    trainer.fit(model, trainset, validset, ckpt_path=checkpoint)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 739, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 773, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1190, in _run
    self.checkpoint_connector.restore_training_state()
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 188, in restore_training_state
    self.restore_loops()
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 217, in restore_loops
    self.trainer.fit_loop.load_state_dict(state_dict["fit_loop"])
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 255, in load_state_dict
    v.load_state_dict(state_dict.copy(), prefix + k + ".")
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 252, in load_state_dict
    self._load_from_state_dict(state_dict.copy(), prefix, metrics)
  File "/home/user/anaconda3/envs/vid2avatar/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 281, in _load_from_state_dict
    state_dict[prefix + k], metrics=metric_attributes, sync_fn=self.trainer.training_type_plugin.reduce
KeyError: 'epoch_loop._results'
I had the same problem. I upgraded pytorch-lightning to version 1.6.0, and it works! Sharing the result with you.
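Since the reported fix is upgrading pytorch-lightning to 1.6.0, a small guard that checks the installed version before resuming from a checkpoint can surface the mismatch early instead of failing deep inside restore_loops. This is a minimal sketch: the function names are hypothetical, 1.6.0 is taken from the comment above, and the parser is deliberately simplified (it ignores pre-release and local suffixes, so "1.6.0rc1" counts as 1.6.0; `packaging.version.Version` is the usual tool for strict PEP 440 comparisons).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.6.0' or '1.5.10+cu113' into a tuple of ints, e.g. (1, 6, 0).

    Simplified: drops any '+local' suffix, keeps the leading digits of each
    dot-separated part, and stops at the first part with no leading digits.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed: str, required: str = "1.6.0") -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that "1.6" compares equal to "1.6.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, `at_least(pytorch_lightning.__version__)` at the top of train.py or test.py would return False on an older install such as 1.5.10.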
I have trained a model with my own data and everything worked properly, but when I launched test.py it gave me this error:
How can I fix this? I haven't changed a single line of code, and I ran everything as documented.