Babelscape / rebel

REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

Issue while loading the trained checkpoint #55

Open NikitaGautam opened 1 year ago

NikitaGautam commented 1 year ago

Hi,

I am trying to test the trained model by loading the checkpoint, but it fails with the following error:

```
Traceback (most recent call last):
  File "test.py", line 119, in main
    train(conf)
  File "test.py", line 101, in train
    pl_module = pl_module.load_from_checkpoint(checkpoint_path=conf.checkpoint_path, config=config, tokenizer=tokenizer, model=model)
  File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
  File "/usr/lib/python3.8/_collections_abc.py", line 832, in update
    self[key] = other[key]
omegaconf.errors.ConfigKeyError: Key 'config' is not in struct
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```

I came across this issue: https://github.com/Babelscape/rebel/issues/47 but the answers did not help. I tried converting the config to a struct using OmegaConf, but it still does not work.

LittlePea13 commented 1 year ago

I am sorry about that; I think at some point there was a version incompatibility between hydra/omegaconf and the checkpointing. Since you are already constructing the module with its parameters, as long as you do not need to update them with the ones stored in the checkpoint, you can comment out the following line in pytorch_lightning:

```
  File "/home//virtualenv/luke/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
```

Then it should load without issues. I know it's an ugly hack, but it's what I suggest for reloading your checkpoint until I find a proper fix.
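If editing library code is undesirable, another possible workaround (a sketch, not the repo's official API; it assumes a standard Lightning checkpoint with a `state_dict` entry) is to skip `load_from_checkpoint` entirely and load only the weights:

```python
import torch

def load_weights_only(pl_module, checkpoint_path):
    """Load weights from a Lightning checkpoint without touching the
    stored hyperparameters, sidestepping the ConfigKeyError."""
    # Lightning saves the model weights under the 'state_dict' key.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    pl_module.load_state_dict(ckpt["state_dict"])
    return pl_module
```

Since the module is already constructed with `config`, `tokenizer`, and `model`, the hyperparameter merge that crashes is not actually needed here.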

NikitaGautam commented 1 year ago

Thanks for the quick fix. I tried several different things to reload the checkpoints but was not successful. If I find a solution, I will also post it here. For now, this quick fix works.

Andreas-Moller-Belsager commented 3 months ago

I have a similar issue. I have trained my model, but when I run the script shown on the front page (`test.py model=rebel_model data=conll04_data train=conll04_train do_predict=True checkpoint_path="path_to_checkpoint"`), it instead tries to load the latest saved item in the output directory, even if that item is just an error message. This means I cannot fetch the checkpoint from the path I define in the command.

Specifically, it does the following:

  1. Fetches the path to the latest saved item in 'output'
  2. Concatenates the path to the checkpoint specified in the command.

This means it tries to open a path that does not exist.

Do you know what is wrong?

Example of this:

[screenshot of the error showing the concatenated, non-existent path]

(I want to use the model saved at timestamp '2024-05-16/18-27-53', yet it wants to send me to something created at timestamp '2024-05-23/18-38-08')
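For what it's worth, this looks like it could be Hydra's usual working-directory behaviour: Hydra changes the cwd to the current run's output folder, so a relative `checkpoint_path` gets joined onto that folder instead of the directory the command was launched from. A minimal sketch of that resolution (all paths are made up):

```python
import os

def resolve_relative_to_run_dir(run_dir, checkpoint_path):
    """Illustrate how a checkpoint path is resolved when the current
    working directory is the latest Hydra run folder."""
    # os.path.join keeps an absolute checkpoint_path unchanged, but a
    # relative one is concatenated onto the run directory.
    return os.path.normpath(os.path.join(run_dir, checkpoint_path))

run_dir = "/home/user/rebel/outputs/2024-05-23/18-38-08"

# A relative path is resolved inside the newest run folder (wrong file):
print(resolve_relative_to_run_dir(run_dir, "outputs/2024-05-16/18-27-53/model.ckpt"))

# An absolute path survives the join untouched:
print(resolve_relative_to_run_dir(run_dir, "/home/user/rebel/outputs/2024-05-16/18-27-53/model.ckpt"))
```

If that is indeed what is happening, passing an absolute `checkpoint_path` (or resolving it with Hydra's `hydra.utils.to_absolute_path`) may avoid the unwanted concatenation.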