I debugged this far:

In the PyTorch Lightning `saving.py` file, two dictionaries are merged:

```python
checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
```

The error then occurs in omegaconf's `_utils.py`, line 365:

```python
value = _get_value(value)
if value == "???":  # the error gets thrown here because of the comparison
    return ret(ValueKind.MANDATORY_MISSING)
```

Can someone tell me how to fix this so I can save the model?
If I remember correctly, such errors have to do with the PyTorch Lightning version. Could you share your installed version?
Hi, I am using 1.1.7. Thanks for the quick reply. What version would you recommend?
That's the recommended version. I am not sure what could be wrong. Can you try calling `load_from_checkpoint` directly on the module:

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path = 'path_to_checkpoint.ckpt')
```
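The end-to-end flow would be roughly the following (untested sketch; it assumes the checkpoint stores all constructor hyper-parameters and that `BasePLModule` exposes the wrapped Hugging Face model and tokenizer as `.model` and `.tokenizer`):

```python
from pl_modules import BasePLModule  # import path is an assumption

# load the Lightning module straight from the checkpoint; the stored
# hyper-parameters are used to rebuild the module, no manual __init__ call
pl_module = BasePLModule.load_from_checkpoint(checkpoint_path='path_to_checkpoint.ckpt')

# export the wrapped Hugging Face model and tokenizer so they can later be
# loaded with plain transformers via from_pretrained
pl_module.model.save_pretrained('exported_model')
pl_module.tokenizer.save_pretrained('exported_model')
```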
I tried different approaches but didn't get it to work. I am even more confused than before.
First attempt:

```python
#module = BasePLModule(conf, config, tokenizer, model)
module = pl.LightningModule()
model = module.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt')
```

```
(base) (rebelenv) PS C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\src> python .\model_saving.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File ".\model_saving.py", line 35, in <module>
```

This version used to give me the error from my initial comment. Now `config`, `tokenizer`, and `model` don't seem to be passed through to `pytorch_lightning\core\saving.py`, line 199, in `_load_model_state`; because of this the error from updating the dict doesn't occur.
Second attempt:

```python
module = BasePLModule(conf, config, tokenizer, model)
#module = pl.LightningModule()
model = module.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt')
model.model.save_pretrained('rebel_doc_red')
model.tokenizer.save_pretrained('rebel_doc_red')
```

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\src\model_saving.py", line 36, in <module>
```
And which error do you get when trying the first suggestion:

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path = 'path_to_checkpoint.ckpt')
```

Do not initialize the `BasePLModule`.
With

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt', config = config, tokenizer = tokenizer, model = model)
```

I get the initial error from the update of the dict:

```
omegaconf.errors.ConfigKeyError: 'str' object has no attribute '__dict__'
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict
```

With

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt')
```

I get

```
File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 3 required positional arguments: 'config', 'tokenizer', and 'model'
```
Hi,
I made a workaround by adding

```python
model.model.save_pretrained('')
```

at the end of train.py.
I am glad you could solve it for your use case. I am sorry I could not provide an answer on how to use the PL checkpoint to do so; whenever I get some time I'll look into it, try to replicate your issue, and see if I can find the problem.
> Hi, I made a workaround by adding `model.model.save_pretrained('')` at the end of train.py.
Hi, I have the exact same issue you had with the saving. Did you train the model from scratch when you added `model.model.save_pretrained('')` so that the new checkpoint could be loaded? Or did you manage to salvage the "original" checkpoint?
Hi,
I loaded the rebel-large model, trained it on a dataset, and saved the fine-tuned model:

```python
trainer.fit(pl_module, datamodule=pl_data_module)
model.save_pretrained("rebel_nyt_cti")
print("model_saved")
```
Hi, I used your train.py script to train rebel on the docred dataset. When I try to save my model using model_saving.py to use it in transformers I get the following error:
```
Traceback (most recent call last):
  File "model_saving.py", line 27, in <module>
    model = pl_module.load_from_checkpoint(checkpoint_path = 'outputs/2022-09-02/07-42-36/experiments/docred/last.ckpt', config = config, tokenizer = tokenizer, model = model)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/_collections_abc.py", line 832, in update
    self[key] = other[key]
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: 'str' object has no attribute '__dict__'
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict
```
Can I configure it manually? The conf file in `conf = omegaconf.OmegaConf.load` gets read correctly, and I could transfer the values manually.
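If `conf` loads correctly, the manual route could look roughly like this, combined with the `state_dict` restore sketched earlier in the thread; the field name `model_name_or_path` and the `BasePLModule` constructor signature are assumptions, and any special tokens added during training would have to be re-added to the tokenizer (with the embeddings resized) before the checkpoint weights can be loaded:

```python
import omegaconf
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

# the training configuration that is already known to load correctly
conf = omegaconf.OmegaConf.load('conf/config.yaml')  # placeholder path

# rebuild the objects BasePLModule expects, driven by values from conf
config = AutoConfig.from_pretrained(conf.model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(conf.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(conf.model_name_or_path, config=config)
# re-add any special tokens used during training here and resize the embeddings,
# otherwise the checkpoint's weight shapes will not match:
# model.resize_token_embeddings(len(tokenizer))

# with these objects the module can be constructed by hand; the weights can then be
# restored via torch.load / load_state_dict as sketched earlier, instead of load_from_checkpoint
pl_module = BasePLModule(conf, config, tokenizer, model)
```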