Babelscape / rebel

REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

problem with model_saving.py #47

Closed · l0renor closed 1 year ago

l0renor commented 2 years ago

Hi, I used your train.py script to train REBEL on the DocRED dataset. When I try to save my model using model_saving.py to use it in transformers, I get the following error:

```
Traceback (most recent call last):
  File "model_saving.py", line 27, in <module>
    model = pl_module.load_from_checkpoint(checkpoint_path = 'outputs/2022-09-02/07-42-36/experiments/docred/last.ckpt', config = config, tokenizer = tokenizer, model = model)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/_collections_abc.py", line 832, in update
    self[key] = other[key]
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 258, in __setitem__
    self._format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/anaconda/envs/azureml_py38_PT_TF/lib/python3.8/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: 'str' object has no attribute '__dict__'
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict
```

Can I configure it manually? The conf file loaded with conf = omegaconf.OmegaConf.load is read correctly, so I could transfer the values by hand.
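For illustration, a minimal sketch of that manual transfer (untested; the path and key are hypothetical, to_container just turns the loaded DictConfig into plain Python values):

```python
import omegaconf

# Load the training config and convert it to plain Python containers,
# so individual values can be read out and applied by hand.
conf = omegaconf.OmegaConf.load('conf/config.yaml')  # hypothetical path
plain_conf = omegaconf.OmegaConf.to_container(conf, resolve=True)

print(plain_conf['train_batch_size'])  # hypothetical key
```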

l0renor commented 2 years ago

I debugged so far: in the PL saving.py file, two dictionaries are merged with checkpoint[cls.CHECKPOINT_HYPER_PARAMS_KEY].update(kwargs). The error occurs in omegaconf's _utils.py at line 365:

```python
value = _get_value(value)
if value == "???":  # the error gets thrown here, by this comparison
    return ret(ValueKind.MANDATORY_MISSING)
```

Can someone tell me how to fix this so I can save the model?

LittlePea13 commented 2 years ago

If I remember correctly, such errors have to do with the PyTorch Lightning version. Could you share your installed version?

l0renor commented 2 years ago

Hi, I am using 1.1.7. Thanks for the quick reply. What version would you recommend?

LittlePea13 commented 2 years ago

That's the recommended version, so I am not sure what could be wrong. Can you try calling load_from_checkpoint directly on the module class:

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path='path_to_checkpoint.ckpt')
```

l0renor commented 1 year ago

I tried different approaches but didn't get it to work. I am even more confused than before.

Errors from the different attempts:

Attempt 1

```python
# module = BasePLModule(conf, config, tokenizer, model)
module = pl.LightningModule()

model = module.load_from_checkpoint(checkpoint_path='model_convert/rebel_fintune_doc_red.ckpt')
```

Error:

```
(base) (rebelenv) PS C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\src> python .\model_saving.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File ".\model_saving.py", line 35, in <module>
    model = module.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt')
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 159, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\lightning.py", line 73, in __init__
    super().__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'train_batch_size'
```

Presumably the bare pl.LightningModule cannot accept the hyperparameters stored in the checkpoint, such as train_batch_size.

Attempt 2

This version used to give me the error from my initial comment. Now config, tokenizer, and model don't seem to be passed through to _load_model_state (pytorch_lightning\core\saving.py, line 199), which is why the error when updating the dict doesn't occur. Presumably load_from_checkpoint is a classmethod and rebuilds the module from scratch, ignoring the instance it is called on.

```python
module = BasePLModule(conf, config, tokenizer, model)
# module = pl.LightningModule()

model = module.load_from_checkpoint(checkpoint_path='model_convert/rebel_fintune_doc_red.ckpt')

model.model.save_pretrained('rebel_doc_red')
model.tokenizer.save_pretrained('rebel_doc_red')
```

Error:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\src\model_saving.py", line 36, in <module>
    model = module.load_from_checkpoint(checkpoint_path = 'model_convert/rebel_fintune_doc_red.ckpt')
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 159, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 3 required positional arguments: 'config', 'tokenizer', and 'model'
```

LittlePea13 commented 1 year ago

And which error do you get by trying the first suggestion:

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path='path_to_checkpoint.ckpt')
```

Do not initialize the BasePLModule.

l0renor commented 1 year ago

When I use

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path='model_convert/rebel_fintune_doc_red.ckpt', config=config, tokenizer=tokenizer, model=model)
```

I get the initial error from the update of the dict:

```
omegaconf.errors.ConfigKeyError: 'str' object has no attribute '__dict__'
    full_key: config
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict
```

And if I use it as you suggest, without config=config, tokenizer=tokenizer, model=model:

```python
model = BasePLModule.load_from_checkpoint(checkpoint_path='model_convert/rebel_fintune_doc_red.ckpt')
```

I get:

```
File "C:\Users\leon.lukas\Documents\X-Next\Code\Rebel_Experiments\rebel-main\rebelenv\lib\site-packages\pytorch_lightning\core\saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 3 required positional arguments: 'config', 'tokenizer', and 'model'
```

l0renor commented 1 year ago

Hi, I made a workaround by adding model.model.save_pretrained('') at the end of train.py.

LittlePea13 commented 1 year ago

I am glad you could solve it for your use case. I am sorry I could not provide an answer on how to use the PL checkpoint to do so, whenever I get some time I'll look into it and try to replicate your issue, see if I can find the problem.

SaraBtt commented 1 year ago

> Hi, I made a workaround by adding model.model.save_pretrained('') at the end of train.py.

Hi, I have the exact same issue with saving. Did you train the model from scratch after adding model.model.save_pretrained('') so that the new checkpoint could be loaded? Or did you manage to salvage the "original" checkpoint?

l0renor commented 1 year ago

Hi,

I loaded the REBEL large model, trained it on a dataset, and saved the fine-tuned model:

```python
trainer.fit(pl_module, datamodule=pl_data_module)
model.save_pretrained("rebel_nyt_cti")
print("model_saved")
```
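For reference, a sketch of loading the exported directory back in plain transformers (it assumes the tokenizer was also saved there with save_pretrained; since REBEL is a seq2seq model, the Seq2SeqLM auto class applies):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned REBEL model exported above; the directory name
# matches the save_pretrained call in the previous snippet.
tokenizer = AutoTokenizer.from_pretrained("rebel_nyt_cti")
model = AutoModelForSeq2SeqLM.from_pretrained("rebel_nyt_cti")
```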