Babelscape / rebel

REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

Replicating REBEL from BART and some issues #59

Closed jefflink closed 1 year ago

jefflink commented 1 year ago

Hi, thank you for the very interesting work that you have done! I'm trying to replicate your training process based on train.py and the default_model configuration, in order to reach the same state as your released REBEL model. However, I ran into some issues and would like to seek your help.

/python3.7/site-packages/pytorch_lightning/plugins/native_amp.py:65: FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.
  torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip_val, norm_type=norm_type)
Epoch 8:  95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏     | 752/790 [02:51<00:08,  4.39it/s, loss=0.316, v_num=dgll]
processed 300 sentences with 3515 relations; found: 987 relations; correct: 520.
Epoch 8, global step 845: val_F1_micro reached 23.17290 (best 23.17290)

Epoch 9:  95%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████      | 752/790 [02:51<00:08,  4.38it/s, loss=1.71, v_num=dgll]
processed 300 sentences with 3515 relations; found: 5986 relations; correct: 1.
Epoch 9, global step 939: val_F1_micro was not in top 3

Epoch 10:  95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏     | 752/790 [02:50<00:08,  4.40it/s, loss=1.44, v_num=dgll]
processed 300 sentences with 3515 relations; found: 0 relations; correct: 0.
Epoch 10, global step 1033: val_F1_micro was not in top 3

Epoch 11:  95%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏     | 752/790 [02:49<00:08,  4.44it/s, loss=1.31, v_num=dgll]
processed 300 sentences with 3515 relations; found: 1800 relations; correct: 0.
Epoch 11, global step 1127: val_F1_micro was not in top 3
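
For reference, those val_F1_micro numbers follow from standard micro precision/recall/F1 over the counts in the lines above (a generic sketch, not the repo's actual scoring code; small step/rounding differences aside):

```python
# Micro precision/recall/F1 over (correct, found, gold) counts.
# Illustrative only; not REBEL's scoring code.
def micro_prf(correct: int, found: int, gold: int):
    precision = correct / found if found else 0.0
    recall = correct / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Epoch 8 line: 3515 gold relations, 987 found, 520 correct -> F1 ~ 0.23.
print(micro_prf(520, 987, 3515))
```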

Thank you!

jefflink commented 1 year ago

Managed to resolve the first two errors by downgrading to torch 1.8.1.
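
For anyone hitting the same thing: the FutureWarning above comes from torch.nn.utils.clip_grad_norm_, which from torch 1.9 onwards takes an error_if_nonfinite flag and will eventually raise on non-finite gradient norms instead of just warning. A minimal sketch of the call the warning refers to (the toy model here is purely illustrative):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # toy stand-in model
model(torch.randn(4, 10)).sum().backward()

# error_if_nonfinite=False keeps the old "warn and continue" behaviour
# referenced in the warning; the flag exists from torch 1.9 onwards,
# which is why downgrading to 1.8.1 sidesteps the issue.
total_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=0.5, error_if_nonfinite=False
)
print(total_norm)
```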

LittlePea13 commented 1 year ago

Sorry for the late reply. The code is indeed a bit "outdated", since torch, PyTorch Lightning, and transformers have all seen several updates that may break it. Hopefully it wasn't too much of an issue. I may try to find some time to update everything, but no promises.

Regarding the training procedure, there isn't much to it: just training BART on the REBEL dataset should do it. Depending on your hardware that may take a while, since the model is big and there are a lot of data instances. Make sure to use the default config files, such as default_data.yaml; this ensures the model only trains on the 230 most frequent relations, which is how REBEL was trained. A rough sketch of that setup follows.
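
As an illustration only (a minimal sketch, not the repo's train.py: the <triplet>/<subj>/<obj> special tokens and the linearized target format follow the REBEL model card, while the example sentence is a placeholder):

```python
# Minimal sketch: fine-tune BART as a seq2seq model mapping raw text to
# REBEL-style linearized triplets. Illustrative only; the real training
# loop lives in train.py with the hydra config files.
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
# REBEL's linearization marks triplets with these special tokens.
tokenizer.add_tokens(["<triplet>", "<subj>", "<obj>"], special_tokens=True)

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
model.resize_token_embeddings(len(tokenizer))  # account for the new tokens

source = "Punta Cana is a resort town in the Dominican Republic."  # toy example
target = "<triplet> Punta Cana <subj> Dominican Republic <obj> country"

batch = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Standard seq2seq cross-entropy on the linearized triplets; an optimizer
# step over the full REBEL dataset would follow in a real run.
loss = model(**batch, labels=labels).loss
loss.backward()
```

At inference time you would decode with model.generate and parse the triplets back out of the generated string, as in the extract_triplets snippet on the model card.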

Best, Pere-Lluis