DeNederlandscheBank / nqm

A Transformer-based Machine for answering questions on insurance companies
MIT License

Use pre-trained model #26

Closed jm-glowienke closed 3 years ago

jm-glowienke commented 3 years ago

Use some kind of BERT-style model available through fairseq. These models are trained for language modelling and can therefore serve as the encoder of the model.

Possible Challenges:

Resources:

- https://github.com/pytorch/fairseq/tree/master/examples/xlmr
- https://github.com/pytorch/fairseq/tree/master/examples/cross_lingual_language_model
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

Tasks:

jm-glowienke commented 3 years ago

RESOLVED

XLM-R model does not work directly:

- https://github.com/pytorch/fairseq/issues/1842
- https://github.com/pytorch/fairseq/tree/47fd985269e92735826c05d9160d68dc8e8a9807/examples/cross_lingual_language_model
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md

Reason: the state_dict of the XLM-R checkpoint contains more entries than the state_dict of the freshly created model, which causes the assertion error.
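A minimal sketch of why the load fails, using plain Python dicts in place of torch state_dicts (the helper name and values are hypothetical; `torch.nn.Module.load_state_dict` behaves analogously with its `strict` flag):

```python
def load_state_dict(model_state, checkpoint_state, strict=True):
    """Mimic torch's key checking: with strict=True any key mismatch
    raises; with strict=False unexpected checkpoint keys are skipped."""
    unexpected = [k for k in checkpoint_state if k not in model_state]
    missing = [k for k in model_state if k not in checkpoint_state]
    if strict and (unexpected or missing):
        raise RuntimeError(
            f"unexpected keys: {unexpected}, missing keys: {missing}")
    for key, value in checkpoint_state.items():
        if key in model_state:
            model_state[key] = value
    return missing, unexpected

model = {"encoder.weight": 0}
# the checkpoint carries extra entries the freshly built model lacks
checkpoint = {"encoder.weight": 1, "lm_head.weight": 2}

try:
    load_state_dict(model, checkpoint)  # strict load raises
except RuntimeError as err:
    print("strict load failed:", err)

load_state_dict(model, checkpoint, strict=False)  # non-strict load works
print(model)  # {'encoder.weight': 1}
```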

jm-glowienke commented 3 years ago

Model run failed

NEW TASKS:

Fixed by using `--model-overrides` and skipping the adaptation of the state_dict to the pretrained model.
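A rough sketch of the mechanism behind `--model-overrides` (function and attribute names here are hypothetical; fairseq's actual handling lives in its checkpoint-loading code): the flag is a Python-dict string whose entries overwrite the args stored inside the checkpoint before the model is rebuilt, so the rebuilt model matches the checkpoint without editing the checkpoint file.

```python
import ast
from types import SimpleNamespace

def apply_model_overrides(saved_args, overrides_str):
    """Parse the --model-overrides dict string and overwrite the
    corresponding attributes on the args saved in the checkpoint."""
    overrides = ast.literal_eval(overrides_str)
    for name, value in overrides.items():
        setattr(saved_args, name, value)
    return saved_args

# args as stored inside a checkpoint (hypothetical example values)
saved_args = SimpleNamespace(arch="xlmr_base", dropout=0.1)

apply_model_overrides(saved_args, "{'dropout': 0.0}")
print(saved_args.dropout)  # 0.0 (arch stays untouched)
```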

jm-glowienke commented 3 years ago

Resources:

- https://github.com/pytorch/fairseq/blob/master/fairseq/options.py#L468-L475
- https://github.com/pytorch/fairseq/issues/3600
- https://github.com/huggingface/transformers/pull/12082
- https://huggingface.co/transformers/model_doc/mbart.html
- https://github.com/pytorch/fairseq/commit/54423d3b22a3e7f536e02e9e5445cef9becbd60d
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

jm-glowienke commented 3 years ago

Training is really slow and runs into memory issues for mBART.

https://tmramalho.github.io/science/2020/06/10/fine-tune-neural-translation-models-with-mBART/
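One common way around memory pressure like this (not discussed in the thread itself; fairseq exposes it as `--update-freq`) is gradient accumulation: compute gradients over small micro-batches and only step after several of them, which reproduces the gradient of one large batch. A torch-free sketch with a scalar linear model:

```python
def grad(w, batch):
    """d/dw of the mean squared error of y_hat = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# Full batch: one large (memory-hungry) gradient computation
full = grad(w, data)

# Accumulation: two equal-sized micro-batches, gradients averaged
micro = [data[:2], data[2:]]
accumulated = sum(grad(w, m) for m in micro) / len(micro)

print(full, accumulated)  # identical for equal-sized micro-batches
```

The equivalence holds exactly only when the micro-batches have equal size; fairseq's `--update-freq N` applies the same idea per GPU.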

jm-glowienke commented 3 years ago

RESOLVED

A similar issue with a pre-trained model is described here: https://github.com/pytorch/fairseq/issues/3530

This was helpful: https://github.com/pytorch/fairseq/blob/master/examples/stories/README.md

jm-glowienke commented 3 years ago

Check why `<mask>` is present in the translations, then run model training --> Difficult: the token is added to the source dictionary somewhere in task.setup_task, but this is really hard to trace due to the use of override methods.
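A toy illustration of the suspected mechanism (plain Python with hypothetical names; fairseq's real `Dictionary` lives in `fairseq/data/dictionary.py`): once a `<mask>` symbol is appended to the source dictionary during task setup, it is an ordinary vocabulary entry, so the model can emit its index and it decodes to a literal `<mask>` token in the output.

```python
class ToyDictionary:
    """Minimal symbol<->index table mimicking how special symbols
    such as <mask> get appended to an existing dictionary."""
    def __init__(self, symbols):
        self.symbols = list(symbols)

    def add_symbol(self, symbol):
        if symbol not in self.symbols:
            self.symbols.append(symbol)
        return self.symbols.index(symbol)

    def string(self, indices):
        return " ".join(self.symbols[i] for i in indices)

src_dict = ToyDictionary(["<s>", "</s>", "hello", "world"])
mask_idx = src_dict.add_symbol("<mask>")  # silently added during setup

# If the model ever emits mask_idx, it decodes to a literal token:
print(src_dict.string([2, mask_idx, 3]))  # hello <mask> world
```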

jm-glowienke commented 3 years ago

30 epochs of training result in a BLEU score of roughly 8. `<mask>` is still present in the output, and the output quality is poor.