DeNederlandscheBank / nqm

A Transformer-based Machine for answering questions on insurance companies
MIT License

Use pre-trained model #26

Closed jm-glowienke closed 3 years ago

jm-glowienke commented 3 years ago

Use some kind of BERT-style model available through fairseq. These models are trained for language modelling and can therefore serve as the encoder of the model.

Possible Challenges:

Resources:

- https://github.com/pytorch/fairseq/tree/master/examples/xlmr
- https://github.com/pytorch/fairseq/tree/master/examples/cross_lingual_language_model
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

Tasks:

jm-glowienke commented 3 years ago

RESOLVED

XLM-R model does not work directly:

- https://github.com/pytorch/fairseq/issues/1842
- https://github.com/pytorch/fairseq/tree/47fd985269e92735826c05d9160d68dc8e8a9807/examples/cross_lingual_language_model
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md

Reason: the state_dict of the XLM-R checkpoint contains more entries than the state_dict of the freshly created model, which causes the assertion error.
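A minimal sketch of why the load fails, using plain Python dicts in place of torch state_dicts (the helper name and values are hypothetical; `torch.nn.Module.load_state_dict` behaves analogously with its `strict` flag):

```python
def load_state_dict(model_state, checkpoint_state, strict=True):
    """Mimic torch's key checking: with strict=True any key mismatch
    raises; with strict=False unexpected checkpoint keys are skipped."""
    unexpected = [k for k in checkpoint_state if k not in model_state]
    missing = [k for k in model_state if k not in checkpoint_state]
    if strict and (unexpected or missing):
        raise RuntimeError(
            f"unexpected keys: {unexpected}, missing keys: {missing}")
    for key, value in checkpoint_state.items():
        if key in model_state:
            model_state[key] = value
    return missing, unexpected

model = {"encoder.weight": 0}
# the checkpoint carries extra entries the freshly built model lacks
checkpoint = {"encoder.weight": 1, "lm_head.weight": 2}

try:
    load_state_dict(model, checkpoint)  # strict load raises
except RuntimeError as err:
    print("strict load failed:", err)

load_state_dict(model, checkpoint, strict=False)  # non-strict load works
print(model)  # {'encoder.weight': 1}
```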

jm-glowienke commented 3 years ago

Model run failed

NEW TASKS:

Fixed by using `--model-overrides` and skipping the adaptation of the state_dict to the pretrained model.
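A rough sketch of the mechanism behind `--model-overrides` (function and attribute names here are hypothetical; fairseq's actual handling lives in its checkpoint-loading code): the flag is a Python-dict string whose entries overwrite the args stored inside the checkpoint before the model is rebuilt, so the rebuilt model matches the checkpoint without editing the checkpoint file.

```python
import ast
from types import SimpleNamespace

def apply_model_overrides(saved_args, overrides_str):
    """Parse the --model-overrides dict string and overwrite the
    corresponding attributes on the args saved in the checkpoint."""
    overrides = ast.literal_eval(overrides_str)
    for name, value in overrides.items():
        setattr(saved_args, name, value)
    return saved_args

# args as stored inside a checkpoint (hypothetical example values)
saved_args = SimpleNamespace(arch="xlmr_base", dropout=0.1)

apply_model_overrides(saved_args, "{'dropout': 0.0}")
print(saved_args.dropout)  # 0.0 (arch stays untouched)
```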

jm-glowienke commented 3 years ago

Resources:

- https://github.com/pytorch/fairseq/blob/master/fairseq/options.py#L468-L475
- https://github.com/pytorch/fairseq/issues/3600
- https://github.com/huggingface/transformers/pull/12082
- https://huggingface.co/transformers/model_doc/mbart.html
- https://github.com/pytorch/fairseq/commit/54423d3b22a3e7f536e02e9e5445cef9becbd60d
- https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

jm-glowienke commented 3 years ago

Training is really slow and runs into memory issues for mBART.

https://tmramalho.github.io/science/2020/06/10/fine-tune-neural-translation-models-with-mBART/
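One common way around memory pressure like this (not discussed in the thread itself; fairseq exposes it as `--update-freq`) is gradient accumulation: compute gradients over small micro-batches and only step after several of them, which reproduces the gradient of one large batch. A torch-free sketch with a scalar linear model:

```python
def grad(w, batch):
    """d/dw of the mean squared error of y_hat = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# Full batch: one large (memory-hungry) gradient computation
full = grad(w, data)

# Accumulation: two equal-sized micro-batches, gradients averaged
micro = [data[:2], data[2:]]
accumulated = sum(grad(w, m) for m in micro) / len(micro)

print(full, accumulated)  # identical for equal-sized micro-batches
```

The equivalence holds exactly only when the micro-batches have equal size; fairseq's `--update-freq N` applies the same idea per GPU.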

jm-glowienke commented 3 years ago

RESOLVED

A similar issue with a pre-trained model is described here: https://github.com/pytorch/fairseq/issues/3530

This was helpful: https://github.com/pytorch/fairseq/blob/master/examples/stories/README.md

jm-glowienke commented 3 years ago

Check why `<mask>` is present in the translations, then run model training --> Difficult: the token is added to the source dictionary somewhere in task.setup_task, but this is really hard to trace due to the use of override methods.
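A toy illustration of the suspected mechanism (plain Python with hypothetical names; fairseq's real `Dictionary` lives in `fairseq/data/dictionary.py`): once a `<mask>` symbol is appended to the source dictionary during task setup, it is an ordinary vocabulary entry, so the model can emit its index and it decodes to a literal `<mask>` token in the output.

```python
class ToyDictionary:
    """Minimal symbol<->index table mimicking how special symbols
    such as <mask> get appended to an existing dictionary."""
    def __init__(self, symbols):
        self.symbols = list(symbols)

    def add_symbol(self, symbol):
        if symbol not in self.symbols:
            self.symbols.append(symbol)
        return self.symbols.index(symbol)

    def string(self, indices):
        return " ".join(self.symbols[i] for i in indices)

src_dict = ToyDictionary(["<s>", "</s>", "hello", "world"])
mask_idx = src_dict.add_symbol("<mask>")  # silently added during setup

# If the model ever emits mask_idx, it decodes to a literal token:
print(src_dict.string([2, mask_idx, 3]))  # hello <mask> world
```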

jm-glowienke commented 3 years ago

30 epochs of training result in a BLEU score of roughly 8. `<mask>` is still present in the output, and the output quality is poor.