It seems that you haven't changed the following two parameters.
"stopping_criterion":"valid_en-fr_mt_bleu,10",
"validation_metrics":"valid_en-fr_mt_bleu",
In your case, if your two languages are abbreviated lang1 and lang2, and you want to use the BLEU metric (accuracy, perplexity and loss are also available) as a stopping criterion and to validate your models, you have to set:
"stopping_criterion":"valid_lang1-lang2_mt_bleu,10",
"validation_metrics":"valid_lang1-lang2_mt_bleu",
I should mention that the framework supports multilingual translation, so lang1 and lang2 can be chosen from a larger set of languages.
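For instance, with a hypothetical German-English pair (de, en), the same pattern gives:

```json
"stopping_criterion":"valid_de-en_mt_bleu,10",
"validation_metrics":"valid_de-en_mt_bleu",
```

The number after the comma is a patience value: training stops once the metric has failed to improve that many times in a row.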
Is this okay?
Yes, that is completely fine. I managed to train with an MLM + TLM objective. I wanted the BLEU score for evaluation as well, so I had to change my language initials to match the code.
I have one question on how to use the trained model for translation. translate.py is the one I should be using, right? Does it take tokenized and BPE-encoded text as input?
Thanks so much for your help
No, translate.py is for inference.
To train a machine translation model, always use train.py and specify the "mt_steps" objective (see /configs/mt_template.json).
For example, if you want to translate from English (en) to French (fr), then set "lgs": "en-fr" and "mt_steps": "en-fr".
Note that the system is multilingual, so it is bidirectional for a pair of languages: you can simultaneously train a model to translate from en to fr and from fr to en by specifying "lgs": "en-fr" and "mt_steps": "en-fr,fr-en".
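For instance, a minimal sketch of such a config fragment (the full set of required parameters is in /configs/mt_template.json; anything not shown here is left to the template):

```json
{
  "lgs": "en-fr",
  "mt_steps": "en-fr,fr-en",
  "stopping_criterion": "valid_en-fr_mt_bleu,10",
  "validation_metrics": "valid_en-fr_mt_bleu"
}
```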
You can go further by translating several languages simultaneously. Let's add German (de) and Italian (it) to our previous languages. Then you can do "lgs": "en-fr-de-it" and "mt_steps": "...". In this case, mt_steps (...) will be replaced by all possible combinations of your languages: en-fr,en-de,en-it,fr-en,fr-de,fr-it,de-en, etc. (this gets long to specify manually as the number of languages increases).
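To make the expansion concrete, here is a small Python sketch of what "..." stands for, assuming it enumerates every ordered pair as described above:

```python
from itertools import permutations

langs = "en-fr-de-it".split("-")
# every ordered pair of distinct languages, joined in the mt_steps format
mt_steps = ",".join(f"{a}-{b}" for a, b in permutations(langs, 2))
print(mt_steps)
# en-fr,en-de,en-it,fr-en,fr-de,fr-it,de-en,de-fr,de-it,it-en,it-fr,it-de
```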
Note that the system is multi-tasking, so you can simultaneously do clm (causal language modeling), mlm (masked language modeling), tlm (translation language modeling), ae (denoising auto-encoding), bt (online back-translation) and mt (machine translation).
ae + bt = unsupervised mt
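As a config sketch of that recipe (the step formats here follow the upstream XLM repository: ae_steps takes single languages, bt_steps takes src-tgt-src triplets):

```json
"lgs": "en-fr",
"ae_steps": "en,fr",
"bt_steps": "en-fr-en,fr-en-fr"
```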
If you need to understand how all this works, you can refer (if you have not already done so) to these papers:
(ae) Extracting and Composing Robust Features with Denoising Autoencoders : https://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf
(bt) Improving Neural Machine Translation Models with Monolingual Data : https://arxiv.org/abs/1511.06709
(ae, bt, mt : supervised and unsupervised mt) Phrase-Based & Neural Unsupervised Machine Translation : https://arxiv.org/abs/1804.07755
(mlm) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : https://arxiv.org/abs/1810.04805
(clm) GPT/GPT-2/GPT-3
(mlm, tlm, clm, multi-lingual & cross-lingual mt, both supervised and unsupervised ...) Cross-lingual Language Model Pretraining : https://arxiv.org/abs/1901.07291
(meta-learning) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks : https://arxiv.org/abs/1703.03400
(our paper : all this + meta-learning) On the use of linguistic similarities to improve Neural Machine Translation for African Languages : https://openreview.net/forum?id=Q5ZxoD2LqcI (the updated version will be on arXiv soon)
For another project I'm working on, I integrated a new architecture into the code, TIM (Transformers with Competitive Ensembles of Independent Mechanisms: https://arxiv.org/abs/2103.00336), which can be used in place of the standard transformer. I also integrated code to automatically fine-tune models on text classification tasks (GLUE, XNLI, custom tasks, ...). All these updates are here; I will make everything public with another paper.
I'm trying to reproduce all this with the Hugging Face transformers library: https://github.com/Tikquuss/lm
Thanks a lot for your help. That is quite thorough.
I already used lm_template.json to train a language model with the parameter "mlm_steps": "...". As per your GitHub, this by default uses my monolingual and parallel datasets (de, en, de-en). Then I used this language model and trained with mt_template.json and the parameter "mt_steps": "...". I believe that now I have an MT model for my languages, right?
Now if I want to use it on new test sets for inference, do I use the translate.py? Could you give a hint on how to use it?
Thank you so much again and I will definitely check your new project and best of luck with your future paper.
I am trying to use translate.py for inference, but I get the following error:
```
Traceback (most recent call last):
  File "translate.py", line 141, in
```
I am not sure what I am doing wrong. My command on the command line is as follows:
```
cat /user/HS301/m16265/Documents/XML-R/processed/test.en | python translate.py --exp_name mt_enfrde --model_path /user/HS301/m16265/Documents/XML-R/dump_path/mt_enfrde/demo/best-valid_de-en_mt_bleu.pth --src_lang en --tgt_lang de --output_path output
```
Can you help with this error please if you have any suggestions for solving it? Thank you
Use translate_our.py instead (see https://github.com/Tikquuss/meta_XLM/blob/master/XLM/translate_our.py#L115 for how to use)
Thank you for your help, but I get this error with translate_our.py:
```
Traceback (most recent call last):
  File "translate_our.py", line 175, in
```
I noticed that you have a device variable in translate_our.py (https://github.com/Tikquuss/meta_XLM/blob/master/XLM/translate_our.py#L40). Should I be doing something before running translate_our.py? Thank you for your help.
Go to line 34 of translate_our.py (before logger = initialize_exp(params)) and set up the device; for example, you can add this line of code:

```python
params.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

If other parameters are missing, just do the same for them. All parameters are well described in train.py (I encourage you to understand the code well so that you can make some adjustments yourself).
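For context, the patched region would read like this (assuming torch is already imported at the top of translate_our.py, as it is in translate.py):

```python
# translate_our.py, around line 34
params.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # added line
logger = initialize_exp(params)  # the existing line that follows
```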
Thank you for the demo notebook. I have trained my MLM+TLM model, but I get this error during the mt_template.json training stage:
```
Traceback (most recent call last):
  File "train.py", line 816, in <module>
    main(params)
  File "train.py", line 554, in main
    end_of_epoch(trainer = trainer, evaluator = evaluator, params = params, logger = logger)
  File "train.py", line 441, in end_of_epoch
    trainer.end_epoch(scores)
  File "/content/meta_XLM/XLM/src/trainer.py", line 736, in end_epoch
    assert metric in scores, metric
AssertionError: valid_en-fr_mt_bleu
```
Could you please help with why this error is happening, given that I am not training on en or fr?