bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language
https://arxiv.org/abs/2212.09535
Apache License 2.0

Ext exp #11

Closed: vnikouliNLE closed this issue 2 years ago

vnikouliNLE commented 2 years ago

@yongzx I've committed my current version of the code. Unfortunately I didn't have time to make sure that everything works as expected, but I wanted to commit before leaving on vacation. The points I am unsure about are:

yongzx commented 2 years ago

Hi Vassilina, can you include your commands for calling the XNLI script adapters_xnli_de_vn.py in the cross-lingual and supervised fine-tuning settings? Sorry for getting to this late. I just want to make sure that we are on the same page on how we name the directories and how we treat certain variables (especially original_model and pretrained_model).

yongzx commented 2 years ago

@yongzx Line 600 saves the embedding layer as embedding.pt, and line 601 saves the positional embedding in the same folder.

# save embedding and positional embedding (which is not saved by trainer)
trainer.model.save_embeddings(trainer.args.output_dir, 'lng_emb')
torch.save(trainer.model.transformer.wpe, f'{trainer.args.output_dir}/positional_embedding.pt')

Approved.
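
For completeness, a minimal sketch of how these two artifacts could be restored at evaluation time. This assumes `model` is the same adapter-enabled GPT-2-style model class used during training, `output_dir` is the trainer's output directory, and that adapter-transformers' load_embeddings is available as the counterpart of save_embeddings; these names are placeholders, not the actual evaluation script.

import torch

# Assumption: load_embeddings is the adapter-transformers counterpart of
# save_embeddings; it swaps in the embedding saved under the name 'lng_emb'.
model.load_embeddings(output_dir, 'lng_emb')

# The positional embedding was saved as a whole nn.Embedding module, so it can
# be loaded and assigned back directly.
model.transformer.wpe = torch.load(f'{output_dir}/positional_embedding.pt')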

vnikouliNLE commented 2 years ago

> Hi Vassilina, can you include your commands for calling the XNLI script adapters_xnli_de_vn.py in the cross-lingual and supervised fine-tuning settings? Sorry for getting to this late. I just want to make sure that we are on the same page on how we name the directories and how we treat certain variables (especially original_model and pretrained_model).

Added zero-shot commands here: https://github.com/bigscience-workshop/multilingual-modeling/pull/11/commits/f3a165e19760286ffba0e875a708b293c62b8a59

I've split the training of XNLI for English (since we only need to do it once and can reuse it to evaluate different adapted models) from the zero-shot evaluation.
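
For reference, a simplified sketch of that two-stage split (plain fine-tuning with a placeholder model name and hyperparameters; it leaves out the adapter- and embedding-swapping pieces that adapters_xnli_de_vn.py handles): stage 1 trains the English XNLI classifier once and saves it; stage 2 reloads that checkpoint and evaluates it zero-shot on another XNLI language.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder for the original or language-adapted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

# Stage 1: train the English XNLI classifier once and save it.
en = load_dataset("xnli", "en").map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.config.pad_token_id = tokenizer.pad_token_id
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xnli_en_ckpt", num_train_epochs=2,
                           per_device_train_batch_size=8),
    train_dataset=en["train"],
    eval_dataset=en["validation"],
)
trainer.train()
trainer.save_model("xnli_en_ckpt")

# Stage 2: zero-shot evaluation, reusing the English-trained classifier on a
# different XNLI language (German here) without any further training.
de_val = load_dataset("xnli", "de", split="validation").map(tokenize, batched=True)
evaluator = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained("xnli_en_ckpt"),
    args=TrainingArguments(output_dir="xnli_zero_shot", per_device_eval_batch_size=8),
)
print(evaluator.evaluate(eval_dataset=de_val))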

yongzx commented 2 years ago

Thanks Vassilina! It seems like we are on the same page about pretrained_model and original_model. I will apply the following renaming conventions:
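
Regardless of the final names, a minimal sketch of how the two checkpoints can be distinguished in an argument parser; the flag names and help strings below are illustrative placeholders based on one plausible reading, not the actual adapters_xnli_de_vn.py interface.

import argparse

# Hypothetical flags for the two checkpoints discussed above
# (not the actual adapters_xnli_de_vn.py interface).
parser = argparse.ArgumentParser(description="XNLI evaluation sketch")
parser.add_argument("--original_model",
                    help="the original (unadapted) checkpoint")
parser.add_argument("--pretrained_model",
                    help="the checkpoint produced by the language-adaptation step")
args = parser.parse_args()
print(args.original_model, args.pretrained_model)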

yongzx commented 2 years ago

Commenting for future reference: in the tokenized4clm_sampled.py file, we avoid using replace_with_overlap because replacing the overlapping embeddings happens during training. Vassilina originally added it as a backup solution while the "optimal" replacement wasn't working.
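
For future readers, a minimal sketch of what "replacing the overlapping embeddings" refers to (illustrative only; the actual logic lives in the training script, and the function name here is hypothetical): tokens that exist in both the original and the new tokenizer's vocabulary keep the original model's embedding rows, while everything else keeps its fresh initialization.

import torch

def replace_overlapping_embeddings(orig_tokenizer, new_tokenizer,
                                   orig_embeddings, new_embeddings):
    # For every token string present in both vocabularies, copy the original
    # embedding row into the new embedding matrix (mutated in place).
    # orig_embeddings / new_embeddings are torch.nn.Embedding modules.
    orig_vocab = orig_tokenizer.get_vocab()   # token string -> original id
    new_vocab = new_tokenizer.get_vocab()     # token string -> new id
    overlap = set(orig_vocab) & set(new_vocab)
    with torch.no_grad():
        for token in overlap:
            new_embeddings.weight[new_vocab[token]] = \
                orig_embeddings.weight[orig_vocab[token]]
    return len(overlap)  # number of rows copied over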