bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language
https://arxiv.org/abs/2212.09535
Apache License 2.0

Ext exp #11

Closed: vnikouliNLE closed this issue 2 years ago

vnikouliNLE commented 2 years ago

@yongzx I've committed my current version of the code. Unfortunately I didn't have time to make sure that everything works as expected, but I wanted to commit before leaving on vacation. The points I am unsure about are:

yongzx commented 2 years ago

Hi Vassilina, can you include your commands for calling the XNLI script adapters_xnli_de_vn.py in the cross-lingual and supervised fine-tuning settings? Sorry for getting to this late. I just want to make sure that we are on the same page on how we name the directories and how we treat certain variables (especially original_model and pretrained_model).

yongzx commented 2 years ago

@yongzx Line 600 saves the embedding layer as embedding.pt, and line 601 saves the positional embedding in the same folder.

# save embedding and positional embedding (which is not saved by trainer)
trainer.model.save_embeddings(trainer.args.output_dir, 'lng_emb')
torch.save(trainer.model.transformer.wpe, f'{trainer.args.output_dir}/positional_embedding.pt')

Approved.
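
For completeness, a minimal sketch of how these two artifacts could be restored at evaluation time. This assumes `model` is the same adapter-enabled GPT-2-style model class used during training, `output_dir` is the trainer's output directory, and that adapter-transformers' load_embeddings is available as the counterpart of save_embeddings; these names are placeholders, not the actual evaluation script.

import torch

# Assumption: load_embeddings is the adapter-transformers counterpart of
# save_embeddings; it swaps in the embedding saved under the name 'lng_emb'.
model.load_embeddings(output_dir, 'lng_emb')

# The positional embedding was saved as a whole nn.Embedding module, so it can
# be loaded and assigned back directly.
model.transformer.wpe = torch.load(f'{output_dir}/positional_embedding.pt')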

vnikouliNLE commented 2 years ago

> Hi Vassilina, can you include your commands for calling the XNLI script adapters_xnli_de_vn.py in the cross-lingual and supervised fine-tuning settings? Sorry for getting to this late. I just want to make sure that we are on the same page on how we name the directories and how we treat certain variables (especially original_model and pretrained_model).

Added zero-shot commands here: https://github.com/bigscience-workshop/multilingual-modeling/pull/11/commits/f3a165e19760286ffba0e875a708b293c62b8a59

I've split the training of XNLI for English (since we only need to do it once and can reuse it to evaluate different adapted models) from the zero-shot evaluation.
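
For reference, a simplified sketch of that two-stage split (plain fine-tuning with a placeholder model name and hyperparameters; it leaves out the adapter- and embedding-swapping pieces that adapters_xnli_de_vn.py handles): stage 1 trains the English XNLI classifier once and saves it; stage 2 reloads that checkpoint and evaluates it zero-shot on another XNLI language.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder for the original or language-adapted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

# Stage 1: train the English XNLI classifier once and save it.
en = load_dataset("xnli", "en").map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.config.pad_token_id = tokenizer.pad_token_id
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xnli_en_ckpt", num_train_epochs=2,
                           per_device_train_batch_size=8),
    train_dataset=en["train"],
    eval_dataset=en["validation"],
)
trainer.train()
trainer.save_model("xnli_en_ckpt")

# Stage 2: zero-shot evaluation, reusing the English-trained classifier on a
# different XNLI language (German here) without any further training.
de_val = load_dataset("xnli", "de", split="validation").map(tokenize, batched=True)
evaluator = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained("xnli_en_ckpt"),
    args=TrainingArguments(output_dir="xnli_zero_shot", per_device_eval_batch_size=8),
)
print(evaluator.evaluate(eval_dataset=de_val))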

yongzx commented 2 years ago

Thanks Vassilina! It seems like we are on the same page about pretrained_model and original_model. I will apply the following renaming conventions:
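
Regardless of the final names, a minimal sketch of how the two checkpoints can be distinguished in an argument parser; the flag names and help strings below are illustrative placeholders based on one plausible reading, not the actual adapters_xnli_de_vn.py interface.

import argparse

# Hypothetical flags for the two checkpoints discussed above
# (not the actual adapters_xnli_de_vn.py interface).
parser = argparse.ArgumentParser(description="XNLI evaluation sketch")
parser.add_argument("--original_model",
                    help="the original (unadapted) checkpoint")
parser.add_argument("--pretrained_model",
                    help="the checkpoint produced by the language-adaptation step")
args = parser.parse_args()
print(args.original_model, args.pretrained_model)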

yongzx commented 2 years ago

Commenting for future reference: in the tokenized4clm_sampled.py file, we avoid using replace_with_overlap because replacing the overlapping embeddings happens during training. Vassilina originally added it as a backup solution while the "optimal" replacement wasn't working.
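
For future readers, a minimal sketch of what "replacing the overlapping embeddings" refers to (illustrative only; the actual logic lives in the training script, and the function name here is hypothetical): tokens that exist in both the original and the new tokenizer's vocabulary keep the original model's embedding rows, while everything else keeps its fresh initialization.

import torch

def replace_overlapping_embeddings(orig_tokenizer, new_tokenizer,
                                   orig_embeddings, new_embeddings):
    # For every token string present in both vocabularies, copy the original
    # embedding row into the new embedding matrix (mutated in place).
    # orig_embeddings / new_embeddings are torch.nn.Embedding modules.
    orig_vocab = orig_tokenizer.get_vocab()   # token string -> original id
    new_vocab = new_tokenizer.get_vocab()     # token string -> new id
    overlap = set(orig_vocab) & set(new_vocab)
    with torch.no_grad():
        for token in overlap:
            new_embeddings.weight[new_vocab[token]] = \
                orig_embeddings.weight[orig_vocab[token]]
    return len(overlap)  # number of rows copied over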