XNLI evaluation (for baseline)

bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language

https://arxiv.org/abs/2212.09535

Apache License 2.0


XNLI evaluation (for baseline) #14

Closed. yongzx closed this 2 years ago.

yongzx commented 2 years ago

I have made the following changes:

- Split the code in the load_model function into load_task_specific_adapters, load_embedding_layers, and load_language_adapters.
- Removed the do_eval_after_train argument. Use do_predict instead.
- Added a baseline argument that fine-tunes the BLOOM model directly, without adding language adapters or replacing the embedding layers. This argument may be redundant and could be removed in the future.
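The refactor described in the list can be pictured with a minimal, self-contained sketch. Everything in it is hypothetical and not taken from the repository: the Args and Model classes, the helper signatures, the order in which the helpers run, and the assumption that baseline mode skips all three loading steps and fine-tunes the bare model. The helper bodies are placeholders that only record which component would be attached.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Args:
    # Hypothetical flags modeled on the argument names in the PR description.
    baseline: bool = False
    do_train: bool = True
    do_predict: bool = True


@dataclass
class Model:
    # Toy stand-in for a BLOOM model: records which components are attached.
    components: List[str] = field(default_factory=list)


def load_embedding_layers(model: Model, args: Args) -> Model:
    # Placeholder: the real helper would replace the embedding layers.
    model.components.append("new_embeddings")
    return model


def load_language_adapters(model: Model, args: Args) -> Model:
    # Placeholder: the real helper would add the language adapters.
    model.components.append("language_adapter")
    return model


def load_task_specific_adapters(model: Model, args: Args) -> Model:
    # Placeholder: the real helper would add the task (XNLI) adapter.
    model.components.append("task_adapter")
    return model


def load_model(args: Args) -> Model:
    model = Model(components=["bloom_base"])
    if args.baseline:
        # Assumption: the baseline fine-tunes the base model with no adapters
        # and keeps the original embedding layers.
        return model
    # Assumed loading order; the actual order in the repository may differ.
    model = load_embedding_layers(model, args)
    model = load_language_adapters(model, args)
    model = load_task_specific_adapters(model, args)
    return model


def run(args: Args) -> List[str]:
    # With do_eval_after_train removed, do_predict alone controls evaluation,
    # whether or not training ran earlier in the same invocation.
    load_model(args)
    steps: List[str] = []
    if args.do_train:
        steps.append("train")
    if args.do_predict:
        steps.append("predict")
    return steps


if __name__ == "__main__":
    print(load_model(Args()).components)
    print(load_model(Args(baseline=True)).components)
    print(run(Args(do_train=False)))
```

Keeping each loading step in its own function lets the baseline path skip the adapter and embedding steps with a single branch, instead of threading the flag through one monolithic load_model.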

yongzx commented 2 years ago

@vnikouliNLE Can you help review the XNLI evaluation? I am running into an issue where evaluating right after training (i.e., running do_train and do_predict in a single script) gives wildly different results from evaluating only the saved checkpoints (first running do_train, then running do_predict separately).