bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language
https://arxiv.org/abs/2212.09535
Apache License 2.0

Support Embedding Strategy: Extend Vocab #19

Closed: yongzx closed this 2 years ago

yongzx commented 2 years ago

I checked @vnikouliNLE's tokenization strategy and confirmed that the added vocab tokens are assigned indices after those of the original vocab.

I used a gradient hook (register_hook) on the embedding weights so that only a subset of their elements is updated during training.
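For illustration, here is a minimal sketch of what such a hook could look like in PyTorch. It is not the code from this PR. It assumes the "subelements" are the embedding rows of the added tokens, which sit after the original vocab as noted above, and every name and size in it (orig_vocab_size, num_added, hidden, grad_mask) is hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this sketch.
orig_vocab_size = 5   # rows that belong to the original vocab
num_added = 3         # rows appended for the new tokens
hidden = 4

embedding = nn.Embedding(orig_vocab_size + num_added, hidden)

# Mask with 0 for original rows and 1 for added rows.
# Shape (vocab, 1) broadcasts across the hidden dimension.
grad_mask = torch.zeros(orig_vocab_size + num_added, 1)
grad_mask[orig_vocab_size:] = 1.0

# Tensor.register_hook runs on the gradient during backward();
# the tensor returned by the hook replaces the gradient.
hook_handle = embedding.weight.register_hook(lambda grad: grad * grad_mask)

before = embedding.weight.detach().clone()
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

# Look up two original tokens (0, 2) and two added tokens (5, 7).
token_ids = torch.tensor([0, 2, 5, 7])
loss = embedding(token_ids).sum()
loss.backward()
optimizer.step()

after = embedding.weight.detach()
orig_rows_unchanged = torch.equal(before[:orig_vocab_size], after[:orig_vocab_size])
added_rows_changed = not torch.equal(before[orig_vocab_size:], after[orig_vocab_size:])
print(orig_rows_unchanged, added_rows_changed)

# Remove the hook once it is no longer needed.
hook_handle.remove()
```

The sketch uses plain SGD on purpose: an optimizer with weight decay (for example AdamW) changes every row even when its gradient is zero, so the original rows would then need to be excluded from decay or restored after each step.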

yongzx commented 2 years ago

Resolved by #20