Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Any experience to get the vocabulary of a specific language (e.g., Chinese)? #380
Hi, I saw you shared a Chinese-version OFA that uses a vocabulary of roughly 20K WordPieces.
I wonder how you obtained this vocabulary.
Did you extract it from the existing vocabulary of a pre-trained language model (e.g.,
bert-base-multilingual-cased
)? Or did you build it directly from a large-scale Chinese corpus?
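For context, here is a rough sketch of the two approaches I have in mind. This is only my own illustration, assuming the Hugging Face transformers and tokenizers libraries; `zh_corpus.txt` is a placeholder path, not a file from this repo.

```python
# Option 1: extract Chinese tokens from an existing multilingual vocabulary.
# Requires: pip install transformers
from transformers import BertTokenizer

mbert = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
is_cjk = lambda ch: "\u4e00" <= ch <= "\u9fff"
zh_tokens = [
    tok for tok in mbert.vocab
    if any(is_cjk(ch) for ch in tok.replace("##", ""))
]
print(f"Tokens containing CJK characters in mBERT vocab: {len(zh_tokens)}")

# Option 2: train a ~20K WordPiece vocabulary directly on a Chinese corpus.
# Requires: pip install tokenizers; "zh_corpus.txt" is a placeholder path.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(handle_chinese_chars=True, lowercase=False)
tokenizer.train(
    files=["zh_corpus.txt"],
    vocab_size=20000,
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save("zh_wordpiece.json")  # serialized tokenizer with the learned vocab
```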
Looking forward to your reply.