Hi, I am working on adding a few terms from my domain to vocab.txt. I am using the multilingual cased pre-trained model 'multi_cased_L-12_H-768_A-12'. However, I am unsure which words should be added. Should the new words placed on the [unused X] lines be limited to words that are unique to my domain and not found in an English dictionary, or should I add all frequently used words that are missing from vocab.txt, even if they are common English words? Additionally, should abbreviations such as "LOL" be added, and should the lowercase version 'lol' also be added, since I am using a cased pre-trained model?
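To make the question concrete, here is a minimal sketch of the kind of edit I mean. It assumes vocab.txt holds one token per line and that the placeholder lines look like `[unused1]`, `[unused2]`, and so on. The function name `fill_unused_slots`, the tiny in-memory vocab, and the sample terms are hypothetical and only for illustration.

```python
import re

# Assumption: placeholder entries look like "[unused1]", "[unused2]", ...
UNUSED = re.compile(r"^\[unused\d+\]$")


def fill_unused_slots(vocab_lines, new_terms):
    """Return a copy of vocab_lines with placeholders replaced by new terms.

    Terms already present in the vocab are skipped. The comparison is
    case-sensitive, so "LOL" and "lol" count as two different tokens.
    """
    existing = set(vocab_lines)
    # Drop duplicates (keeping order) and anything already in the vocab.
    pending = [t for t in dict.fromkeys(new_terms) if t not in existing]

    result = []
    for token in vocab_lines:
        if pending and UNUSED.match(token):
            # Overwrite the placeholder in place so the vocab size
            # and the position of every other token stay unchanged.
            result.append(pending.pop(0))
        else:
            result.append(token)

    if pending:
        raise ValueError(f"not enough unused slots for: {pending}")
    return result


# Hypothetical miniature vocab standing in for the real vocab.txt lines.
vocab = ["[PAD]", "[unused1]", "[unused2]", "[unused3]", "[UNK]", "the", "lol"]
updated = fill_unused_slots(vocab, ["LOL", "lol", "myterm"])
print(updated)
```

In this sketch 'lol' is skipped because it is already in the list, while "LOL" takes a slot of its own, which is exactly the cased behaviour I am asking about.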