Hi Luowei,

Thanks for sharing this repo! I am trying to adapt it to a specific task in which I want to keep some tokens (thousands of them) unsplit. Is there a way to do that? I am trying to add tokens to the BERT vocabulary file but couldn't find the file. Thanks, and I look forward to your reply!

@xinyuwang1126 You can change the vocabulary by replacing the default vocab file with your customized one. You will then also need to modify the model config file and the checkpoint (both the .bin file and the loading code) so that the old embeddings are mapped to your new vocab.
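For reference, here is a minimal sketch of the vocab-replacement and embedding-remapping steps from the reply above. It assumes a BERT-style `vocab.txt` (one token per line, line index = token id) and an embedding matrix given as a plain list of rows. The function names are hypothetical, and initializing new rows to the mean of the old embeddings is just one common heuristic, not something the repo prescribes. In a real checkpoint the matrix would be the word-embedding weight (plus any output layer tied to it); the exact key names depend on the code and are not assumed here.

```python
from typing import Dict, List, Sequence


def read_vocab(path: str) -> List[str]:
    """Read a BERT-style vocab file: one token per line, line index = token id."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_vocab(path: str, vocab: List[str]) -> None:
    """Write the vocab back in the same one-token-per-line format."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")


def extend_vocab(old_vocab: List[str], new_tokens: Sequence[str]) -> List[str]:
    """Append tokens not already present, so every existing token keeps its id."""
    seen = set(old_vocab)
    extended = list(old_vocab)
    for tok in new_tokens:
        if tok not in seen:
            extended.append(tok)
            seen.add(tok)
    return extended


def remap_embeddings(
    old_vocab: List[str],
    old_weights: List[List[float]],
    new_vocab: List[str],
) -> List[List[float]]:
    """Build an embedding matrix whose rows follow new_vocab.

    Known tokens copy their old row; new tokens get the mean of all old rows
    (an assumed initialization heuristic for this sketch).
    """
    old_index: Dict[str, int] = {tok: i for i, tok in enumerate(old_vocab)}
    dim = len(old_weights[0])
    mean_row = [sum(row[d] for row in old_weights) / len(old_weights) for d in range(dim)]
    new_weights: List[List[float]] = []
    for tok in new_vocab:
        i = old_index.get(tok)
        new_weights.append(list(old_weights[i]) if i is not None else list(mean_row))
    return new_weights


if __name__ == "__main__":
    # Toy example: 3 existing tokens with 2-dimensional embeddings.
    old_vocab = ["[PAD]", "[UNK]", "cat"]
    old_weights = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
    new_vocab = extend_vocab(old_vocab, ["cat", "my_domain_token"])
    new_weights = remap_embeddings(old_vocab, old_weights, new_vocab)
    print(new_vocab)        # ['[PAD]', '[UNK]', 'cat', 'my_domain_token']
    print(new_weights[-1])  # [2.0, 4.0]
```

Appending instead of reordering keeps every existing token id stable, so the old rows can be copied one-to-one; the vocabulary size in the model config then has to equal `len(new_vocab)`. Also keep in mind that a token in the vocab is only kept whole if the tokenizer's pre-splitting leaves it intact: BERT's basic tokenizer splits on whitespace and punctuation (and lowercases for uncased models) before WordPiece runs, so added tokens should be in that normalized form.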