LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering
Apache License 2.0

adding specific tokens to vocabulary #18

Closed xinyuwang1126 closed 4 years ago

xinyuwang1126 commented 4 years ago

Hi Luowei,

Thanks for sharing this repo! I am trying to adapt it to a specific task. In that task, I want some tokens (thousands of them) to remain unsplit. Is there a way to do that? I tried to add the tokens to the BERT vocabulary file but couldn't find the file. Thanks, and I look forward to your reply!

LuoweiZhou commented 4 years ago

@xinyuwang1126 You can change the vocab by replacing the default file with your customized vocab file. You will then also need to modify the model config file and the checkpoint (both the .bin file and the code) so that the old embeddings are mapped to your new vocab.
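The remapping step described above can be sketched roughly as follows. This is an illustration only and is not taken from the VLP code base: the helper names `extend_vocab` and `remap_embeddings` are hypothetical, the embedding table is a plain list of rows rather than a tensor loaded from the .bin checkpoint, and new rows are drawn from a normal distribution with standard deviation 0.02 (the usual BERT initializer range, assumed here). New tokens are appended after the existing ones so that the old token ids do not change.

```python
import random


def extend_vocab(old_tokens, extra_tokens):
    """Append tokens that are not yet in the vocab, keeping the old order (and ids)."""
    seen = set(old_tokens)
    new_tokens = list(old_tokens)
    for tok in extra_tokens:
        if tok not in seen:
            new_tokens.append(tok)
            seen.add(tok)
    return new_tokens


def remap_embeddings(old_tokens, old_rows, new_tokens, init_std=0.02, seed=0):
    """Build an embedding table for new_tokens.

    Rows of tokens that exist in the old vocab are copied over;
    rows of unseen tokens are randomly initialised (assumed scheme).
    """
    rng = random.Random(seed)
    dim = len(old_rows[0])
    old_index = {tok: i for i, tok in enumerate(old_tokens)}
    new_rows = []
    for tok in new_tokens:
        if tok in old_index:
            new_rows.append(list(old_rows[old_index[tok]]))
        else:
            new_rows.append([rng.gauss(0.0, init_std) for _ in range(dim)])
    return new_rows


# Toy example: a 4-token vocab with 2-dimensional embeddings.
old_tokens = ["[PAD]", "[UNK]", "new", "##york"]
old_rows = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.3], [0.4, 0.5]]

# "new" is already present and is skipped; "newyork" gets a fresh row.
new_tokens = extend_vocab(old_tokens, ["newyork", "new"])
new_rows = remap_embeddings(old_tokens, old_rows, new_tokens)
```

After a change like this, the vocabulary size in the model config has to match the number of rows in the new table (`len(new_tokens)`), and any output layer tied to the word embeddings would need the same resizing.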

xinyuwang1126 commented 4 years ago

Got it, thank you!