facebookresearch / nougat

Implementation of Nougat: Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License
8.98k stars 567 forks

How to train a new tokenizer.json #64

Closed. gk966988 closed this issue 1 year ago

gk966988 commented 1 year ago

I want to train a new tokenizer. Which language model did you use to obtain the tokenizer? Can I train a BERT tokenizer from the transformers library to obtain a tokenizer that can be used in Nougat?

lukas-blecher commented 1 year ago

You can use any tokenizer in the Hugging Face (HF) format. See the trainers documentation: https://huggingface.co/docs/tokenizers/api/trainers
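
For anyone landing here later, a minimal sketch of training a tokenizer with the HF `tokenizers` library and writing it out as `tokenizer.json`. The toy corpus, the vocabulary size, the special tokens, and the choice of byte-level BPE are all assumptions for illustration, not necessarily what Nougat itself uses. Check the special tokens your Nougat config expects before swapping the file in.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.decoders import ByteLevel as ByteLevelDecoder
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

# Hypothetical toy corpus; in practice, iterate over your own training text.
corpus = [
    "We train a tokenizer on academic text.",
    "The model reads a page image and writes markup.",
    "Equations such as $E = mc^2$ appear in many papers.",
]

# Assumed special tokens (placeholders); align these with your model config.
special_tokens = ["<s>", "<pad>", "</s>", "<unk>"]

# Byte-level BPE is one possible choice; any HF-format tokenizer works.
tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)
tokenizer.decoder = ByteLevelDecoder()

trainer = BpeTrainer(
    vocab_size=500,  # assumed small value for the toy corpus
    special_tokens=special_tokens,
    initial_alphabet=ByteLevel.alphabet(),  # cover every byte value
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Save in the single-file HF format, then reload it to check the file is valid.
out_path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(out_path)

reloaded = Tokenizer.from_file(out_path)
text = "We train a tokenizer."
encoding = reloaded.encode(text)
decoded = reloaded.decode(encoding.ids)
```

To train on files instead of an in-memory list, `tokenizer.train(files, trainer)` takes a list of text file paths. How the resulting `tokenizer.json` is wired into Nougat depends on your checkpoint and config, which this sketch does not cover.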