Alpha-VLLM / Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
https://arxiv.org/abs/2408.02657

Encoding error reading text_tokenizer.json #2

Closed throttlekitty closed 2 months ago

throttlekitty commented 3 months ago

On launch, I got the following error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3565933: character maps to &lt;undefined&gt;

For a quick fix, I made the following change in item_processor.py, line 89, though I don't know whether the original code is a problem on other OSes: `json.load(open("./ckpts/chameleon/tokenizer/text_tokenizer.json", encoding="utf8"))["model"]["vocab"]`
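A minimal sketch of the workaround described above, assuming the path and the `["model"]["vocab"]` keys quoted in the comment; the surrounding item_processor.py code is not shown. The idea is to pass an explicit encoding to open() so the read does not depend on the platform default codec (on Windows that default is often cp1252, which is what raises the 'charmap' UnicodeDecodeError):

```python
import json

# Read the tokenizer JSON with UTF-8 regardless of the OS default encoding.
with open(
    "./ckpts/chameleon/tokenizer/text_tokenizer.json",
    encoding="utf-8",
) as f:
    vocab = json.load(f)["model"]["vocab"]
```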

ChrisLiu6 commented 3 months ago

Thank you for the feedback, we will fix it.

ChrisLiu6 commented 2 months ago

https://github.com/Alpha-VLLM/Lumina-mGPT/commit/87811a3de3a33cbbb118c4cd2d5cf7143acf8e1b