google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Text lowering issue #79

Open shkarupa-alex opened 7 months ago

shkarupa-alex commented 7 months ago

I found an issue here: https://github.com/google-research/big_vision/blob/main/big_vision/pp/ops_text.py#L165. When lowercasing non-Latin UTF-8 text, `encoding='utf-8'` should be passed, as mentioned in the documentation: https://www.tensorflow.org/api_docs/python/tf/strings/lower.
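To illustrate the difference, here is a rough sketch using only the Python standard library. It is an analogue, not the `big_vision` or TensorFlow code: `lower_ascii_only` and `lower_unicode` are hypothetical helper names, and `bytes.lower()` stands in for lowering that only touches ASCII letters, which is the behavior I would expect when no encoding is given.

```python
# Illustration only: a standard-library analogue, not the big_vision code.
# Both helper names below are made up for this example.


def lower_ascii_only(text: str) -> str:
    """Lowercase only ASCII letters by operating on the raw UTF-8 bytes.

    bytes.lower() converts A-Z and leaves every byte >= 0x80 untouched,
    so multi-byte (non-Latin) characters keep their original case.
    """
    return text.encode("utf-8").lower().decode("utf-8")


def lower_unicode(text: str) -> str:
    """Lowercase with full Unicode awareness (str.lower)."""
    return text.lower()


sample = "Hello ПРИВЕТ"
print(lower_ascii_only(sample))  # hello ПРИВЕТ
print(lower_unicode(sample))     # hello привет
```

With ASCII-only lowering, "ПРИВЕТ" and "привет" stay distinct strings and would likely be tokenized differently, which is why the choice of encoding matters for non-Latin text.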

At the very least, this can affect the i18n models. But since the models are already trained, I'm not sure whether this issue should be fixed.