Open shkarupa-alex opened 7 months ago
I found an issue here: https://github.com/google-research/big_vision/blob/main/big_vision/pp/ops_text.py#L165

When lowercasing UTF-8 non-Latin text, `encoding='utf-8'` should be used, as mentioned in the TensorFlow docs: https://www.tensorflow.org/api_docs/python/tf/strings/lower

This can at least affect the i18n models. But since the models are already trained, I'm not sure whether this issue should be fixed.
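To illustrate the kind of difference I mean, here is a minimal sketch in plain Python. It is only an analogy and does not call TensorFlow: `bytes.lower()` stands in for ASCII-only lowercasing, and `str.lower()` stands in for Unicode-aware lowercasing. The sample string is my own, not something taken from big_vision.

```python
# Analogy only: this does not run the TensorFlow op, it just shows why
# ASCII-only lowercasing leaves non-Latin text unchanged.
text = "ПРИВЕТ World"  # hypothetical sample with Cyrillic and Latin letters

# bytes.lower() maps only ASCII A-Z; the multi-byte UTF-8 sequences for
# the Cyrillic letters are left untouched.
ascii_only = text.encode("utf-8").lower().decode("utf-8")

# str.lower() is Unicode-aware, so the Cyrillic letters are lowercased too.
unicode_aware = text.lower()

print(ascii_only)
print(unicode_aware)
```

As far as I understand the linked docs, the corresponding change in TensorFlow would be passing `encoding='utf-8'` to `tf.strings.lower`, since the default empty encoding is interpreted as ASCII.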