Hi!
Cool project! I was looking for language identification for romanized Indian languages. The performance on these doesn't seem to be as high as for other languages (i.e. it doesn't approach 99%).
I assume this is because of the training data used.
One idea would be to try GPT-4 for transliterating between native-script and romanized text. Have you considered this? Sure, it's expensive, but you could ask OpenAI for free credits for such a good cause. Of course, you'd first need to check whether this works reasonably well.