AI4Bharat / IndicXlit

Transliteration models for 21 Indic languages
https://ai4bharat.iitm.ac.in/transliteration
MIT License

Reason for using a complex model. #17

Closed · pSN0W closed this 1 year ago

pSN0W commented 1 year ago

Hi maintainers, I just wanted to know the reason for using a complex transformer with a 6-layer-deep encoder and decoder. The way I see it, transliteration depends on the current letter and on the one letter before and after it. So wouldn't a simpler architecture that focuses only on the current letter, the previous letter, and the next letter be easier to train without giving up too much performance? Quite clearly, I don't see the reason for attending to the first letter while transliterating the last letter. I am asking because this is how humans generally do transliteration (at least from Hindi to English), and I am planning to explore a similar architecture. Can you suggest drawbacks of this architecture?
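
As a rough, hedged sketch of the windowed architecture described in the question (this is not code from IndicXlit; the class name, vocabulary sizes, and hyperparameters are all illustrative assumptions), the model would predict one target character from just the (previous, current, next) source characters:

```python
# Hypothetical sketch of the "3-character window" transliterator (not from this repo).
# Assumes a toy character vocabulary; all names and sizes are illustrative.
import torch
import torch.nn as nn

class CharWindowTransliterator(nn.Module):
    """Predicts one target character from (previous, current, next) source characters."""
    def __init__(self, src_vocab_size, tgt_vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, hidden_dim),  # concatenation of the 3-char window
            nn.ReLU(),
            nn.Linear(hidden_dim, tgt_vocab_size),
        )

    def forward(self, window_ids):
        # window_ids: (batch, 3) -> indices of [prev, current, next] characters
        emb = self.embed(window_ids)          # (batch, 3, emb_dim)
        flat = emb.flatten(start_dim=1)       # (batch, 3 * emb_dim)
        return self.mlp(flat)                 # (batch, tgt_vocab_size) logits

# Usage: pad word boundaries with a <pad> index so every position has a full window.
model = CharWindowTransliterator(src_vocab_size=30, tgt_vocab_size=70)
logits = model(torch.tensor([[4, 11, 7]]))    # one window: prev=4, cur=11, next=7
```

One limitation such a model would have is that it cannot produce a variable number of output characters per input position or resolve ambiguities that depend on context outside the 3-character window.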

GokulNC commented 1 year ago

(Moved your issue from the indicTranslate repo, which is for translation, not transliteration)


What you are describing is more like a character-level n-gram model. It has been shown in the literature that, given a significant amount of data, deep-learning models easily outperform such simple models. See this paper from Google, for example, which showed that transformer-based models performed best for most languages.

Since the main motive behind the current IndicXlit was to build a multilingual model that can serve as a good baseline for the Aksharantar dataset we released, we chose the vanilla transformer model. However, we encourage researchers to try training multilingual models with fewer parameters and faster inference. (Exploring this was out of scope for our paper.)
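
To make the "fewer parameters" suggestion concrete, here is a hedged sketch of what a smaller character-level encoder-decoder might look like; this is not the IndicXlit configuration, and the layer counts, dimensions, and class name are assumptions chosen only for illustration (positional encodings are also omitted for brevity):

```python
# Hypothetical smaller seq2seq transliteration model (not the IndicXlit architecture).
import torch
import torch.nn as nn

class SmallTransliterator(nn.Module):
    """Character-level encoder-decoder with 2 layers instead of 6 (illustrative only)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            dim_feedforward=256, batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # src_ids: (batch, src_len), tgt_ids: (batch, tgt_len) character indices
        src = self.src_embed(src_ids)
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)               # (batch, tgt_len, tgt_vocab) logits
```

Whether a configuration this small retains acceptable accuracy across all 21 languages is exactly the kind of question left open above; the multilingual setting may need more capacity than a single-language model would.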