AI4Bharat / Indic-BERT-v1

Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT
https://indicnlp.ai4bharat.org
MIT License

What is special in IndicBERT compared to other models? #31

Closed chayan-dhaddha closed 3 years ago

gowtham1997 commented 3 years ago

IndicBERT is a multilingual ALBERT model pretrained exclusively on 12 major languages of India: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.

It is pretrained on our novel monolingual corpus, IndicCorp, which contains around 9 billion tokens, and subsequently evaluated on a diverse set of tasks. IndicBERT has far fewer parameters than other multilingual models such as mBERT and XLM-R (refer to the attached picture) while achieving performance on par with or better than these models.

[Image: comparison of parameter counts for IndicBERT, mBERT, XLM-R, and other multilingual models]
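The parameter savings come largely from ALBERT's design, chiefly factorized embedding parameterization (and cross-layer parameter sharing). Below is a minimal sketch of the embedding factorization; the vocabulary size and dimensions are illustrative assumptions, not IndicBERT's exact configuration:

```python
# Why an ALBERT-style model (like IndicBERT) needs far fewer embedding
# parameters than a BERT-style model with the same vocabulary.
# Sizes below are assumed for illustration only.

def bert_embedding_params(vocab_size, hidden_size):
    # BERT ties embedding width to hidden size: one V x H matrix.
    return vocab_size * hidden_size

def albert_embedding_params(vocab_size, embed_size, hidden_size):
    # ALBERT factorizes the embedding into V x E and E x H, with E << H.
    return vocab_size * embed_size + embed_size * hidden_size

V, E, H = 200_000, 128, 768  # assumed vocab / embedding / hidden sizes
bert_like = bert_embedding_params(V, H)
albert_like = albert_embedding_params(V, E, H)
print(f"BERT-style embeddings:   {bert_like:,}")     # 153,600,000
print(f"ALBERT-style embeddings: {albert_like:,}")   # 25,698,304
print(f"reduction factor:        {bert_like / albert_like:.1f}x")
```

With a large multilingual vocabulary, the factorization alone cuts embedding parameters by roughly 6x in this sketch; cross-layer sharing shrinks the transformer stack further.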

You can get more details about the tasks and the IndicBERT model from our paper.

As this is not a GitHub issue regarding the code, I'm closing it.