AI4Bharat / IndicBERT

Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME
https://ai4bharat.iitm.ac.in/language-understanding
MIT License

Does IndicBERT use phoneme-level tokenization like Phoneme-Level BERT (PL-BERT)? #9

Open SandyPanda-MLDL opened 5 months ago

SandyPanda-MLDL commented 5 months ago

I am using PL-BERT and want to replace its Transformer-XL tokenizer with the IndicBERT tokenizer. Before doing so, I would like to know whether it can be used that way, that is, whether IndicBERT performs phoneme-level tokenization.
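
For context, here is a rough sketch of how one might check this empirically, by inspecting the pieces a tokenizer emits. It relies on a heuristic that is only an assumption: a subword tokenizer working on written text tends to produce pieces that are substrings of the input, while a phoneme-level tokenizer produces phoneme symbols that usually are not. Text normalization such as lowercasing can break this heuristic. The names `inspect_tokenizer`, `toy_subword_tokenize` and `toy_phoneme_tokenize` are hypothetical, and the two toy tokenizers are stand-ins, not the IndicBERT or PL-BERT tokenizers.

```python
from typing import Callable, Dict, List

# Common subword markers: the WordPiece continuation prefix and the
# SentencePiece word-boundary symbol.
SUBWORD_MARKERS = ("##", "\u2581")


def strip_markers(token: str) -> str:
    """Remove subword markers so only the surface characters remain."""
    for marker in SUBWORD_MARKERS:
        token = token.replace(marker, "")
    return token


def inspect_tokenizer(tokenize: Callable[[str], List[str]], text: str) -> Dict[str, object]:
    """Tokenize `text` and report whether every piece is a substring of it.

    Heuristic only: True suggests orthographic (subword) tokenization,
    False suggests the tokenizer maps text to other symbols, e.g. phonemes.
    """
    tokens = tokenize(text)
    pieces = [strip_markers(t) for t in tokens]
    all_substrings = all(p in text for p in pieces if p)
    return {"tokens": tokens, "pieces_are_substrings_of_text": all_substrings}


def toy_subword_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: whitespace split."""
    return text.split()


def toy_phoneme_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a phoneme tokenizer with a hard-coded lexicon."""
    lexicon = {"cat": ["k", "æ", "t"]}
    phonemes: List[str] = []
    for word in text.split():
        # Fall back to single characters for words missing from the toy lexicon.
        phonemes.extend(lexicon.get(word, list(word)))
    return phonemes


subword_report = inspect_tokenizer(toy_subword_tokenize, "cat")
phoneme_report = inspect_tokenizer(toy_phoneme_tokenize, "cat")
print(subword_report)
print(phoneme_report)
```

To run the same check on a real tokenizer, the `tokenize` method of a tokenizer loaded with Hugging Face `AutoTokenizer.from_pretrained` could be passed in place of the toy functions. The model identifier to load is not stated here, so it is left out.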