AI4Bharat / IndicBERT

Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME
https://ai4bharat.iitm.ac.in/language-understanding
MIT License

How to read the Hindi text dataset shared in the Repo #10

Open SandyPanda-MLDL opened 4 months ago

SandyPanda-MLDL commented 4 months ago

I am trying to read the Hindi text data shared in the repo, but my system hangs when I try to load the complete dataset at once. Can anyone suggest an alternative, for example splitting the data into smaller chunks and reading them one at a time?
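
For illustration, the chunked reading asked about above could look roughly like the sketch below. It assumes the data is a plain UTF-8 text file with one sentence or document per line, which the issue does not confirm. The function name `iter_line_chunks` and the default chunk size are hypothetical, not part of the IndicBERT repo.

```python
from itertools import islice
from typing import Iterator, List


def iter_line_chunks(path: str, lines_per_chunk: int = 100_000,
                     encoding: str = "utf-8") -> Iterator[List[str]]:
    """Yield successive lists of at most `lines_per_chunk` lines.

    Hypothetical helper: only one chunk is held in memory at a time,
    so the whole file is never loaded at once.
    """
    with open(path, "r", encoding=encoding) as handle:
        while True:
            # islice pulls the next batch of lines lazily from the file object.
            chunk = list(islice(handle, lines_per_chunk))
            if not chunk:
                break  # end of file reached
            # Strip the trailing newline from each line before yielding.
            yield [line.rstrip("\n") for line in chunk]
```

Calling `for chunk in iter_line_chunks("hindi.txt"): ...` (with `hindi.txt` standing in for the actual file path) would process the file batch by batch. If memory is still tight, a smaller `lines_per_chunk` trades speed for a lower peak footprint.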