csebuetnlp / banglabert

This repository contains the official release of the model "BanglaBERT" and the associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted to Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022).

Training Corpus release? #5

Closed: imr555 closed this issue 1 year ago

imr555 commented 2 years ago

Will the training corpus be released?

Thanks in advance.

Tahmid04 commented 1 year ago

Hi, the pretraining corpus is now available upon request! Please see here.

imr555 commented 1 year ago

> Hi, the pretraining corpus is now available upon request! Please see here.

Thank you very much for the detailed reply and the link to the dataset request procedure.