csebuetnlp / banglabert

This repository contains the official release of the model "BanglaBERT" and the associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.

Vocab File #7

Closed: King-Rafat closed this issue 1 year ago.

King-Rafat commented 1 year ago

Can you provide the vocab file used for your tokenizers? Thanks.

Tahmid04 commented 1 year ago

https://huggingface.co/csebuetnlp/banglabert/blob/main/vocab.txt
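If you want to inspect the vocabulary directly rather than through a tokenizer, the minimal sketch below reads the file into a token-to-id map. It assumes the standard BERT/WordPiece `vocab.txt` layout (one token per line, with the 0-based line number as the token id). `load_wordpiece_vocab` is a hypothetical helper written for illustration and is not part of this repository.

```python
from __future__ import annotations

from pathlib import Path


def load_wordpiece_vocab(path: str | Path) -> dict[str, int]:
    """Map each token in a WordPiece-style vocab.txt to its integer id.

    Assumption: standard BERT layout, one token per line, where the
    0-based line number is the token id.
    """
    vocab: dict[str, int] = {}
    # Text mode with universal newlines turns "\r\n" into "\n".
    with open(path, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            # Strip only the trailing newline so the token text is kept intact.
            token = line.rstrip("\n")
            vocab[token] = idx
    return vocab
```

After downloading `vocab.txt` from the link above, `load_wordpiece_vocab("vocab.txt")["[CLS]"]` would give the id of the `[CLS]` token, assuming that token is present in the file. For ordinary use, loading the tokenizer through Hugging Face `transformers` (for example `AutoTokenizer.from_pretrained("csebuetnlp/banglabert")`) handles the vocabulary for you.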