AI4Bharat / Indic-BERT-v1

Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.com/AI4Bharat/IndicBERT
https://indicnlp.ai4bharat.org
MIT License

Documentation to implement NER #10

Open koushikram3420 opened 3 years ago

koushikram3420 commented 3 years ago

Hey, I tried using Indic-BERT for NER on news articles (for article clustering) via the transformers library. During tokenization, some of the tokens get split up into subwords; I wanted to know if there is any way to avoid this. Also, when I ran the same example as mentioned in your documentation, I got different results: brisbane (2), chanakya (2). Kindly help me understand why the tokens are not getting recognized properly. When I give custom inputs to the tokenizer in the same format, the tokens are not recognized and the encoding comes back as 1, even with add_special_tokens set. It would be helpful if you could share some implementation of the NER.
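On the subword-splitting point: with a WordPiece/SentencePiece tokenizer you generally cannot avoid splits; the usual fix for NER is to align word-level labels onto the subword tokens instead. A minimal sketch, assuming a HuggingFace "fast" tokenizer whose encoding exposes `word_ids()`; the `word_ids` list and tag ids below are hand-written stand-ins for illustration, not output from the actual model:

```python
def align_labels(word_ids, word_labels, ignore_id=-100):
    """Map word-level NER labels onto subword tokens.

    Special tokens (word_id None) get ignore_id so the loss skips them;
    continuation subwords inherit their word's label.
    """
    aligned = []
    for wid in word_ids:
        if wid is None:            # e.g. [CLS], [SEP], padding
            aligned.append(ignore_id)
        else:
            aligned.append(word_labels[wid])
    return aligned

# Hypothetical example: the middle word is split into 3 subword pieces,
# so all 3 pieces receive that word's label (here tag id 3, e.g. B-PER).
word_ids = [None, 0, 1, 1, 1, 2, None]   # as word_ids() might report
word_labels = [0, 3, 0]                  # word-level tag ids, e.g. O, B-PER, O
print(align_labels(word_ids, word_labels))
# -> [-100, 0, 3, 3, 3, 0, -100]
```

With fast tokenizers this pairs naturally with `tokenizer(words, is_split_into_words=True)` and then `encoding.word_ids()` per example; `-100` is the default ignore index of PyTorch's cross-entropy loss.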

Kritz23 commented 3 years ago

Can you please share your notebook? Thanks in advance.

yashsinglatimes commented 2 years ago

Has anybody been able to create an NER example for an Indian language using Indic-BERT? That would be very helpful. @koushikram3420 which model did you use? I think if you used Indic-BERT then, according to your process, the label size should be 768, whereas in your case the label size is 9.
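For what it's worth, the two numbers measure different things: 768 is the encoder's hidden size (the vector produced per token), while 9 would be the number of NER tag classes (e.g. BIO tags over four entity types plus O). A token-classification head is just a linear map from one to the other. A rough shape-only sketch with random weights, not the actual IndicBERT head:

```python
import numpy as np

# Assumed dimensions: 768 is the encoder hidden size, 9 the NER label count.
hidden_size, num_labels, seq_len = 768, 9, 16

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((seq_len, hidden_size))  # encoder output, one row per token
W = rng.standard_normal((hidden_size, num_labels))           # classifier weight
b = np.zeros(num_labels)                                     # classifier bias

logits = hidden_states @ W + b        # (seq_len, 9): one score per label per token
pred_tags = logits.argmax(axis=-1)    # predicted label id for each token
print(logits.shape, pred_tags.shape)  # (16, 9) (16,)
```

So a label size of 9 is expected when fine-tuning for NER; 768 never appears as a label dimension, only as the feature dimension going into the classifier.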