AI4Bharat / IndicBERT

Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME
https://ai4bharat.iitm.ac.in/language-understanding
MIT License

Tokenizer class IndicBERTSentencePieceTokenizer does not exist or is not currently imported. #15

Open alvynabranches opened 1 month ago

alvynabranches commented 1 month ago

Code

import torch
from transformers import pipeline, AutoModel, AutoTokenizer

model_id = "ai4bharat/IndicBERTv2-alpha-SentimentClassification"
tokenizer = AutoTokenizer.from_pretrained(model_id, keep_accents=True)
model = AutoModel.from_pretrained(model_id).to("cuda")

Error

ValueError                                Traceback (most recent call last)
<ipython-input-8-faa308d63e88> in <cell line: 2>()
      1 model_id = "ai4bharat/IndicBERTv2-alpha-SentimentClassification"
----> 2 tokenizer = AutoTokenizer.from_pretrained(model_id, keep_accents=True)
      3 model = AutoModel.from_pretrained(model_id).to("cuda")

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    892                 tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
    893             if tokenizer_class is None:
--> 894                 raise ValueError(
    895                     f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    896                 )

ValueError: Tokenizer class IndicBERTSentencePieceTokenizer does not exist or is not currently imported.
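
For readers trying to understand the failure: the traceback shows `AutoTokenizer.from_pretrained` resolving a tokenizer class by name and raising when the lookup returns `None`. The sketch below imitates that check in plain Python so the mechanism is easier to see. It is a simplified stand-in, not the actual `transformers` code: `KNOWN_TOKENIZER_CLASSES`, `resolve_tokenizer_class`, and `assumed_config` are hypothetical names, and the config contents are an assumption inferred only from the error message above.

```python
# Simplified, hypothetical stand-in for the name lookup seen in the traceback.
# KNOWN_TOKENIZER_CLASSES is an illustrative subset, not the real registry.
KNOWN_TOKENIZER_CLASSES = {"BertTokenizer", "AlbertTokenizer", "XLMRobertaTokenizer"}


def resolve_tokenizer_class(tokenizer_config: dict) -> str:
    """Return the declared class name if it is known, otherwise raise ValueError."""
    candidate = tokenizer_config.get("tokenizer_class")
    if candidate not in KNOWN_TOKENIZER_CLASSES:
        # Same message format as the ValueError in the traceback.
        raise ValueError(
            f"Tokenizer class {candidate} does not exist or is not currently imported."
        )
    return candidate


# Assumed config contents, inferred from the error message only.
assumed_config = {"tokenizer_class": "IndicBERTSentencePieceTokenizer"}

try:
    resolve_tokenizer_class(assumed_config)
except ValueError as err:
    message = str(err)
    print(message)
```

If this reading is right, the error means the checkpoint declares a tokenizer class name that the installed `transformers` package does not ship, so the name cannot be resolved until that class is made importable or a tokenizer class is chosen explicitly.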