cdpierse / transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
Apache License 2.0

Issue using BertTokenizer (AttributeError) #119

Open Khazmadoz opened 1 year ago

Khazmadoz commented 1 year ago

Hey there,

I'm trying to use your transformers-interpret package with a `bert-base-uncased` model and its corresponding tokenizer. I'm loading the tokenizer with:

```python
tokenizer = BertTokenizer.from_pretrained(BERT_USED_MODEL_PATH, do_lower_case=True)
```

When I create the explainer with:

```python
from transformers_interpret import SequenceClassificationExplainer

cls_explainer = SequenceClassificationExplainer(model=model, tokenizer=tokenizer)
```

I get the following error:

```
Traceback (most recent call last):
  File "/home/khazmadoz/ML_GG/main.py", line 232, in <module>
    cls_explainer = SequenceClassificationExplainer(model=model,
  File "/home/khazmadoz/anaconda3/lib/python3.9/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 53, in __init__
    super().__init__(model, tokenizer)
  File "/home/khazmadoz/anaconda3/lib/python3.9/site-packages/transformers_interpret/explainer.py", line 22, in __init__
    self.ref_token_id = self.tokenizer.pad_token_id
AttributeError: 'BertTokenizer' object has no attribute 'pad_token_id'
```

Do you know how to fix this? The tokenizer has a `pad_token` attribute but no `pad_token_id`.
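For context, the failure can be reproduced without transformers at all, since the explainer simply reads `tokenizer.pad_token_id` in its constructor. Below is a minimal sketch using hypothetical stub classes (not the real transformers objects) to show why a tokenizer with `pad_token` but no `pad_token_id` breaks:

```python
# Hypothetical stubs illustrating the AttributeError; the real explainer's
# constructor reads tokenizer.pad_token_id as its attribution reference token.

class StubExplainer:
    def __init__(self, tokenizer):
        # Mirrors the line in transformers_interpret/explainer.py that fails
        self.ref_token_id = tokenizer.pad_token_id

class BrokenTokenizer:
    """Tokenizer-like object defining pad_token but not pad_token_id."""
    pad_token = "[PAD]"

class WorkingTokenizer:
    """Tokenizer-like object exposing both attributes."""
    pad_token = "[PAD]"
    pad_token_id = 0

try:
    StubExplainer(BrokenTokenizer())
except AttributeError as e:
    print("broken:", e)  # no attribute 'pad_token_id'

print("working ref_token_id:", StubExplainer(WorkingTokenizer()).ref_token_id)
```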

Thank you, David

cdpierse commented 1 year ago

Hi @Khazmadoz, is this a custom tokenizer? It seems odd that it would have a `pad_token` but not a `pad_token_id`, which is just its numerical form. Would you be able to run the code below and paste your output?

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(PATH_TO_YOUR_TOKENIZER)
print(tokenizer.all_special_tokens)
print(tokenizer.all_special_ids)
```
Khazmadoz commented 1 year ago

Hey @cdpierse,

The script I was using came from a colleague (who got it from someone else), and I concluded it was just not nicely written. I adapted another example, a Twitter analysis with a multi-class model using an `AutoTokenizer`, and now it works nicely 😄 Thank you for your help anyway.

However, I was wondering whether there is a way to display all the words in a dataset that the fine-tuned model used in favor of a chosen class (I think LIME has something like this). I don't know if you have already implemented something like that; I will dive into the documentation to see if there is. Otherwise, it would be a very nice feature to add. 😊
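Pending a built-in feature, one rough way to approximate this is to run the explainer over each text and accumulate the positive attribution scores per word. The sketch below assumes the explainer yields `(word, score)` pairs per text, as `SequenceClassificationExplainer` does; the pairs here are hard-coded stand-ins rather than real model output:

```python
# Sketch: rank words by total positive attribution toward a chosen class
# across a dataset. The attributions below are illustrative stand-ins for
# [cls_explainer(text) for text in dataset].
from collections import defaultdict

dataset_attributions = [
    [("great", 0.8), ("movie", 0.1), ("boring", -0.6)],
    [("great", 0.7), ("plot", 0.2), ("dull", -0.5)],
]

totals = defaultdict(float)
for word_scores in dataset_attributions:
    for word, score in word_scores:
        if score > 0:  # keep only words attributed in favor of the class
            totals[word] += score

# Words the model leaned on most, dataset-wide
ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranking[:3])
```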