aub-mind / arabert

Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)
https://huggingface.co/aubmindlab

'BertTokenizerFast' object has no attribute 'max_len' #46

Closed wirokemproh closed 3 years ago

wirokemproh commented 3 years ago

I'm trying to use AraBERT for Arabic text sentiment analysis by following the example from the old folder, which is: https://github.com/aub-mind/arabert/blob/master/examples/old/AraBERT_Text_Classification_with_HF_Trainer_Pytorch_GPU.ipynb

However, I failed on these lines:

train_features = train_dataset.get_features(tokenizer = tokenizer, max_length =128)
test_features = test_dataset.get_features(tokenizer = tokenizer, max_length =128)

The error is:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-47-7d9bca88ea8f> in <module>
----> 1 train_features = train_dataset.get_features(tokenizer = tokenizer, max_length =128)
      2 test_features = test_dataset.get_features(tokenizer = tokenizer, max_length =128)

~/opt/miniconda3/envs/text-sentiment/lib/python3.7/site-packages/transformers/data/processors/utils.py in get_features(self, tokenizer, max_length, pad_on_left, pad_token, mask_padding_with_zero, return_tensors)
    271                 example.text_a,
    272                 add_special_tokens=True,
--> 273                 max_length=min(max_length, tokenizer.max_len),
    274             )
    275             all_input_ids.append(input_ids)

AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'

Any advice on how I can make it work?

WissamAntoun commented 3 years ago

Which AraBERT version are you using?

WissamAntoun commented 3 years ago

I just uploaded a new, updated Colab notebook for text classification that is easy to use: https://colab.research.google.com/drive/1P9iQHtUH5KUbTVtp8B4-AopZzEEPE0lw?usp=sharing
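
For readers hitting the same error outside the notebook: the traceback comes from `tokenizer.max_len`, an attribute that newer `transformers` releases renamed to `model_max_length`, so the old `get_features` helper breaks on current tokenizers. A minimal sketch of a compatibility guard (the fallback logic and the stand-in tokenizer class are illustrative assumptions, not part of the AraBERT codebase; the stand-in just lets the sketch run without downloading a model):

```python
def effective_max_length(tokenizer, max_length):
    """Clamp max_length against the tokenizer's limit, whichever attribute it has."""
    # Newer transformers expose `model_max_length`; very old ones used `max_len`.
    model_max = getattr(
        tokenizer,
        "model_max_length",
        getattr(tokenizer, "max_len", max_length),
    )
    return min(max_length, model_max)


class _FakeTokenizer:
    # Stand-in for a real BertTokenizerFast so the sketch is self-contained.
    model_max_length = 512


print(effective_max_length(_FakeTokenizer(), 128))  # prints 128
```

With a real tokenizer, passing `truncation=True, max_length=128` directly to `tokenizer(...)` avoids the deprecated helper entirely.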

wirokemproh commented 3 years ago

Thank you very much. This updated version works for me.