Closed: manueltonneau closed this issue 3 years ago
This hasn't been released on HF yet, right?
Once it is released, it probably won't require too many changes.
For example, with ClassificationModel: the model, config, and tokenizer should be added here. We can probably just check the tokenizer type and set the normalization if it's a BertweetTokenizer here.
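The tokenizer-type check suggested above could be sketched roughly like this. This is a hypothetical helper (the function name and wiring are assumptions, not simpletransformers' actual internals); the one real detail is that BertweetTokenizer accepts a `normalization` flag for tweet-specific preprocessing.

```python
# Hypothetical sketch: build tokenizer init kwargs, enabling `normalization`
# only when the configured tokenizer is BertweetTokenizer. In practice the
# class name would come from type(tokenizer).__name__ or the model-type map.
def tokenizer_init_kwargs(tokenizer_class_name, **base_kwargs):
    kwargs = dict(base_kwargs)
    if tokenizer_class_name == "BertweetTokenizer":
        # BertweetTokenizer normalizes tweets (user handles, URLs, emoji)
        # before tokenizing; other tokenizers don't take this flag.
        kwargs["normalization"] = True
    return kwargs
```

The point is to keep the special-casing in one place so other model types are unaffected.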
I think it was.
I tried using it with ClassificationModel but it gave me issues when using model_name = bert, probably because it needs to be using its own BertweetModel and BertweetTokenizer.
Awesome, thanks for the links, will look into adding this and open a PR!
Looks like BertweetModel is not added yet.
Sure, that'll be great!
Right, seems like it's WIP. Will look into it when the PR on transformers is merged. Thanks for your swift reply!
Hi @ThilinaRajapakse is BERTweet model accessible now via simpletransformers library. The model is added to the list of pre-trained models on Huggingface: https://huggingface.co/vinai/bertweet-base.
Please let me know.
I gave it a quick test but it seems to be using a different tokenizer than the default BERT one, so that's causing some issues.
The issue is that the tokenizer is creating None values in the input features, if anyone wants to investigate.
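For anyone who wants to dig into this, a small debugging helper like the following (hypothetical, not part of simpletransformers) can locate which examples and which feature fields contain the offending None values after tokenization:

```python
# Scan a list of tokenized feature dicts (e.g. {"input_ids": [...],
# "attention_mask": [...]}) and report where None values appear.
def find_none_features(features):
    """Return (example_index, feature_key) pairs whose value is None
    or is a sequence containing None."""
    problems = []
    for i, feat in enumerate(features):
        for key, value in feat.items():
            if value is None:
                problems.append((i, key))
            elif isinstance(value, (list, tuple)) and any(v is None for v in value):
                problems.append((i, key))
    return problems
```

Running this over the features built from the BertweetTokenizer output should narrow down whether the Nones come from the token ids themselves or from a downstream padding/truncation step.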
Hi all!
So I looked into it and got the same issue as the one you mention @ThilinaRajapakse when trying to fine-tune the model on binary classification (using bert as model_name):
TypeError Traceback (most recent call last)
Just created a PR :)
A BERT-based model called BERTweet, further pre-trained on English tweets with the RoBERTa pre-training procedure, was recently made available on Hugging Face.
I would love to be able to use it in my simple transformers pipeline. Note that it includes a normalization argument for the tokenizer, which differs from normal tokenization.
If you point me to the relevant py files I need to modify to add it myself, happy to open a PR :)
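To illustrate what the normalization argument does, here is a minimal sketch of the tweet normalization BERTweet applies: user mentions become @USER and URLs become HTTPURL. This is a simplified reimplementation for illustration only; the real BertweetTokenizer also handles emoji and other cases.

```python
import re

# Simplified sketch of BERTweet-style tweet normalization (assumption:
# reduced to the two documented substitutions; the actual tokenizer's
# normalization covers more cases, e.g. emoji translation).
def normalize_tweet(text):
    text = re.sub(r"@\w+", "@USER", text)            # mask user mentions
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # mask URLs
    return text
```

Because this preprocessing changes the surface text before tokenization, it has to be applied consistently at both fine-tuning and inference time, which is why the flag matters for the pipeline.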