VinAIResearch / BERTweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
MIT License

Host the model on Huggingface? #5

Closed (zijwang closed 4 years ago)

zijwang commented 4 years ago

It would be nice to have the model also hosted on huggingface (https://huggingface.co/models), so people could use it from the huggingface API without manually downloading the model dump.

datquocnguyen commented 4 years ago

We'll do that as soon as we can (all authors are currently busy with EMNLP and COLING conference submissions).

zijwang commented 4 years ago

Thanks, @datquocnguyen! Hope everything went well with the submissions. It would also be nice to have the tokenizer built with the Huggingface API, to make the whole pipeline simpler (without fairseq).

datquocnguyen commented 4 years ago

I am waiting for the transformers developers to merge my pull request. In the meantime, you can use BERTweet from this fork: https://github.com/datquocnguyen/transformers

git clone https://github.com/datquocnguyen/transformers
cd transformers
pip install .

An example is available at: https://github.com/datquocnguyen/transformers/tree/master/model_cards/vinai/bertweet-base

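# Assumes the fork above is installed and exports both classes.
from transformers import BertweetModel, BertweetTokenizer
bertweet = BertweetModel.from_pretrained("vinai/bertweet-base")
tokenizer = BertweetTokenizer.from_pretrained("vinai/bertweet-base")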
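
Since the two classes above come from the fork rather than from a released transformers package, it can help to check that the installed package actually exports them before loading anything. The following is a minimal sketch, not part of the fork: the helper names `missing_bertweet_classes` and `load_bertweet` are hypothetical, and the class names are simply the ones used in the snippet above.

```python
import importlib

# Class names taken from the snippet above; assumed to be exported by the fork.
REQUIRED = ("BertweetModel", "BertweetTokenizer")


def missing_bertweet_classes(module_name="transformers"):
    """Return the names in REQUIRED that the module does not export.

    If the module cannot be imported at all, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(REQUIRED)
    return [name for name in REQUIRED if not hasattr(module, name)]


def load_bertweet(name="vinai/bertweet-base"):
    """Load the model and tokenizer, failing early with a readable message."""
    missing = missing_bertweet_classes()
    if missing:
        raise RuntimeError(
            "Installed transformers lacks: " + ", ".join(missing)
            + ". Install the fork first (git clone ..., pip install .)."
        )
    import transformers  # safe: the check above confirmed it imports

    # from_pretrained downloads the weights on first use (needs network access).
    model = transformers.BertweetModel.from_pretrained(name)
    tokenizer = transformers.BertweetTokenizer.from_pretrained(name)
    return model, tokenizer
```

Calling `load_bertweet()` then returns the same two objects as the snippet above, or raises a `RuntimeError` naming the missing classes if a different transformers build is on the path.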