WadeYin9712 / SentiBERT

Code for SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL'2020).

Difference between transformers library and tokenization.py #6

Closed: DeepakDhanasekar closed this issue 4 years ago

DeepakDhanasekar commented 4 years ago

Hey,

I looked into the preprocessing Python file and you used the legacy `pytorch_pretrained_bert.tokenization` module instead of the Hugging Face `transformers` library.

Will I get the same results for SentiBERT if I use the code below?

from transformers import BertTokenizer

If not, could you tell me what is the difference between them?

Thanks!

WadeYin9712 commented 4 years ago

Hi, I'm not sure whether there's a difference, but it should work, since I don't think the tokenization results will differ much. You can try `from transformers import BertTokenizer` ;-). Thanks!
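
If you want to verify this before swapping libraries, a minimal sketch like the one below (not from the repo) tokenizes the same sentence with both the legacy and the current tokenizer and compares the output. It assumes the `bert-base-uncased` vocabulary and that both `pytorch_pretrained_bert` and `transformers` are installed:

```python
# Sanity check: do the legacy and current BERT tokenizers agree?
from pytorch_pretrained_bert.tokenization import BertTokenizer as LegacyBertTokenizer
from transformers import BertTokenizer

# Both tokenizers load the same WordPiece vocabulary.
legacy = LegacyBertTokenizer.from_pretrained("bert-base-uncased")
current = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "SentiBERT captures compositional sentiment semantics."

legacy_tokens = legacy.tokenize(sentence)
current_tokens = current.tokenize(sentence)

print(legacy_tokens)
print(current_tokens)

# If this passes on a sample of your data, the swap should be safe.
assert legacy_tokens == current_tokens, "Tokenizers disagree; check vocab/version."
```

Running this over a sample of your own data is a quick way to confirm the two libraries behave identically for your use case.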

DeepakDhanasekar commented 4 years ago

Thank you for the input! I have another question: I am focusing on Twitter data and have run the SST training epochs. Does SentiBERT's transferability occur during fine-tuning?

WadeYin9712 commented 4 years ago

I think it's worth trying ;) We have tested on Twitter before, but we found that the transferability may not be strong on Twitter data. Still, you can give it a try.