ThilinaRajapakse / simpletransformers

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
https://simpletransformers.ai/
Apache License 2.0

SimpleTransformers Multi-label Text Classification Model loaded using HuggingFace Pipeline giving wrong output #1489

Open DrRaja opened 1 year ago

DrRaja commented 1 year ago

Hi, I trained a multi-label text classification model with distilbert-base-uncased. The model works as intended when I use it through the Simple Transformers wrapper. However, when I upload the same model to the HuggingFace Hub and consume it through the HuggingFace pipeline, it returns only a single label, and that label is completely incorrect. For example, the sentence "I love you, I like you" returns the label "toxic" with a score of 0.7, whereas my local model gives a score of 0. Any idea what I am doing wrong?

The code to train the model:

from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel(
    'distilbert',
    'distilbert-base-uncased',
    num_labels=6,
    args={
        'train_batch_size': 2,
        'gradient_accumulation_steps': 16,
        'learning_rate': 3e-5,
        'num_train_epochs': 3,
        'max_seq_length': 512,
    },
)

And the code I'm using to generate the output with HuggingFace:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# model_path is the uploaded model (its definition is not shown in this post)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(model_path)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

classifier('i love you, i like you')

# output: [{'label': 'toxic', 'score': 0.708755612373352}]
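
A possible explanation, offered as an assumption rather than a confirmed diagnosis: by default the "text-classification" pipeline returns only the top-scoring label, and for a model with more than one label it normalizes the logits with softmax unless the model config declares a multi-label problem type. A multi-label model is instead scored with an independent sigmoid per label. The sketch below uses made-up logits and hypothetical label names (only "toxic" comes from the output above) to show how the same logits can give a near-zero sigmoid score and a high softmax score.

```python
import math

# Hypothetical label names; only "toxic" appears in the pipeline output above.
labels = ["toxic", "label_1", "label_2", "label_3", "label_4", "label_5"]

# Made-up logits for a benign sentence: every label is strongly negative.
logits = [-4.0, -7.0, -6.0, -8.0, -6.5, -7.5]


def sigmoid(x):
    """Independent per-label probability, as used for multi-label scoring."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Probabilities that compete across labels and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


sigmoid_scores = [sigmoid(x) for x in logits]
softmax_scores = softmax(logits)

# Print both scores side by side for each label.
for label, s, p in zip(labels, sigmoid_scores, softmax_scores):
    print(f"{label:8s} sigmoid={s:.3f} softmax={p:.3f}")
```

If this is the cause, calling the pipeline as classifier('i love you, i like you', top_k=None, function_to_apply="sigmoid") should return a sigmoid score for every label. This has not been verified against the model in this issue.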

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.