davidberenstein1957 / classy-classification

This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.
MIT License
211 stars, 15 forks

Standalone usage without spaCy: setting embeddings after adding the data makes the classification run twice #40

Closed. Avneets closed this issue 1 year ago

Avneets commented 1 year ago

To initialize a ClassyClassifier without spaCy, we first have to pass the data and then add extra settings, such as the embedding model and changes to the SVC config. This makes the model train twice. Can we change this so that the model trains only once, after the settings have been added?

from classy_classification import ClassyClassifier

data = {
    "furniture": [
        "This text is about chairs.",
        "Couches, benches and televisions.",
        "I really need to get a new sofa.",
    ],
    "kitchen": [
        "There also exist things like fridges.",
        "I hope to be getting a new stove today.",
        "Do you also have some ovens.",
    ],
}

classifier = ClassyClassifier(data=data)
print(classifier("I am looking for kitchen appliances."))

classifier.set_embedding_model(model="all-mpnet-base-v2")
classifier.set_classification_model(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernel": ["sigmoid"],
        "max_cross_validation_folds": 5,
    }
)
print(classifier("I am looking for kitchen appliances."))

classifier.set_training_data(data=data)
print(classifier("I am looking for kitchen appliances."))

We get three different classification scores, one per print call:

{'furniture': 0.13484464066590968, 'kitchen': 0.8651553593340902}
{'furniture': 0.8069939934544372, 'kitchen': 0.19300600654556258}
{'furniture': 0.542059833290298, 'kitchen': 0.457940166709702}

Avneets commented 1 year ago

I just found the way. My bad. The embedding model and the config can be passed directly to the constructor:

classifier = ClassyClassifier(data=data, model="all-mpnet-base-v2", config={"C": [1, 2, 5, 10, 20, 100], "kernel": ["sigmoid"], "max_cross_validation_folds": 5})
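
To illustrate why the constructor form avoids the repeated training described in this issue, here is a minimal toy sketch. `ToyClassifier`, its `fit_count` attribute, and its `_fit` method are hypothetical stand-ins written for this example. They are not the classy-classification implementation, and the sketch simply assumes that the constructor and each setter trigger one training run.

```python
class ToyClassifier:
    """Hypothetical stand-in; NOT the classy-classification implementation."""

    def __init__(self, data, model="default-model", config=None):
        self.data = data
        self.model = model
        self.config = config or {}
        self.fit_count = 0  # counts how many times training was triggered
        self._fit()

    def _fit(self):
        # A real classifier would embed the data and fit a model here.
        self.fit_count += 1

    def set_embedding_model(self, model):
        self.model = model
        self._fit()  # assumption: changing the embedding model retrains

    def set_classification_model(self, config):
        self.config = config
        self._fit()  # assumption: changing the config retrains


data = {
    "furniture": ["This text is about chairs."],
    "kitchen": ["I hope to be getting a new stove today."],
}
config = {"C": [1, 2, 5, 10, 20, 100], "kernel": ["sigmoid"], "max_cross_validation_folds": 5}

# Step-by-step configuration: one training run per call.
stepwise = ToyClassifier(data=data)
stepwise.set_embedding_model(model="all-mpnet-base-v2")
stepwise.set_classification_model(config=config)

# Everything passed up front: a single training run.
upfront = ToyClassifier(data=data, model="all-mpnet-base-v2", config=config)

print(stepwise.fit_count, upfront.fit_count)  # prints: 3 1
```

Under these assumptions, supplying `model` and `config` at construction time is the cheaper path, which matches the workaround posted above.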