Closed: nv78 closed this 1 year ago
So, you mean the workflow of using a dictionary and training a model directly from that, in terms of easiness?
@nv78 could you clarify a bit more?
Yes, there are a few things. 1) Having the workflow of a dictionary, similar to classy_classification:
import spacy
import classy_classification

# Few-shot training data: label -> example sentences
data = {
    "furniture": [
        "This text is about chairs.",
        "Couches, benches and televisions.",
        "I really need to get a new sofa.",
    ],
    "kitchen": [
        "There also exist things like fridges.",
        "I hope to be getting a new stove today.",
        "Do you also have some ovens.",
    ],
}

# The pipeline must exist before add_pipe; the model name here is an assumption
nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "text_categorizer",
    config={
        "data": data,
        "model": "spacy",
        "include_sent": True,
    },
)
It would be great if "setfit" was a model you could put in here, just so it is easier to run.
The second point is handling cases where you have labels for some categories but not others. For instance, if we are trying to predict "spam" vs. "not spam" and we have 4 labeled examples of spam and 0 of not spam, how could we use SetFit to handle that situation?
Hi @nv78 I think I added support for outlier detection here.
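To make the one-class case concrete, here is a minimal sketch of the general idea: fit on examples of a single label and flag anything too dissimilar as an outlier. This is not how classy_classification implements it. It swaps in a plain bag-of-words centroid with cosine similarity so it runs with the standard library only, and the `OneClassCentroid` class, its `threshold` value, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OneClassCentroid:
    """Hypothetical one-class detector: similarity to the summed training counts."""

    def __init__(self, examples, threshold=0.2):
        # Sum the word counts of all positive examples into one centroid vector.
        self.centroid = Counter()
        for text in examples:
            self.centroid.update(bag_of_words(text))
        # Assumed cut-off; in practice it would be tuned on held-out data.
        self.threshold = threshold

    def score(self, text):
        return cosine(bag_of_words(text), self.centroid)

    def is_inlier(self, text):
        return self.score(text) >= self.threshold


# Only the "spam" class has labeled examples; "not spam" has none.
spam_examples = [
    "Win a free prize now, click here",
    "Free money, click now to claim your prize",
    "Claim your free gift card now",
    "Click here to win free cash",
]
detector = OneClassCentroid(spam_examples, threshold=0.2)

print(detector.is_inlier("Click now to win a free prize"))
print(detector.is_inlier("Let's reschedule the meeting for Tuesday afternoon"))
```

A real setup would replace the word counts with sentence embeddings and the fixed threshold with a fitted one-class model, but the workflow stays the same: train on the one label you have and treat low-similarity texts as the other class.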
W.r.t. the addition of SetFit, I do understand your case, but for me it does not make sense to add that support to this package, because I feel it belongs in a rather separate spacy-setfit integration.
Feel free to share on your socials 😉
Our team is a huge fan of the recent few-shot learning work for text classification involving SetFit. However, running the SetFit model via the GitHub repository at https://github.com/huggingface/setfit is not as simple as using classy_classification. Because of this, we were wondering whether SetFit could be embedded as one of the classy_classification few-shot models for text classification, so that it would be easier to use. We recognize that SetFit can take minutes to train without GPU access, and we are all for distilled versions of SetFit. Because SetFit is one of the more robust few-shot learning models today, we think adding it to classy_classification would be a major plus, since the resulting models should perform better.