Open Taytay opened 7 months ago
Hi @Taytay,
Thanks for your feedback!
It is indeed a cool feature of our method.
I think it is most relevant for cases where new classes need to be introduced only at inference time. In the context of intent detection this is called out-of-domain (or out-of-distribution, OOD) detection, so that is the clearest case where this feature applies.
Additionally, yes, I agree that our method teaches the model to recognize text–label similarity, much like IR models, so it could benefit from more general training and become a zero-shot classifier. A simple experiment would be to concatenate all the FewMany datasets (maybe 5/10-shot), train on the combined data, and see whether that improves its confidence scores.
CC @elronbandel
First, this is great! Thank you for publishing the results and code!
This is my favorite part of the paper:
I love that this approach doesn't require a predetermined classification head!
As a result, would I be right to presume that I could provide new labels at inference time? If those labels resemble my training set, I think it would do quite well. If they don't, it would "revert" to picking the most similar label, which should still work, right? That would make this a capable zero-shot classifier as well, right?
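The "most similar label" intuition can be sketched with a toy example. To be clear, this is not FastFit's actual scoring code: the bag-of-words `embed`, the cosine scoring, and the softmax confidence below are all illustrative stand-ins for the learned text/label representations a trained model would produce.

```python
from collections import Counter
from math import exp, sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a stand-in for a real model's
    # learned text/label representations.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, labels):
    # Score the text against each candidate label, then softmax the
    # similarities into confidences. The label set can be swapped at
    # inference time because there is no fixed classification head.
    sims = [cosine(embed(text), embed(label)) for label in labels]
    exps = [exp(s) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(zip(labels, probs), key=lambda pair: pair[1])

# Labels that overlap semantically with the text win with some margin,
# while unrelated labels fall back toward uniform (low) confidence.
print(classify("my card still has not arrived", ["card arrival", "refund request"]))
print(classify("my card still has not arrived", ["weather", "sports"]))
```

This also mirrors the confidence behavior described below: when none of the candidate labels overlap with the input, the similarity scores flatten out and the softmax confidence drops toward uniform.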
Here is my initial experiment with it. It appears to "work", although of course its confidence isn't nearly as high if the new labels don't overlap semantically with the original banking labels. I presume you could address this by training a more generalized FastFit model?
Am I understanding this correctly?