Closed · nv78 closed this issue 11 months ago
Hi, thanks for these suggestions. I think they are good hacks, but they should not really be implemented silently in the background, because that loses oversight of the model. Perhaps this could be added behind a flag. Could you open a PR?
I am a huge fan of this GitHub repo. One thing I noticed is that there is a pretty large jump in performance between zero-shot and few-shot tasks when there are fewer than N annotations. For instance, let's say we have 4 classes: cat, dog, mouse, and fox, with sentences pertaining to each class in our dataset. Our zero-shot model is able to predict the label of the corresponding sentence with, say, ~50% accuracy. Once we have ~8 labels for each class (8 cat, 8 dog, 8 mouse, and 8 fox), the few-shot model begins to surpass the zero-shot model, performing at, say, ~63% accuracy. In between, however, there is a gap: when you have just your first 5 labels, and 3 of them are dog, one is cat, and one is mouse, the few-shot model performs relatively poorly, worse than the zero-shot model. The reason is that few-shot models are very sensitive to the user's input.
Our team had a few ideas to solve this. One was to still take the class category / label names as input to the few-shot model, and to bias the few-shot model toward the zero-shot model while the number of annotations is small. The other was to create random prompts / synthetic data for each category when there are only a few annotations, which may work okay but is not best practice. It would be great if classy_classification had a mechanism that handles this edge case where there are only a few annotations and not every class has one (say our distribution is 3 dog, 1 cat, 1 mouse, 0 fox). Rough sketches of both ideas follow below.
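For the first idea, here is a minimal sketch of what the biasing could look like, assuming both models expose per-class probabilities. The function name `blended_probs` and the `n_full = 8` threshold (taken from the numbers above) are made up for illustration, not part of the library:

```python
def blended_probs(zero_shot: dict, few_shot: dict, counts: dict, n_full: int = 8) -> dict:
    """Interpolate between zero-shot and few-shot class probabilities.

    `zero_shot` and `few_shot` map label -> probability; `counts` maps
    label -> number of annotations for that label. A class with no
    annotations (e.g. fox) falls back entirely to its zero-shot score;
    trust in the few-shot score grows linearly until `n_full` annotations.
    """
    blended = {}
    for label, p_zero in zero_shot.items():
        # alpha = 0 means pure zero-shot, alpha = 1 means pure few-shot
        alpha = min(1.0, counts.get(label, 0) / n_full)
        blended[label] = (1 - alpha) * p_zero + alpha * few_shot.get(label, 0.0)
    total = sum(blended.values())
    return {label: p / total for label, p in blended.items()}  # renormalise


# The imbalanced 5-annotation example from above: 3 dog, 1 cat, 1 mouse, 0 fox.
counts = {"cat": 1, "dog": 3, "mouse": 1, "fox": 0}
zero_shot = {"cat": 0.20, "dog": 0.40, "mouse": 0.25, "fox": 0.15}
few_shot = {"cat": 0.10, "dog": 0.80, "mouse": 0.10, "fox": 0.00}
print(blended_probs(zero_shot, few_shot, counts))
```

Using a per-class weight rather than one global weight lets the 0-fox case degrade gracefully: fox keeps its zero-shot score while dog leans on its 3 annotations.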
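And a sketch of the second idea: padding under-annotated classes with templated prompts. The templates and `pad_with_synthetic` are hypothetical; the output keeps the label-to-example-sentences dict shape that the few-shot model is trained on:

```python
# Placeholder templates; real ones could come from a prompt bank or a generative model.
TEMPLATES = [
    "This sentence is about a {label}.",
    "An example text mentioning a {label}.",
    "{label}",
]

def pad_with_synthetic(data: dict, min_examples: int = 3) -> dict:
    """Ensure every class has at least `min_examples` training sentences
    by appending templated prompts for the missing ones."""
    padded = {}
    for label, examples in data.items():
        synthetic = [t.format(label=label) for t in TEMPLATES]
        missing = max(0, min_examples - len(examples))
        padded[label] = list(examples) + synthetic[:missing]
    return padded

data = {
    "dog": ["the dog barked", "a dog chased the ball", "dogs love their walks"],
    "cat": ["the cat purred"],
    "mouse": ["a mouse ran across the floor"],
    "fox": [],  # no annotations yet
}
data = pad_with_synthetic(data)  # every class now has training sentences
```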