flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Few-shot training for topic modelling with a hope of improvement from zero-shot prediction #3046

Open predoctech opened 1 year ago

predoctech commented 1 year ago

Newbie experimenting with ZS and FS learning, but I got lost, so I hope to get some sense out of the following question. I believe my task is a topic modelling case: I have a few class names that are domain-specific (i.e. legal) topic names, and by passing a sentence/paragraph to the TARS model and running a ZS prediction, I hope to classify it into one of those class (thus topic) names. With the base TARS model, given its non-specific nature, the accuracy wouldn't be very high for a domain-specific sentence. That's expected, but if the sentence is completely unrelated to the domain, it should not map to any class name. That seems to work with the base TARS model.

To further enhance the accuracy of topic modelling, I hoped to extend the base model with domain-specific samples (around 10 per class), so I implemented few-shot training on a new task. Following Tutorial 10 and the Bahasa example on towardsdatascience, I came up with a new TARS model. Here are my observations and questions:

  1. Should I create a new label_type to which all my samples belong and just train with that? The training output doesn't look good: it constantly complains about BAD EPOCHS and no improvement. Loading and testing the resulting final-model.pt seems to map every sentence to one of the class labels. The ZS model behaves more intuitively: it can predict more than one label with different probabilities and, more importantly, a sentence unrelated to the classes produces no prediction at all. With the newly trained FS model, even garbage gets assigned to a class name.

  2. If the intention is to improve on the accuracy of the base TARS model, is the above the correct approach? Or should I load an existing dataset, add my training samples to that corpus, and re-train? If so, which dataset is relevant for topic modelling, and which label_type should be used?

  3. After the above training, all the example code tells us to use the predict() function. Can predict_zero_shot() be used instead with the newly trained model? What is the difference between using one function or the other in this scenario?

  4. Intuitively, Few-Shot Learning seems to mean that, given an existing model (tars-base), we can simply load it, create a corpus with a few of our domain-specific samples, re-train it, and the resulting model will predict more accurately when the input sentence belongs to that specific domain. Is that a correct understanding of FSL? Is Tutorial 10 intended to demonstrate that kind of effect?

Any clarifications and assistance would be much appreciated.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.