flairNLP / fabricator

[EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.
Apache License 2.0

Idea on how to structure generation / annotation #68

Open whoisjones opened 11 months ago

whoisjones commented 11 months ago

Instead of the single, unified generation function we have now, we might want to restructure the repo in the future so that users can choose between different approaches, such as:

For generation:
- ZEROGEN: efficient zero-shot learning via dataset generation (paper)
- PROGEN: progressive dataset generation via in-context feedback (paper)

For annotation:
- CALIBRATION: prompt-based zero-shot learning with calibration (paper)
- ...

Finally, we should keep the option for users to generate datasets on their own, defining their own sampling strategy, sample information criterion, etc.
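One way to read this proposal is as a strategy pattern: each approach becomes a pluggable class that users select by name, and a user-defined strategy can wrap or replace the built-in ones. The following is a minimal Python sketch under that assumption. `GenerationStrategy`, `register`, `STRATEGIES`, `ZeroGenLike`, `CustomSampling`, and `generate_dataset` are hypothetical names, not part of fabricator's API; the LLM is stubbed as a plain prompt-to-text function, and `ZeroGenLike` only mimics the label-conditioned generation step of ZeroGen (it does not train a task model).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Type

# Hypothetical stand-in for an LLM call: takes a prompt, returns generated text.
LLM = Callable[[str], str]


@dataclass
class Sample:
    text: str
    label: str


class GenerationStrategy(ABC):
    """Hypothetical base class; each approach (ZeroGen, ProGen, custom) subclasses it."""

    @abstractmethod
    def generate(self, llm: LLM, labels: List[str], n_per_label: int) -> List[Sample]:
        ...


# Registry that lets users pick a strategy by name.
STRATEGIES: Dict[str, Type[GenerationStrategy]] = {}


def register(name: str):
    """Decorator that makes a strategy class selectable under `name`."""
    def wrap(cls: Type[GenerationStrategy]) -> Type[GenerationStrategy]:
        STRATEGIES[name] = cls
        return cls
    return wrap


@register("zerogen")
class ZeroGenLike(GenerationStrategy):
    """Simplified ZeroGen-style step: one label-conditioned prompt per generated sample."""

    def __init__(self, template: str = "Write a {label} movie review:"):
        self.template = template  # hypothetical default prompt template

    def generate(self, llm: LLM, labels: List[str], n_per_label: int) -> List[Sample]:
        return [
            Sample(text=llm(self.template.format(label=label)), label=label)
            for label in labels
            for _ in range(n_per_label)
        ]


@register("custom")
class CustomSampling(GenerationStrategy):
    """User-defined strategy: wraps another strategy and keeps only samples passing `keep`."""

    def __init__(self, base: GenerationStrategy, keep: Callable[[Sample], bool]):
        self.base = base
        self.keep = keep

    def generate(self, llm: LLM, labels: List[str], n_per_label: int) -> List[Sample]:
        return [s for s in self.base.generate(llm, labels, n_per_label) if self.keep(s)]


def generate_dataset(strategy: str, llm: LLM, labels: List[str],
                     n_per_label: int, **kwargs) -> List[Sample]:
    """Single entry point: look up the chosen strategy and run it."""
    return STRATEGIES[strategy](**kwargs).generate(llm, labels, n_per_label)


if __name__ == "__main__":
    fake_llm = lambda prompt: "generated for: " + prompt
    data = generate_dataset("zerogen", fake_llm, ["positive", "negative"], n_per_label=2)
    print(len(data))  # 4
```

Under this design, adding ProGen or a calibration-based annotator would mean adding another registered class rather than changing the shared entry point, and a user's own sampling strategy or selection criterion plugs in the same way as the built-in ones.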

HallerPatrick commented 11 months ago

I think a good idea would be to start by implementing one or two of those approaches and see what needs to change. We could then abstract that so it is usable with the other approaches. I am interested in implementing one of them. Do you already have one in mind that is open to implement?