fani-lab / LADy

LADy 💃: A Benchmark Toolkit for Latent Aspect Detection Enriched with Backtranslation Augmentation

Dataset Creation using LLMs! #96

Open Sepideh-Ahmadian opened 3 days ago

Sepideh-Ahmadian commented 3 days ago

We’re so happy to have you on board with the LADy project, Calder! We use the issue pages for many purposes, and we especially enjoy noting good articles and our findings on every aspect of the project.

We can use this issue page to compile all our findings about LLMs for data generation. A great article to start with is "On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey", which you can also find in the team’s article repository.

The key questions we’re exploring are: Which language models perform best in data creation (considering the domain and the task at hand), and what are their advantages and disadvantages? As you go through the suggested paper and similar ones, feel free to add and suggest articles in both the Google Doc and here.

Once we've covered the research, we’ll dive into Q1, as mentioned by Hossein in today’s session, where we’ll test the LLMs on our gathered dataset.

If you have any questions, feel free to ask here and mention either me or Hossein!

CalderJohnson commented 2 days ago

Sounds great! I'll delve into the literature you've found, as well as any other papers that catch my eye, and summarize the points relevant to our goal in the Google Doc.