aifoundry-org / blame-ai-discord-bot

Apache License 2.0

Choose a number of PRs that we can pull and use to create manual samples #3

Open khasinski opened 1 month ago

khasinski commented 1 month ago

Gather good samples from different repos (we need different repos to make the samples more representative).

llama.cpp, LLamagator, Rails

Consider both public and private repos.

What would be the criteria for selecting repos, and PRs from those repos?

Manually annotate those PRs so we can turn them into a dataset.

The goal is to have 40 examples from a diverse set of repositories. At the moment we are mostly missing smaller projects (in terms of codebase size). We will publish the CSV file to Hugging Face.
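To make the 40-example target and the diversity requirement easier to track, here is a rough sketch of how the sample list could be written to CSV and checked for coverage. The column names (`repo`, `pr_number`, `size_bucket`, `annotation`) and the helper functions are assumptions for illustration only, not the agreed schema, and the rows are placeholders rather than real PRs.

```python
import csv
import io
from collections import Counter

TARGET_EXAMPLES = 40  # goal stated in this issue

# Hypothetical column names; the real dataset schema may differ.
FIELDNAMES = ["repo", "pr_number", "size_bucket", "annotation"]


def write_samples(rows, out):
    """Write the sample rows as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def coverage_report(rows):
    """Summarize how many samples we have per repo and per codebase size."""
    return {
        "total": len(rows),
        "remaining": max(TARGET_EXAMPLES - len(rows), 0),
        "per_repo": dict(Counter(r["repo"] for r in rows)),
        "per_size": dict(Counter(r["size_bucket"] for r in rows)),
    }


# Placeholder rows: PR numbers and size buckets are made up, not real references.
rows = [
    {"repo": "llama.cpp", "pr_number": 1, "size_bucket": "large", "annotation": ""},
    {"repo": "Rails", "pr_number": 2, "size_bucket": "large", "annotation": ""},
    {"repo": "LLamagator", "pr_number": 3, "size_bucket": "small", "annotation": ""},
]

buffer = io.StringIO()
write_samples(rows, buffer)
print(buffer.getvalue())
print(coverage_report(rows))
```

A per-size count like this would show at a glance whether smaller projects are still underrepresented before the file is published.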

This is just a list of selected repositories that we will work with during initial development; this ticket isn't about fetching the data from those PRs.

UPD: Link to PR Gathering Doc: https://docs.google.com/spreadsheets/d/1ELqRlo27bOSUwc3Dy77G3-V5fQy2NRgO2uKKeo6a7-E/edit?usp=sharing

antonekko commented 4 weeks ago

added more PRs to the list


robolamp commented 3 weeks ago

Done https://huggingface.co/datasets/aifoundry-org/BlameAIData-0.1