ZiyueWang25 / intercode

[NeurIPS 2023 D&B] Code repository for the InterCode benchmark: https://arxiv.org/abs/2306.14898
https://intercode-benchmark.github.io/
MIT License

[Data] Gather easy tasks to measure the early success of the agent #11

Closed: ZiyueWang25 closed this issue 8 months ago

ZiyueWang25 commented 8 months ago

Is your feature request related to a problem? Please describe.
The current SWE-bench is probably too difficult for us to see the effect of scaffolding. We could gather ~20 easy tasks from it to measure how much our scaffolding and prompting changes help.

Describe the solution you'd like
For now, we can use the number of lines in the golden patch as the difficulty criterion.
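A minimal sketch of that selection rule in Python. It assumes each task is a dict with SWE-bench-style `instance_id` and `patch` fields, where `patch` holds the golden patch as a unified diff. Reading "number of lines" as added plus removed lines is also an assumption (counting every line of the diff would be another option), and the helper names `changed_line_count` and `select_easy_tasks` are hypothetical. The count is driven by the hunk headers, so file headers such as `--- a/foo.py` are not mistaken for deletions.

```python
import re
from typing import Dict, List

# Matches a unified-diff hunk header such as "@@ -10,4 +10,6 @@".
# A missing ",N" means the hunk spans exactly one line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def changed_line_count(patch: str) -> int:
    """Count added plus removed lines in a unified diff (hypothetical helper)."""
    count = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify each line by its prefix.
            if line.startswith("-"):
                old_left -= 1
                count += 1
            elif line.startswith("+"):
                new_left -= 1
                count += 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both the old and new versions.
                old_left -= 1
                new_left -= 1
            continue
        # Outside a hunk: only a hunk header starts counting, so file
        # headers like "--- a/x.py" / "+++ b/x.py" are ignored here.
        m = HUNK_RE.match(line)
        if m:
            old_left = int(m.group(1)) if m.group(1) is not None else 1
            new_left = int(m.group(2)) if m.group(2) is not None else 1
    return count


def select_easy_tasks(tasks: List[Dict[str, str]], k: int = 20) -> List[Dict[str, str]]:
    """Return the k tasks with the smallest golden patches (ties broken by instance_id)."""
    return sorted(
        tasks, key=lambda t: (changed_line_count(t["patch"]), t["instance_id"])
    )[:k]


if __name__ == "__main__":
    # Toy tasks in the assumed SWE-bench-style shape (not real instances).
    tasks = [
        {"instance_id": "demo__big", "patch": "@@ -1 +1,2 @@\n-a\n+b\n+c\n"},
        {"instance_id": "demo__small", "patch": "--- a/x.py\n+++ b/x.py\n@@ -1 +1 @@\n-a\n+b\n"},
    ]
    for t in select_easy_tasks(tasks, k=1):
        print(t["instance_id"], changed_line_count(t["patch"]))
```

Ties are broken by `instance_id` so the selected subset stays the same across runs; `k=20` mirrors the ~20 tasks suggested above.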