OpenAutoCoder / Agentless

Agentless🐱: an agentless approach to automatically solve software development problems
MIT License

Result of Localization #21

Open GCVulnerability opened 1 month ago

GCVulnerability commented 1 month ago

Hi, Agentless is amazing work. I noticed that '% Correct Location' is reported in the paper, and I'm really interested in fault localization on SWE-bench. Could you please provide the ground truth for SWE-bench Lite and the evaluation code?

brutalsavage commented 1 month ago

Thanks for the question, we will release that soon!

yorhaha commented 3 weeks ago

> Thanks for the question, we will release that soon!

Could you please give an approximate release time? Otherwise, we will consider implementing our own evaluation code. But this may lead to differences in our results.

Thanks for your work.

yorhaha commented 3 weeks ago

Also, may I ask whether you calculate the recall value from the final generated patch and the ground-truth patch (without considering the code retrieved during the intermediate steps before the patch is generated)?

brutalsavage commented 3 weeks ago

> Could you please give an approximate release time?

Our hope is sometime this week or early next week

> calculate recall value

Not totally sure what you mean, can you please elaborate a bit more?

yorhaha commented 3 weeks ago

By recall value (the term used in the SWE-bench paper), I meant "% Correct Location" in your paper. But after reading your paper carefully, I now think the two concepts are different.

In the SWE-bench paper, recall measures the performance of RAG. I am confused about what "% Correct Location" means: does it encourage more code changes (to cover the ground-truth patch)?

brutalsavage commented 3 weeks ago

Right, so in our paper "% Correct Location" measures the percentage of cases in which the patch edits the same locations as the ground-truth developer patch. We count a patch as hitting the correct location if it edits a superset of all the ground-truth locations. For example, at function granularity, if a patch edits func1 and func2 but the ground-truth patch edits only func1, we still count it as correct. See Section 3 of the paper for more detail.
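
For readers who want to reproduce the check before the official evaluation code is released, the superset rule described above could be sketched roughly as follows. This is not the Agentless implementation. The function names and the representation of locations as plain strings (e.g. function names at function granularity) are assumptions made for illustration.

```python
def is_correct_location(patch_locations, ground_truth_locations):
    """Return True if the patch edits a superset of the ground-truth locations.

    Both arguments are iterables of hashable location identifiers
    (hypothetical representation: file paths, function names, or line ids,
    depending on the granularity being measured).
    """
    # Every ground-truth location must appear among the patched locations;
    # extra locations edited by the patch do not count against it.
    return set(ground_truth_locations) <= set(patch_locations)


def percent_correct_location(results):
    """Compute the percentage of instances whose patch hits the correct location.

    `results` is an iterable of (patch_locations, ground_truth_locations) pairs,
    one pair per benchmark instance.
    """
    results = list(results)
    if not results:
        return 0.0
    hits = sum(is_correct_location(p, g) for p, g in results)
    return 100.0 * hits / len(results)


# Example from the thread: the patch edits func1 and func2, the developer
# patch edits only func1, so the instance still counts as correct.
example_ok = is_correct_location({"func1", "func2"}, {"func1"})
# A patch that misses func1 entirely does not count.
example_miss = is_correct_location({"func2"}, {"func1"})
```

Because the check only requires a superset, a patch that touches more locations can never score worse on this metric than one that touches fewer, which is the property asked about earlier in the thread.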

yorhaha commented 3 weeks ago

Thanks for your explanation! I have got it.