
OpenAutoCoder / Agentless

Agentless🐱: an agentless approach to automatically solve software development problems

MIT License

745 stars, 88 forks

Result of Localization #21

Open · GCVulnerability opened this issue 3 months ago

GCVulnerability commented 3 months ago

Hi, Agentless is an amazing piece of work. I noticed that '% Correct Location' is mentioned in the paper, and I'm really interested in fault localization on SWE-bench. Could you please provide the ground truth for SWE-bench Lite and the evaluation code?

brutalsavage commented 3 months ago

Thanks for the question, we will release that soon!

yorhaha commented 3 months ago

> Thanks for the question, we will release that soon!

Could you please give an approximate release time? If not, we will consider implementing our own evaluation code, though this may lead to differences between our results and yours.

Thanks for your work.

yorhaha commented 3 months ago

Also, may I ask whether you calculate the recall value from the final generated patch and the ground-truth patch, i.e., without considering the code retrieved during the intermediate steps before the patch is generated?
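Computing any location metric from the final patch, as asked above, first requires extracting which locations the patch touches. A minimal sketch at file granularity, assuming patches are git-style unified diffs whose file sections start with `diff --git a/<old> b/<new>` headers and whose paths contain no spaces; the function name `edited_files` is hypothetical and not part of Agentless or SWE-bench.

```python
from __future__ import annotations


def edited_files(patch: str) -> set[str]:
    """Return the set of file paths touched by a git-style unified diff."""
    files = set()
    for line in patch.splitlines():
        # Each file section begins with: diff --git a/<old_path> b/<new_path>
        # (hunk content lines start with ' ', '+', '-' or '\', so they never match)
        if line.startswith("diff --git "):
            new_path = line.split()[-1]
            files.add(new_path[2:] if new_path.startswith("b/") else new_path)
    return files
```

Keying on the `diff --git` header rather than `---`/`+++` lines avoids misreading a removed line whose content itself begins with `--`.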

brutalsavage commented 3 months ago

> Could you please give an approximate release time?

We hope to release it sometime this week or early next week.

> calculate recall value

I'm not totally sure what you mean; can you please elaborate a bit more?

yorhaha commented 3 months ago

By recall value (as used in the SWE-bench paper), I meant "% Correct Location" in your paper. But after reading your paper more carefully, I now think the two concepts are different.

In the SWE-bench paper, recall measures the performance of retrieval (RAG). I am confused about the meaning of "% Correct Location": does it encourage more code changes (so that the patch covers the ground-truth patch)?
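For contrast, the standard per-instance notion of recall is the fraction of ground-truth locations that appear in the retrieved (or edited) set. A minimal sketch, assuming locations are hashable identifiers such as file paths; the name `location_recall` is hypothetical, and this is not claimed to be the exact computation used in the SWE-bench paper.

```python
def location_recall(predicted: set, gold: set) -> float:
    """Fraction of ground-truth locations covered by the predicted locations."""
    if not gold:
        # Assumption: an instance with no gold locations counts as full recall.
        return 1.0
    return len(predicted & gold) / len(gold)


# One of the two gold files is retrieved, so recall is 0.5;
# the extra prediction "b.py" does not lower the score.
print(location_recall({"a.py", "b.py"}, {"a.py", "c.py"}))  # 0.5
```

Recall gives partial credit and is never reduced by extra predictions, which is why covering more locations can only help it.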

brutalsavage commented 3 months ago

Right, so in our paper "% Correct Location" measures how often the patch edits the same locations as the ground-truth developer patch. We count it as the correct location if the patch edits a superset of all the ground-truth locations. For example, at function granularity, if a patch edits func1 and func2 but the ground-truth patch edits only func1, we still count it as correct. See Section 3 of the paper for more detail.
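The superset rule described above can be sketched as follows. This is a minimal illustration, not the Agentless evaluation code; it assumes each location is a `(file, function)` pair at function granularity, and the names `is_correct_location` and `percent_correct_location` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Location = Tuple[str, str]  # hypothetical representation: (file path, function name)


def is_correct_location(patch_locs: Set[Location], gold_locs: Set[Location]) -> bool:
    # Correct if the patch edits every ground-truth location; extra edits are allowed.
    return gold_locs <= patch_locs


def percent_correct_location(
    instances: Iterable[Tuple[Set[Location], Set[Location]]]
) -> float:
    # Each instance is a (patch locations, ground-truth locations) pair.
    pairs = list(instances)
    if not pairs:
        return 0.0
    hits = sum(is_correct_location(patch, gold) for patch, gold in pairs)
    return 100.0 * hits / len(pairs)


instances = [
    ({("a.py", "func1"), ("a.py", "func2")}, {("a.py", "func1")}),  # superset: correct
    ({("a.py", "func2")}, {("a.py", "func1")}),  # misses func1: incorrect
]
print(percent_correct_location(instances))  # 50.0
```

Unlike recall, this check is all-or-nothing per instance: a patch that misses any ground-truth location gets no credit, while edits beyond the ground truth are not penalized.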

yorhaha commented 3 months ago

Thanks for your explanation! I've got it now.

UniverseFly commented 1 month ago

Any updates on the evaluation of fault localization accuracy?