yuntongzhang closed this issue 5 months ago
Thank you for the questions!
For the CodeR experiments, we do not use Plan B or Plan D. Note that Plan D is suitable for real production deployments, where users can provide ground-truth test cases for the issues they raise. In the paper, we mentioned Plan B and Plan D only to illustrate further possibilities. We will clarify this in the next arXiv version.
For RAG, we use the title and description to retrieve similar issues by embedding similarity. We created and maintain an issue database (all crawled issues) for the 12 repositories involved in SWE-bench lite. When returning the top-1 result, we filter out issues whose resolving PR has a timestamp later than the current issue. Unfortunately, we found that the retrieved issues do not help much in our results. In the future, it would be interesting to see how similar issues could help more. The major contributions of CodeR come from the multi-agent design, the task graph, and fault localization.
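For concreteness, a minimal sketch of this kind of embedding-based retrieval with the timestamp filter is shown below. All names here (`IssueRecord`, `retrieve_top1`, the precomputed embeddings) are illustrative assumptions, not CodeR's actual implementation.

```python
# Sketch of top-1 similar-issue retrieval over a crawled issue database,
# filtering out issues whose resolving PR postdates the current issue.
# Schema and helper names are assumptions for illustration only.
from dataclasses import dataclass
from datetime import datetime

import numpy as np


@dataclass
class IssueRecord:
    repo: str
    title: str
    description: str
    patch: str                 # patch from the PR that resolved this issue
    pr_merged_at: datetime     # timestamp of the resolving PR
    embedding: np.ndarray      # precomputed embedding of title + description


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve_top1(query_embedding: np.ndarray,
                  issue_db: list[IssueRecord],
                  current_issue_created_at: datetime) -> IssueRecord | None:
    """Return the most similar past issue whose resolving PR predates the
    current issue (to avoid leaking future information), or None."""
    candidates = [
        rec for rec in issue_db
        if rec.pr_merged_at < current_issue_created_at
    ]
    if not candidates:
        return None
    return max(candidates,
               key=lambda rec: cosine_similarity(query_embedding, rec.embedding))
```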
Thanks again!
Thank you so much for your detailed clarification! It would be very interesting to see how past issues can help in the future.
Congratulations on the release and thank you for citing the AutoCodeRover paper!
The paper was a great read. After going through it, I have a couple of clarification questions:
1. The Plan D part above Figure 3 mentions that "Plan D takes a test-driven approach with a ground truth test for issues (such as 'fail-to-pass' and 'pass-to-pass' tests in SWE-bench)." Does this mean the developer-written tests for the issue (i.e. the `test_patch` field in SWE-bench instances) were provided to CodeR?
2. Section 2 mentions that "Action 18 retrieves the top-1 similar issue and its corresponding patch by description" (Action 18 is `related issue retrieval` from Table 1). This is an interesting approach! I'm curious how you defined "similarity" between issues - was this using a RAG-based approach on the issue descriptions? Besides, how did you construct the corpus of issues to retrieve from?

Thank you very much in advance for your time and assistance!