Reproducing Agentless-1.5 Results on SWE-bench Lite

Thanks for improving Agentless. However, I can't reproduce the performance reported in the technical report with the code you provided. After generating all the files in the 'repair_samples_1' to 'repair_samples_4' folders, I could not produce an 'all_preds.jsonl' file from all 40 samples while they were kept separately in the 4 folders. So I merged the output sample files and renamed them from 'output_0_normalized.jsonl' to 'output_39_normalized.jsonl'. After merging, I ran 'rerank.py' to generate 'all_preds.jsonl'.
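A minimal sketch of the merge-and-rename step described above. It assumes each of the 4 folders holds 10 files named 'output_0_normalized.jsonl' through 'output_9_normalized.jsonl'; that layout, the destination folder, and the helper name `merge_sample_folders` are illustrative assumptions and not part of Agentless.

```python
import shutil
from pathlib import Path


def merge_sample_folders(src_dirs, dest_dir, samples_per_folder=10):
    """Copy per-folder sample files into dest_dir under one global index.

    Hypothetical helper: assumes every source folder contains
    output_{i}_normalized.jsonl for i in range(samples_per_folder).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for folder_idx, src in enumerate(src_dirs):
        for local_idx in range(samples_per_folder):
            src_file = Path(src) / f"output_{local_idx}_normalized.jsonl"
            if not src_file.exists():
                # Fail loudly so a missing sample is not silently skipped.
                raise FileNotFoundError(src_file)
            # Folder 0 keeps indices 0-9, folder 1 becomes 10-19, and so on.
            global_idx = folder_idx * samples_per_folder + local_idx
            dest_file = dest / f"output_{global_idx}_normalized.jsonl"
            shutil.copyfile(src_file, dest_file)
            copied.append(dest_file)
    return copied
```

Called with the four sample folders in order, this would yield 'output_0_normalized.jsonl' to 'output_39_normalized.jsonl' in the destination folder, which is the naming used before running 'rerank.py'. Whether 'rerank.py' expects the samples in this form is exactly the open question here.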

Using the gpt-4o-08-06 model and following the instructions in 'readme_swebench.md', I only got a 26% pass rate (78/300) on SWE-bench Lite. Moreover, even when I use all the intermediate results you provided in release 1.5 and only run 'rerank.py', I still only reach a pass rate of 29.67% (89/300).

Is my way of using the 40 samples from the 4 folders incorrect? And how can I reach the 32% pass rate that you submitted to SWE-bench, starting from your intermediate results?

OpenAutoCoder / Agentless

Reproducing Agentless-1.5 Results on SWE-bench Lite #39