Codium-ai / cover-agent

CodiumAI Cover-Agent: An AI-Powered Tool for Automated Test Generation and Code Coverage Enhancement! 💻🤖🧪🐞
https://www.codium.ai/
GNU Affero General Public License v3.0

Test cases generated with open-source models like Qwen in real projects don't even compile #187

Open red-velet opened 2 weeks ago

red-velet commented 2 weeks ago

After integrating cover-agent into our business code, many of the test cases generated by the LLM fail. This happens so frequently that I'm seeking help. The log repeatedly shows "Skipping a generated test that failed" (see attached screenshot), and test_result.html also lists many failures (see attached screenshot).

The command I executed is as follows:

./cover-agent \
--model "openai/Qwen2-72B-Instruct-GPTQ-Int4" \
--api-base "http://vllm.auto.alibabaev.com/v1" \
--source-file-path "appcompat/src/main/java/micarx/appcompat/app/AlertDialog.java" \
--included-files "test/src/main/java/com/micarx/test/TestActivity.java" "internal/src/main/java/micarx/internal/ReflectionUtils.java" "appcompat/src/main/java/micarx/appcompat/app/DialogAnimController.java" "appcompat/src/main/java/micarx/appcompat/app/DialogAlertController.java" "appcompat/src/main/java/micarx/appcompat/app/KeyBoardTranslation.java" "appcompat/src/main/java/micarx/appcompat/app/AnimationCallBack.java" \
--test-file-path "test/src/test/java/com/micarx/test/segment/AlertDialogTest.java" \
--code-coverage-report-path "test/build/reports/jacoco/jacocoTestReport/jacocoTestReport.csv" \
--test-command "./gradlew clean test jacocoTestReport" \
--test-command-dir $(pwd) \
--coverage-type "jacoco" \
--desired-coverage 90 \
--max-iterations 10
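Since the run reads coverage from JaCoCo's CSV export, it can help to sanity-check that report independently of cover-agent. The sketch below sums the LINE_MISSED and LINE_COVERED columns of a JaCoCo-style CSV to compute overall line coverage; the sample rows and class names are hypothetical, and whether cover-agent weights coverage exactly this way is an assumption.

```shell
# Hypothetical sample report; the column layout matches JaCoCo's CSV export
cat > /tmp/jacocoTestReport.csv <<'EOF'
GROUP,PACKAGE,CLASS,INSTRUCTION_MISSED,INSTRUCTION_COVERED,BRANCH_MISSED,BRANCH_COVERED,LINE_MISSED,LINE_COVERED,COMPLEXITY_MISSED,COMPLEXITY_COVERED,METHOD_MISSED,METHOD_COVERED
app,micarx.appcompat.app,AlertDialog,120,80,10,6,30,20,8,4,5,3
app,micarx.appcompat.app,DialogAnimController,40,60,4,6,10,40,3,5,2,6
EOF

# Sum LINE_MISSED (col 8) and LINE_COVERED (col 9), skipping the header row
awk -F',' 'NR>1 {missed+=$8; covered+=$9}
           END {printf "line coverage: %.1f%%\n", 100*covered/(missed+covered)}' \
    /tmp/jacocoTestReport.csv
```

Running this against the real report after each `./gradlew clean test jacocoTestReport` run makes it easy to confirm whether the coverage number cover-agent reports is moving for the source file in question.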

The test coverage report is attached (see screenshot). I executed this command twice, for a total of 20 iterations, but coverage only increased by 10%.

EmbeddedDevops1 commented 2 days ago

@red-velet Thanks for filing the issue. Have you tried this with GPT-4o first, just to ensure that everything is working correctly? We haven't seen much success with lower-end models, so we always encourage users to try GPT-4o (or a model of similar capability) first.
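To try the suggestion above, only the model-related flags need to change from the original command; the rest can stay the same. This is a sketch, assuming an OpenAI API key is available in the environment (the elided flags stand for the unchanged arguments from the original invocation, not literal input):

```shell
# Point cover-agent at GPT-4o instead of the self-hosted Qwen endpoint.
# OPENAI_API_KEY must be set; the --api-base override is dropped so the
# default OpenAI endpoint is used.
export OPENAI_API_KEY="sk-..."   # your real key here

./cover-agent \
  --model "gpt-4o" \
  --source-file-path "appcompat/src/main/java/micarx/appcompat/app/AlertDialog.java" \
  ...   # remaining flags unchanged from the command above
```

If coverage climbs normally with GPT-4o, that isolates the problem to the model's code-generation quality rather than the project setup.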