Make the baseline example work with inferior models for code

mrT23 commented 1 month ago

This PR enhance the flow engineering so that the baseline example works with the open-source models, which are dramatically less effective for code compared to GPT-4. We hope that this will improve and stabilize the entire flow.

specifically, we tested following open-source models on the baseline example:

ollama/llama3
ollama/lama3:70b
ollama/codeqwen:v1.5

PR Type

enhancement, tests

Description

Added build_prompt_custom method to PromptBuilder for custom prompt building with error handling.
Enhanced UnitTestGenerator with initial_test_suite_analysis method and improved validate_test method with better error handling and rollback mechanisms.
Added initial test suite analysis call in the main function.
Updated configuration loader to include new prompt settings files.
Improved YAML loading in utils.py with additional logging and error handling.
Added new prompt configurations for analyzing test headers indentation and determining test insert line.
Updated test generation prompt template in test_generation_prompt.toml.

Changes walkthrough 📝

Relevant files

Enhancement

PromptBuilder.py `Add custom prompt building method to PromptBuilder` cover_agent/PromptBuilder.py Added `build_prompt_custom` method to `PromptBuilder` class. Introduced custom prompt building with error handling.	+30/-0
UnitTestGenerator.py `Enhance UnitTestGenerator with custom prompts and error handling` cover_agent/UnitTestGenerator.py Integrated `build_prompt_custom` method in `UnitTestGenerator`. Added `initial_test_suite_analysis` method for analyzing test suite. Enhanced `validate_test` method with error handling and rollback.	+160/-106
main.py `Add initial test suite analysis in main function` cover_agent/main.py - Added initial test suite analysis call in `main` function.	+4/-0
utils.py `Improve YAML loading with enhanced error handling` cover_agent/utils.py - Enhanced YAML loading with additional logging and error handling.	+4/-2
test_generation_prompt.toml `Update test generation prompt template` cover_agent/settings/test_generation_prompt.toml - Updated prompt template for test generation.	+2/-6

Configuration changes

config_loader.py `Update config loader with new prompt settings files` cover_agent/settings/config_loader.py - Added new configuration files for custom prompt settings.	+2/-1
analyze_suite_test_headers_indentation.toml `Add configuration for analyzing test headers indentation` cover_agent/settings/analyze_suite_test_headers_indentation.toml Added new prompt configuration for analyzing test headers indentation.	+41/-0
analyze_suite_test_insert_line.toml `Add configuration for determining test insert line` cover_agent/settings/analyze_suite_test_insert_line.toml - Added new prompt configuration for determining test insert line.	+37/-0

💡 PR-Agent usage: Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

codiumai-pr-agent-pro[bot] commented 1 month ago

PR Review 🔍

⏱️ Estimated effort to review [1-5]	4, because the PR involves multiple complex enhancements across several files, including new methods for prompt building and test suite analysis, updates to the test validation logic, and additional configuration settings. The changes are extensive and touch critical parts of the codebase, requiring a thorough review to ensure functionality and integration.
🧪 Relevant tests	Yes
⚡ Possible issues	Exception Handling: The use of generic exceptions and the same error message across different parts of the code could make debugging difficult. Specific exceptions should be used where possible.
⚡ Possible issues	Error Propagation: In several places, the code catches exceptions and logs them, but then raises a generic error or the same exception without additional context, which might obscure the original error source.
🔒 Security concerns	No

mrT23 commented 1 month ago

/improve

codiumai-pr-agent-pro[bot] commented 1 month ago

PR Code Suggestions ✨

Category	Suggestion	Score
Possible issue	Add a check to ensure the settings for the specified file exist before accessing them ___ To avoid potential issues with missing or incorrect settings, add a check to ensure that the `file` key exists in the settings before attempting to access it. [cover_agent/PromptBuilder.py [173-178]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-1602223755506d62450bdc4cea9140b0a11ccf93f7cd00151970cd1d649786adR173-R178) ```diff +settings = get_settings().get(file) +if not settings: + logging.error(f"Settings for file '{file}' not found.") + return {"system": "", "user": ""} system_prompt = environment.from_string( - get_settings().get(file).system + settings.system ).render(variables) user_prompt = environment.from_string( - get_settings().get(file).user + settings.user ).render(variables) ``` Suggestion importance[1-10]: 8 Why: This suggestion correctly identifies a potential issue where the settings might not exist for a given file, which could lead to runtime errors. Adding a check enhances robustness and error handling.	8
Edge case	Add test cases for handling `None` and empty string inputs in `extract_error_message_python` ___ Add a test case for `extract_error_message_python` to handle cases where the input is `None` or an empty string. This will ensure the function handles edge cases gracefully. [tests/test_UnitTestGenerator.py [117-121]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-6563fc9db12b41fab8c40e7392e78047264cc44e6ed59a95ab1eccbecb84d8adR117-R121) ```diff def test_extract_bad_match(self): fail_message = 33 expected = '' result = extract_error_message_python(fail_message) assert result == expected, f"Expected '{expected}', got '{result}'" +def test_extract_none_input(self): + fail_message = None + expected = '' + result = extract_error_message_python(fail_message) + assert result == expected, f"Expected '{expected}', got '{result}'" + +def test_extract_empty_string(self): + fail_message = '' + expected = '' + result = extract_error_message_python(fail_message) + assert result == expected, f"Expected '{expected}', got '{result}'" + ``` Suggestion importance[1-10]: 8 Why: Adding test cases for `None` and empty strings ensures robustness by covering edge cases. The suggestion is accurate and targets the new test class effectively.	8
Maintainability	Extract repeated logic for building and parsing prompts into a separate method ___ To improve readability and maintainability, extract the repeated logic for building prompts into a separate method. [cover_agent/UnitTestGenerator.py [243-248]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-19760582d9ede3a799fdbb541ad357b4822682e837bca8365196fba50daf57e3R243-R248) ```diff -prompt_test_headers_indentation = self.prompt_builder.build_prompt_custom(file="analyze_suite_test_headers_indentation") -response, prompt_token_count, response_token_count = ( - self.ai_caller.call_model(prompt=prompt_test_headers_indentation) -) -tests_dict = load_yaml(response) +def build_and_parse_prompt(self, file): + prompt = self.prompt_builder.build_prompt_custom(file=file) + response, prompt_token_count, response_token_count = self.ai_caller.call_model(prompt=prompt) + return load_yaml(response) + +tests_dict = self.build_and_parse_prompt("analyze_suite_test_headers_indentation") test_headers_indentation = tests_dict.get('test_headers_indentation', None) ``` Suggestion importance[1-10]: 7 Why: The suggestion to extract repeated logic into a separate method is valid and improves maintainability and readability of the code. This is a good practice in software development.	7
Maintainability	Use constants or configuration variables instead of hardcoding values ___ Instead of hardcoding the values `10` and `4` for `relevant_line_number_to_insert_after` and `test_headers_indentation`, consider using constants or configuration variables. This will make the code more maintainable and easier to update in the future. [tests/test_UnitTestGenerator.py [38-39]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-6563fc9db12b41fab8c40e7392e78047264cc44e6ed59a95ab1eccbecb84d8adR38-R39) ```diff -test_gen.relevant_line_number_to_insert_after = 10 -test_gen.test_headers_indentation = 4 +RELEVANT_LINE_NUMBER = 10 +TEST_HEADERS_INDENTATION = 4 +test_gen.relevant_line_number_to_insert_after = RELEVANT_LINE_NUMBER +test_gen.test_headers_indentation = TEST_HEADERS_INDENTATION ``` Suggestion importance[1-10]: 7 Why: Using constants or configuration variables improves maintainability and makes future updates easier. The suggestion is relevant and correctly identifies the lines of code.	7
Validation	Validate that the `--max-iterations` argument is a positive integer ___ Validate that the `--max-iterations` argument is a positive integer. This will prevent potential runtime errors due to invalid input values. [tests/test_main.py [22-23]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-41d180f6f7233d175c2dfe36c45c4ea80eed4754fa7ed4eed46ccde1b2a778c7R22-R23) ```diff "--max-iterations", "10", +assert int(args.max_iterations) > 0, "max-iterations must be a positive integer" ``` Suggestion importance[1-10]: 6 Why: Validating input arguments is crucial for preventing runtime errors. The suggestion is correct and targets the new code related to command-line arguments.	6
Best practice	Log the exception message when the second fallback for parsing YAML fails ___ To improve error handling, log the exception message when the second fallback for parsing YAML fails. [cover_agent/utils.py [76-77]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-4b68afa4ecc709b261a42ca40bfe986f4d150bf47184bdecdadc4954d434dcb0R76-R77) ```diff +logging.info(f"Failed to parse AI prediction after extracting YAML snippet.") pass ``` Suggestion importance[1-10]: 5 Why: Adding logging for the failure in the second fallback for parsing YAML is a good practice for better error tracking and debugging. However, the impact is moderate as it mainly aids in debugging rather than preventing or fixing a functional issue.	5
Clarity	Include an example of the expected YAML output format in the configuration file ___ Include an example of the expected YAML output format in the `analyze_suite_test_headers_indentation.toml` file. This will provide clarity on the expected response format. [cover_agent/settings/analyze_suite_test_headers_indentation.toml [37-41]](https://github.com/Codium-ai/cover-agent/pull/60/files#diff-013ef5abd80bc3034a2c1e0be40c907c39f5436c2177809c1206da1f855ae9cdR37-R41) ```diff The Response should be only a valid YAML object, without any introduction text or follow-up text. +Example output: +```yaml +language: python +testing_framework: pytest +number_of_tests: 5 +test_headers_indentation: 4 +``` + ``` Suggestion importance[1-10]: 5 Why: Including an example output enhances clarity and understanding of the expected format. The suggestion is relevant and correctly identifies the need for examples in the configuration file.	5

Codium-ai / cover-agent