**Closed** · kaavee315 closed this pull request 3 days ago
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ **Key issues to review**

**Refactoring Concern:** The removal of `workspace_id` and `workspace_env` from the constructor in `toolset.py`, together with the new `set_workspace_id` method, may introduce issues if the workspace ID is used before being set in scenarios not covered by the PR.

**Error Handling:** The new error handling and logging enhancements in `run_evaluation.py` need careful review to ensure they capture and log errors without missing any exceptions.

**Functionality Change:** The switch from `SHELL_EXECUTE_COMMAND` to `SHELL_EXEC_COMMAND` in `run_evaluation.py` could alter command-execution behavior, depending on how these actions are implemented.
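The first concern — the workspace ID being used before it is set — is the classic hazard of moving initialization out of a constructor. One defensive pattern is to fail loudly on premature access. The sketch below is illustrative only (the class and attribute names are simplified stand-ins, not the actual `ComposioToolSet` implementation):

```python
import typing as t


class ToolSet:
    """Sketch of deferred workspace initialization (illustrative only)."""

    def __init__(self) -> None:
        # The workspace is no longer set in the constructor; it starts empty.
        self._workspace_id: t.Optional[str] = None

    def set_workspace_id(self, workspace_id: str) -> None:
        """Record the workspace ID after construction."""
        self._workspace_id = workspace_id

    @property
    def workspace_id(self) -> str:
        # Fail loudly if the workspace is used before being set,
        # instead of letting None propagate deeper into the call stack.
        if self._workspace_id is None:
            raise RuntimeError(
                "workspace_id accessed before set_workspace_id() was called"
            )
        return self._workspace_id


toolset = ToolSet()
toolset.set_workspace_id("ws-123")
print(toolset.workspace_id)  # → ws-123
```

A guard like this turns the "used before set" scenario the reviewer worries about into an immediate, traceable error rather than a silent `None` downstream.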
| Category | Suggestion | Score |
| --- | --- | --- |
| Possible issue | Add a check to ensure … | 8 |
| Possible issue | Add error handling to catch and log exceptions during the … | 7 |
| Enhancement | Add a type hint for the … | 7 |
| Best practice | Instead of checking …, use the … | 6 |
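The error-handling suggestions above point at one pattern: wrap the evaluation call so exceptions are logged with context rather than lost. A minimal sketch, assuming standard-library logging (the wrapper name and logger setup are illustrative, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_safely(func, *args, **kwargs):
    """Run func, logging any exception with its traceback before re-raising."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception records the full traceback at ERROR level.
        logger.exception("evaluation step %s failed", func.__name__)
        raise
```

Re-raising after logging keeps the process exit code nonzero (so CI still fails) while ensuring the traceback lands in the logs instead of vanishing.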
**Action:** test (ubuntu-latest, 3.10)
**Failed stage:** [Unittests](https://github.com/ComposioHQ/composio/actions/runs/9818250580/job/27110567188) [❌]
**Failed test name:** test_list_all
**Failure summary:**
The action failed because the test `test_list_all` in `tests/test_cli/test_actions.py` failed with an AssertionError: the test expected exit code 0, but the actual exit code was 1. The captured stderr shows a server error response indicating that the application failed to respond.
Relevant error logs:

```yaml
1: ##[group]Operating System
2: Ubuntu
...
496: * [new branch] featembed-tool -> origin/featembed-tool
497: * [new branch] fix/readme -> origin/fix/readme
498: * [new branch] fix/readme-logo -> origin/fix/readme-logo
499: * [new branch] fix/swe-agent -> origin/fix/swe-agent
500: * [new branch] ft-add-better-help-text -> origin/ft-add-better-help-text
501: * [new branch] ft-apps-id -> origin/ft-apps-id
502: * [new branch] ft-bring-back-core-sdk -> origin/ft-bring-back-core-sdk
503: * [new branch] ft-did-you-mean -> origin/ft-did-you-mean
504: * [new branch] ft-error-tracking -> origin/ft-error-tracking
...
892: tests/test_example.py::test_example[example0] SKIPPED (Testing in CI
893: will lead to too much LLM API usage) [  4%]
894: tests/test_example.py::test_example[example1] SKIPPED (Testing in CI
895: will lead to too much LLM API usage) [  6%]
896: tests/test_example.py::test_example[example2] SKIPPED (Testing in CI
897: will lead to too much LLM API usage) [  9%]
898: tests/test_cli/test_actions.py::TestListActions::test_list_all[arguments0-exptected_outputs0-unexptected_outputs0] PASSED [ 11%]
899: tests/test_cli/test_actions.py::TestListActions::test_list_all[arguments1-exptected_outputs1-unexptected_outputs1] PASSED [ 13%]
900: tests/test_cli/test_actions.py::TestListActions::test_list_all[arguments2-exptected_outputs2-unexptected_outputs2] FAILED [ 15%]
901: tests/test_cli/test_actions.py::TestListActions::test_list_all[arguments3-exptected_outputs3-unexptected_outputs3] PASSED [ 18%]
902: tests/test_cli/test_actions.py::TestListActions::test_tag_not_found PASSED [ 20%]
903: tests/test_cli/test_actions.py::TestListActions::test_limit SKIPPED [ 22%]
904: tests/test_cli/test_actions.py::TestListActions::test_copy PASSED [ 25%]
905: tests/test_cli/test_add.py::TestComposioAdd::test_no_auth PASSED [ 27%]
906: tests/test_cli/test_apps.py::TestList::test_list PASSED [ 29%]
907: tests/test_cli/test_apps.py::TestUpdate::test_update_not_required PASSED [ 31%]
908: tests/test_cli/test_apps.py::TestUpdate::test_update SKIPPED (Needs
909: investigation, this test fails in CI) [ 34%]
...
931: tests/test_tools/test_toolset.py::test_find_actions_by_tags PASSED [ 84%]
932: tests/test_tools/test_toolset.py::test_uninitialize_app PASSED [ 86%]
933: tests/test_utils/test_decorators.py::test_deprecated PASSED [ 88%]
934: tests/test_utils/test_git.py::test_get_git_user_info PASSED [ 90%]
935: tests/test_utils/test_shared.py::test_get_pydantic_signature_format_from_schema_params PASSED [ 93%]
936: tests/test_utils/test_shared.py::test_json_schema_to_pydantic_field PASSED [ 95%]
937: tests/test_utils/test_shared.py::test_json_schema_to_fields_dict PASSED [ 97%]
938: tests/test_utils/test_url.py::test_get_web_url PASSED [100%]
939: =================================== FAILURES ===================================
...
965:     patch: t.Any,
966:     arguments: t.Tuple[str, ...],
967:     exptected_outputs: t.Tuple[str, ...],
968:     unexptected_outputs: t.Tuple[str, ...],
969: ) -> None:
970:     """Test list all actions."""
971:     result = self.run("actions", *arguments)
972: >   assert result.exit_code == 0, result.stderr
973: E   AssertionError: Error:
...
978: E   Nothing here... yet
...
1064: E   Application failed to respond
...
1066: E   Go to Railway
...
1072: E   assert 1 == 0
1073: E   + where 1 = ...
```
**PR Type**
Enhancement, Tests

**Description**
- Refactored `ComposioToolSet` to improve workspace handling and logging.
- Added a `run_and_get_scores` function for running agents and retrieving scores.
- Updated the `run` function in `run_evaluation.py` to accept an `agent_func` parameter.
- Removed the obsolete `benchmark.template` file.
- Added `run_benchmark.template` for running benchmark evaluations.

**Changes walkthrough** 📝
**toolset.py** — Refactor workspace handling and logging in `ComposioToolSet`
`python/composio/tools/toolset.py`
- Removed the `workspace_id` and `workspace_env` attributes from the constructor.
- Added a `set_workspace_id` method to set and retrieve the workspace.

**run_evaluation.py** — Refactor benchmark evaluation and logging setup
`python/swe/benchmark/run_evaluation.py`
- Added a `run_and_get_scores` function to run the agent and get scores.
- Updated the `run` function to accept `agent_func` as a parameter.
- Set up logging with `get_logger`.

**benchmark.template** — Remove obsolete benchmark template
`python/swe/composio_swe/scaffold/templates/crewai/benchmark.template`
- Removed the `benchmark.template` file.

**run_benchmark.template** — Add new benchmark template for running evaluations
`python/swe/composio_swe/scaffold/templates/crewai/run_benchmark.template`
- Added the `run_benchmark.template` file.
- Defined `agent_func` for workspace setup and issue kickoff.
- Added a `main` function to run the benchmark.
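The shape described in this walkthrough — an `agent_func` callback handed to `run`, with `run_and_get_scores` wrapping it — is straightforward dependency injection. A hedged sketch of how the pieces could fit together; everything beyond the names `agent_func`, `run`, and `run_and_get_scores` is invented for illustration and is not the PR's actual code:

```python
import typing as t


def run(agent_func: t.Callable[[str, str], None],
        issues: t.Dict[str, str]) -> t.List[str]:
    """Run agent_func on each issue; return the IDs that completed."""
    completed = []
    for issue_id, description in issues.items():
        # agent_func is expected to handle workspace setup and issue kickoff.
        agent_func(issue_id, description)
        completed.append(issue_id)
    return completed


def run_and_get_scores(agent_func: t.Callable[[str, str], None],
                       issues: t.Dict[str, str]) -> float:
    # The score here is just a completion ratio; the real benchmark
    # derives scores from evaluation logs.
    done = run(agent_func, issues)
    return len(done) / len(issues) if issues else 0.0


def main() -> None:
    def agent_func(issue_id: str, description: str) -> None:
        print(f"solving {issue_id}: {description}")

    score = run_and_get_scores(agent_func, {"issue-1": "fix the bug"})
    print(score)
```

Passing the agent in as a callable keeps `run_evaluation.py` agnostic of any particular agent framework, which is presumably why the template defines its own `agent_func` and a `main` that wires it in.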