feat: adding swe-bench docker to improve evaluation

shubhras01 commented 1 week ago

Adds the swe-bench-docker repo code to improve evaluation and run it on Docker tasks:
- [ ] add the evaluation function as part of the run_eval script
- [ ] build Docker images and push them to a public Docker repo
- [ ] use the same Docker image to run composio-swe

codiumai-pr-agent-pro[bot] commented 5 days ago

CI Failure Feedback 🧐

(Checks updated until commit https://github.com/ComposioHQ/composio/commit/6ba478d8dccf76ab13d090dfb1ad35f595c0a7a4)

**Action:** test (ubuntu-latest, 3.11)

**Failed stage:** [Unittests](https://github.com/ComposioHQ/composio/actions/runs/9796582850/job/27051411831) [❌]

**Failed test name:** composio/tools/local/shelltool/tests/test_workspace.py

**Failure summary:** The action failed because there were import errors in the test file
composio/tools/local/shelltool/tests/test_workspace.py.

The specific error was ImportError: cannot import name 'ExecutionEnvironment' from
'composio.tools.env.factory'.

This indicates that the ExecutionEnvironment class or function is missing or incorrectly named in
the composio.tools.env.factory module.
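
A quick way to confirm this kind of failure locally is to check which of the imported names a module actually exposes. The snippet below is a minimal sketch, not part of the PR: `missing_names` is a hypothetical helper, and it is demonstrated on the standard-library `json` module so it runs without composio installed.

```python
import importlib


def missing_names(module_name: str, names: list[str]) -> list[str]:
    """Return the entries of `names` that `module_name` does not expose.

    Hypothetical helper for illustration; it imports the module and checks
    each name with hasattr, which mirrors what `from module import name`
    needs in order to succeed.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # Demonstrated on the standard library so the snippet runs anywhere.
    print(missing_names("json", ["loads", "ExecutionEnvironment"]))
    # → ['ExecutionEnvironment']
```

Run in the branch's environment, `missing_names("composio.tools.env.factory", ["ExecutionEnvironment", "WorkspaceFactory"])` would show which of the two names imported by the test file is absent from the module.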

Relevant error logs:

```yaml
1: ##[group]Operating System
2: Ubuntu
...
495: * [new branch] featembed-tool -> origin/featembed-tool
496: * [new branch] fix/readme -> origin/fix/readme
497: * [new branch] fix/readme-logo -> origin/fix/readme-logo
498: * [new branch] fix/swe-agent -> origin/fix/swe-agent
499: * [new branch] ft-add-better-help-text -> origin/ft-add-better-help-text
500: * [new branch] ft-apps-id -> origin/ft-apps-id
501: * [new branch] ft-bring-back-core-sdk -> origin/ft-bring-back-core-sdk
502: * [new branch] ft-did-you-mean -> origin/ft-did-you-mean
503: * [new branch] ft-error-tracking -> origin/ft-error-tracking
...
877: ✔ Actions updated
878: ⚠️ Triggers does not require update
879: unittests: commands[1]> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc
880: ============================= test session starts ==============================
881: platform linux -- Python 3.11.9, pytest-7.4.2, pluggy-1.5.0 -- /home/runner/work/composio/composio/python/.tox/unittests/bin/python
882: cachedir: .tox/unittests/.pytest_cache
883: rootdir: /home/runner/work/composio/composio/python
884: plugins: codecov-0.5.1, anyio-4.4.0, cov-5.0.0
885: collecting ... collected 44 items / 2 errors
886: ==================================== ERRORS ====================================
887: ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
...
902: :1147: in _find_and_load_unlocked
903: ???
904: :690: in _load_unlocked
905: ???
906: .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
907: exec(co, module.__dict__)
908: composio/tools/local/shelltool/tests/test_workspace.py:6: in
909: from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
910: E ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
911: ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
912: ImportError while importing test module '/home/runner/work/composio/composio/python/composio/tools/local/shelltool/tests/test_workspace.py'.
...
925: :1147: in _find_and_load_unlocked
926: ???
927: :690: in _load_unlocked
928: ???
929: .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
930: exec(co, module.__dict__)
931: composio/tools/local/shelltool/tests/test_workspace.py:6: in
932: from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
933: E ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
...
1077: composio/utils/shared.py 117 104 11% 43-83, 99-108, 139-143, 153-158, 174-221, 247-292, 324-337
1078: composio/utils/url.py 10 6 40% 19, 24-35
1079: examples/crewai_ci_chart.py 15 15 0% 1-38
1080: --------------------------------------------------------------------------------------------------------------
1081: TOTAL 7829 1708 78%
1082: Coverage HTML written to dir htmlcov
1083: Coverage XML written to file coverage.xml
1084: =========================== short test summary info ============================
1085: ERROR composio/tools/local/shelltool/tests/test_workspace.py - ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
1086: ERROR composio/tools/local/shelltool/tests/test_workspace.py
1087: !!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
1088: ========================= 1 warning, 2 errors in 4.26s =========================
1089: unittests: exit 2 (5.11 seconds) /home/runner/work/composio/composio/python> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc pid=5655
1090: .pkg: _exit> python /opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
1091: unittests: FAIL code 2 (27.00=setup[18.18]+cmd[3.72,5.11] seconds)
1092: evaluation failed :( (27.14 seconds)
1093: ##[error]Process completed with exit code 2.
```

✨ CI feedback usage guide:

The CI feedback tool (`/checks`) automatically triggers when a PR has a failed check. The tool analyzes the failed checks and provides the following feedback:

- Failed stage
- Failed test name
- Failure summary
- Relevant error logs

In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

```
/checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
```

where `{repo_name}` is the name of the repository, `{run_number}` is the run number of the failed check, and `{job_number}` is the job number of the failed check.

#### Configuration options

- `enable_auto_checks_feedback` - if set to true, the tool will automatically provide feedback when a check fails. Default is true.
- `excluded_checks_list` - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
- `enable_help_text` - if set to true, the tool will provide a help message with the feedback. Default is true.
- `persistent_comment` - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
- `final_update_message` - if `persistent_comment` is true and a previous checks message is being updated, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

See more information about the `checks` tool in the [docs](https://pr-agent-docs.codium.ai/tools/ci_feedback/).
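
The manual invocation format described above can be assembled from a job URL copied out of the Actions page. The following is a minimal sketch under stated assumptions: `checks_command` and the `JOB_URL` pattern are hypothetical names, not part of the tool, and the pattern assumes the URL has exactly the `actions/runs/{run_number}/job/{job_number}` shape shown in the guide.

```python
import re

# Assumed URL shape, taken from the usage guide:
# https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}
JOB_URL = re.compile(
    r"^https://github\.com/(?P<repo_name>[^/]+/[^/]+)"
    r"/actions/runs/(?P<run_number>\d+)"
    r"/job/(?P<job_number>\d+)/?$"
)


def checks_command(job_url: str) -> str:
    """Build the `/checks` PR comment for a failed job URL (hypothetical helper)."""
    match = JOB_URL.match(job_url)
    if match is None:
        raise ValueError(f"not a GitHub Actions job URL: {job_url!r}")
    # Rebuild the URL from its parts so a trailing slash is normalized away.
    url = (
        f"https://github.com/{match['repo_name']}"
        f"/actions/runs/{match['run_number']}"
        f"/job/{match['job_number']}"
    )
    return f'/checks "{url}"'


if __name__ == "__main__":
    print(checks_command(
        "https://github.com/ComposioHQ/composio/actions/runs/9796582850/job/27051411831"
    ))
```

The example URL is the failed Unittests job linked in this comment; pasting the printed line as a PR comment is what the guide describes as manual invocation.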

feat: adding swe-bench docker to improve evaluation #246