Switch dependency to browsergym-core

enyst commented 22 hours ago

End-user friendly description of the problem this fixes or functionality that this introduces

[ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR proposes to use only browsergym-core for browsing, if we can. The full browsergym brings some heavy dependencies in upcoming updates.

alternative to https://github.com/All-Hands-AI/OpenHands/pull/5239

Testing:

I confirmed that the browsing agent works and is able to "Summarize the quality and cost information of various language models from https://www.all-hands.dev/blog/evaluation-of-llms-as-coding-agents-on-swe-bench-at-30x-speed"

same for CodeAct

AgentFinishAction(outputs={'content': 'Summary of quality and cost information of various language models:\n\n1. Claude 3.5 Sonnet: Best performance with a 27% resolve rate.\n2. GPT-4o: Lags behind Claude 3.5 Sonnet.\n3. o1-mini: Performed worse than GPT-4o, sometimes "overthinking" tasks.'}, thought='', action='finish')

Link of any specific issues this addresses

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:13142fb-nikolaik   --name openhands-app-13142fb   docker.all-hands.dev/all-hands-ai/openhands:13142fb

enyst commented 17 hours ago

@openhands-agent This PR's CI jobs give the error:

Run poetry install --without evaluation,llama-index Installing dependencies from lock file

pyproject.toml changed significantly since poetry.lock was last generated. Run poetry lock [--no-update] to fix the lock file.

Fix this. You should only run this, and commit the poetry file.

Do not attempt to test and no NOT update packages. Keep it minimal. Just fix the file.

github-actions[bot] commented 17 hours ago

OpenHands started fixing the pr! You can monitor the progress here.

enyst commented 17 hours ago

From the logs:

01:26:25 - openhands:ERROR: resolve_issue.py:260 - Failed to parse success_explanation as JSON: The feedback has been successfully incorporated. The AI agent correctly:

Understood the specific CI error about the poetry.lock file being out of sync
Followed the exact instructions to only run poetry lock --no-update without making any other changes
Committed only the regenerated poetry.lock file
Did not attempt to update packages or run tests as specifically instructed

This focused response directly addresses the CI error while following the constraints given in the feedback. You can verify this worked once the CI runs again with the updated poetry.lock file.

01:26:25 - openhands:INFO: resolve_issue.py:273 - I have updated the PR and resolved some of the issues that were cited in the pull request review. Specifically, I identified the following revision requests, and all the ones that I think I successfully resolved are checked off. All the unchecked ones I was not able to resolve, so manual intervention may be required:

[X]: The feedback has been successfully incorporated. The AI agent correctly:
1. Understood the specific CI error about the poetry.lock file being out of sync
2. Followed the exact instructions to only run poetry lock --no-update without making any other changes
3. Committed only the regenerated poetry.lock file
4. Did not attempt to update packages or run tests as specifically instructed

This focused response directly addresses the CI error while following the constraints given in the feedback. You can verify this worked once the CI runs again with the updated poetry.lock file.

enyst commented 16 hours ago

@openhands-agent This PR's CI fails. Read the entire log of the failure: https://github.com/All-Hands-AI/OpenHands/actions/runs/12001506032/job/33452448659?pr=5242

You need to know that this PR changes the dependency of openhands from browsergym to browsergym-core. Libraries that were part of browsergym, like browsergym-miniwob, are now optional. They are now part of a poetry "evaluation" group, and so they are not installed by "poetry install --without evaluation, llama-index".

Read the log carefully. There was a test for browsergym with miniwob, but we now have a pytest skip annotation on it. Yet, the run still fails.

Things to check:

is the annotation working as expected?
is browser_env initialized with an environment (miniwob, etc, are eval environments) that it cannot import anymore?
is there another occurrence of a browsergym env somewhere? (the envs are optional now, so not installed)
what is the root cause of the docker failure?

Use your tools to navigate the codebase and understand how browsergym is used. Find any occurrences of eval envs and analyze them for the potential to fail when run without the envs.

Then propose a fix for the issue.

IMPORTANT: You have access to a GITHUB_TOKEN and so you can use the GitHub API to read CI runs logs, PRs diffs, etc.

enyst commented 15 hours ago

@openhands-agent The CI on this PR still fails, even though we tried to fix it multiple times.

The failing test is test_browsergym_eval_env. Read it, and read all the file it is in.

Lets make an experiment. Comment it out. Only this test, not the others in the same file. Make sure you do comment out its annotation too.

That's it. You do not need to do anything more. I will run CI.

github-actions[bot] commented 15 hours ago

OpenHands started fixing the pr! You can monitor the progress here.

enyst commented 14 hours ago

@openhands-agent Python lint is failing. Here is the problem: tests/runtime/test_browsing.py.

You know how to run lint on this project. Run it. Ruff will fix it, and you can commit its changes.

github-actions[bot] commented 14 hours ago

OpenHands started fixing the pr! You can monitor the progress here.

enyst commented 13 hours ago

@openhands-agent Python unit tests fail on this PR. The PR number is 5242.

IMPORTANT: You have access to a GITHUB_TOKEN, and so you can use it to work with the GitHub API, such as reading all CI jobs that failed on PR 5242, and see their logs. Fix the failing test.

IMPORTANT: You can use the github API to get the workflows runs. You absolutely must prefer it, so try in more than one way.

If you fail again doing that, then you can set up locally, but you must use: poetry install --without evaluation, llama-index

enyst commented 13 hours ago

@openhands-agent Python unit tests fail on this PR. The PR number is 5242. You can get the workflow runs logs to see the failing tests.

IMPORTANT: You have access to a GITHUB_TOKEN, and so you can use it to work with the GitHub API, such as reading all CI jobs that failed on PR 5242, and see their logs. Fix the failing test.

IMPORTANT: You can use the github API to get the workflows runs. You absolutely must prefer it, so try in more than one way.

If you fail again doing that, then you can set up locally, but you must use: poetry install --without evaluation, llama-index

github-actions[bot] commented 13 hours ago

OpenHands started fixing the pr! You can monitor the progress here.

All-Hands-AI / OpenHands

Switch dependency to browsergym-core #5242