Right now the `evaluation/` directory structure is very flat, and it is hard to tell which subdirectories are utilities for implementing benchmarks or running basic tests for OpenHands (`utils`, `integration_tests`, `regression`, `static`), and which are actual benchmarks from the ML literature (everything else).
To make this clearer, we can move all benchmarks to live under the `evaluation/benchmarks/` directory. In addition, all other files related to evaluation (including documentation, GitHub workflows, etc.) will need to be checked and updated to stay consistent.
While we are at it, we can also add the benchmarks that are currently missing from the `evaluation/README.md` documentation.
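A minimal shell sketch of the proposed move, assuming it runs at the repository root. Benchmark names such as `swe_bench` are purely illustrative here, and in a real git checkout `git mv` should be used instead of `mv` so that file history is preserved:

```shell
#!/bin/sh
# Sketch only: move every non-utility subdirectory of evaluation/
# under evaluation/benchmarks/. The four utility folders stay put.
set -eu

reorganize_eval() {
  root="${1:-evaluation}"
  mkdir -p "$root/benchmarks"
  for d in "$root"/*/; do
    [ -d "$d" ] || continue
    name=$(basename "$d")
    case "$name" in
      utils|integration_tests|regression|static|benchmarks)
        ;;  # utility/test folders remain at the top level
      *)
        # In an actual checkout, prefer: git mv "$d" "$root/benchmarks/$name"
        mv "$d" "$root/benchmarks/$name"
        ;;
    esac
  done
}
```

After a move like this, any path that mentions a benchmark directory (docs, GitHub workflow files, scripts) would need a matching update, which is why the consistency pass above is part of the same change.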