Closed elliottlawrence closed 3 months ago
It makes sense, and I noticed you added support for downloading the correct Conda version using TARGETARCH in your PR :+1: . The reason why scikit-learn and some other repositories might not work out of the box in ARM64 is probably that I didn't add support for arch_specific_packages, which is used in the original SWE-bench evaluation to support ARM64.
However, I haven't gotten some of the scikit-learn benchmarks to work on AMD64 either. I'm working on a new approach where I don't use Conda at all to resolve this issue, which might help get this running on ARM64 as well.
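For reference, the TARGETARCH-based Conda selection mentioned above could look roughly like this. It is only a sketch: the helper name and the mapping table are made up for illustration and are not taken from the PR. It assumes the standard Miniconda "latest" Linux installer naming and the `amd64`/`arm64` values that Docker BuildKit passes as TARGETARCH.

```python
# Hypothetical helper, not code from the PR: map Docker's TARGETARCH build
# argument to the architecture suffix used in Miniconda installer file names.
MINICONDA_ARCH = {
    "amd64": "x86_64",
    "arm64": "aarch64",
}


def miniconda_installer_url(targetarch: str) -> str:
    """Return the Miniconda installer URL for a given TARGETARCH value."""
    try:
        conda_arch = MINICONDA_ARCH[targetarch]
    except KeyError:
        # Fail loudly instead of silently downloading the wrong installer.
        raise ValueError(f"unsupported TARGETARCH: {targetarch!r}") from None
    return (
        "https://repo.anaconda.com/miniconda/"
        f"Miniconda3-latest-Linux-{conda_arch}.sh"
    )
```

Calling `miniconda_installer_url("arm64")` would pick the aarch64 installer, so the same Dockerfile template can serve both architectures.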
I’ve skipped Conda in the scikit-learn images and now run them in one Docker image per instance, so you can try running the benchmark again and see if it works better on ARM64 now.
Thanks for doing that, it seems like a promising solution.
I'm trying to test this out and running into some issues now.
When I run build_docker_images.sh, I get errors such as the following:
aorwall/swe-bench-sqlfluff_sqlfluff:bookworm-slim: failed to resolve source metadata for docker.io/aorwall/swe-bench-sqlfluff_sqlfluff:bookworm-slim: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Looking forward to trying out this new approach!
I did some renaming now. Will rebuild and push. But you can try to build again now and see if it works better.
You can also try to run ./scripts/build_docker_images.sh docker aorwall scikit-learn__scikit-learn to only build the scikit-learn images.
Testing it out now. I'm able to get it working with some tweaks. I'll send out a PR.
Question for you @aorwall: When I run the dockerfile generator I see a bunch of files change, specifically that more repos are switching from conda to pyenv. Should those changes be merged in?
I was hesitant to commit them because a handful of the builds aren't working on my machine, but I wasn't sure if they're already working on yours.
Not sure yet :sweat_smile: I changed it so that all repos that don't use an environment.yml file are generated with the pyenv approach. But I didn't have time to verify all testbeds, so I ended up in a limbo state and skipped merging the ones I didn't verify... As I guess some fixes are needed, it might be better to just add a flag on psf__requests, django__django and scikit-learn__scikit-learn to generate those with pyenv Dockerfiles. WDYT?
Got it, thanks for explaining. My basic thinking is that calling run_dockerfile_generator.py in a clean repo shouldn't introduce new changes. So I think for now it makes sense to just generate the 3 repos you mentioned with pyenv, and then maybe the rest can be switched from conda as it's verified that they work.
I added a new property to set which environment to generate. Regenerated and pushed all docker images except for astropy/astropy, which for some reason fails now.
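To illustrate the idea of a per-repo environment setting, here is a minimal sketch. The names `ENVIRONMENT_BY_REPO`, `DEFAULT_ENVIRONMENT` and `environment_for` are hypothetical and not the actual property in the generator. It assumes conda stays the default and only the three repos discussed above are switched to pyenv.

```python
# Hypothetical per-repo setting; the real property name in the generator
# may differ. Only repos verified with pyenv are listed explicitly.
ENVIRONMENT_BY_REPO = {
    "psf__requests": "pyenv",
    "django__django": "pyenv",
    "scikit-learn__scikit-learn": "pyenv",
}

# Assumption: every repo not listed above keeps using conda.
DEFAULT_ENVIRONMENT = "conda"


def environment_for(repo: str) -> str:
    """Return which environment type the Dockerfile generator should use."""
    return ENVIRONMENT_BY_REPO.get(repo, DEFAULT_ENVIRONMENT)
```

With an explicit table like this, rerunning the generator in a clean repo should produce no diff, and more repos can be moved to pyenv one at a time as their builds are verified.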
When running the build scripts on a Macbook, I'm running into an error because the base Dockerfile seems to assume an AMD64 environment (my machine is ARM64).
Potential solutions:
Add --platform=linux/amd64 to all the docker build commands if it's intended for all these images to just work on AMD64.
Curious what your thoughts are on this.
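As a rough sketch of the --platform option, a build wrapper could assemble the command like this. The function name and the example tag are hypothetical. Only `docker build`, `--platform` and `-t` are real Docker CLI options, and the build scripts in this repo may be structured differently.

```python
import shlex


def docker_build_command(tag, context_dir, platform="linux/amd64"):
    """Assemble a `docker build` argument list, optionally pinning the platform."""
    cmd = ["docker", "build"]
    if platform:
        # Pin the target platform so ARM64 hosts build (emulated) AMD64 images.
        cmd += ["--platform", platform]
    cmd += ["-t", tag, context_dir]
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(docker_build_command("example/image:latest", ".")))
```

Passing `platform=None` leaves the flag out, which would keep the native-architecture behavior for anyone who wants ARM64 images instead.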