Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Binder launch is broken because install requires NVIDIA GPUs #181

Closed by weiji14 3 months ago

weiji14 commented 6 months ago

Originally posted by @ritwikvashistha in https://github.com/Clay-foundation/model/issues/161#issuecomment-2002602847

Hi, I tried using the model on my Mac M1 Pro as well as on Binder. But I ran into the same issue with both of them. When I was trying to install locally, I got an error "The following package could not be installed: pytorch ~=2.1.0 cuda12 does not exist (perhaps a typo or a missing channel)." With Binder, I got the same error as it was loading. I would appreciate any suggestions on how to resolve this issue.

The error I get when launching Binder at https://mybinder.org/v2/gh/Clay-foundation/model/main is:

Step 39/50 : RUN TIMEFORMAT='time: %3R' bash -c 'time ${MAMBA_EXE} env update -p ${NB_PYTHON_PREFIX} --file "environment.yml" && time ${MAMBA_EXE} clean --all -f -y && ${MAMBA_EXE} list -p ${NB_PYTHON_PREFIX} '
 ---> Running in 05506e6a3c50

EnvironmentSectionNotValid: The following section on '/home/jovyan/environment.yml' is invalid and will be ignored:
 - platforms

Looking for: ['conda-lock~=2.5.1', 'einops~=0.7.0', 'fiona~=1.9.5', 'geopandas-base~=0.14.1', 'h5netcdf~=1.3.0', 'jupyter-book~=1.0.0', 'jupyterlab~=4.0.7', 'jsonargparse~=4.27.0', 'lightning~=2.1.0', 'matplotlib-base~=3.8.2', 'planetary-computer~=1.0.0', 'pytorch~=2.1.0', "pytorch[version='~=2.1.0',build=*cuda12*]", 'python~=3.11.0', 'rioxarray~=0.15.0', 'scikit-image~=0.22.0', 'scikit-learn~=1.4.0', 'stackstac~=0.5.0', 'torchdata~=0.7.1', 'transformers~=4.35.2', 'typeshed-client~=2.4.0', 'vit-pytorch~=1.6.4', 'wandb~=0.15.12', 'zarr~=2.16.1']

Could not solve for environment specs
The following package could not be installed
└─ pytorch ~=2.1.0 *cuda12* is not installable because it requires
   └─ __cuda, which is missing on the system.
time: 36.242
 ---> Removed intermediate container 05506e6a3c50
The command '/bin/sh -c TIMEFORMAT='time: %3R' bash -c 'time ${MAMBA_EXE} env update -p ${NB_PYTHON_PREFIX} --file "environment.yml" && time ${MAMBA_EXE} clean --all -f -y && ${MAMBA_EXE} list -p ${NB_PYTHON_PREFIX} '' returned a non-zero code: 1

For context, the binder button was added in #15 before we pinned to a specific CUDA version (done in 4d2d7c2b6c653fcd5452e38ee8b79f81174aab03/#37). We'll need to make the environment.yml file compatible with Binder again, probably by removing the *cuda* pin, but still ensure that developers training the model have the correct version of CUDA installed (probably with more documentation).

Alternatively, we could make the Binder build depend on a Dockerfile instead (#166).

yeelauren commented 4 months ago

Hey there! I noticed this with the notebook buttons as well. I've seen other projects provide multiple environment.yml files: one for users with a local GPU and one for those who would only use CPU. CPU often works for some folks, just a bit more slowly. I'm not certain whether that's the case for Clay, but it's a thought on how one might structure the repo for different users.

yellowcap commented 3 months ago

This has been fixed by https://github.com/Clay-foundation/model/pull/273, which adds a custom environment.yml file in a .binder subfolder. Binder uses this file in preference to the one in the main folder. In it we specified the cpuonly dependency, which restricts the PyTorch install to a CPU-only build. The Binder image still takes a loooong time to build, but it works.

https://github.com/Clay-foundation/model/blob/main/.binder/environment.yml
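
For anyone following along, a CPU-only environment file of the kind described above might look roughly like the sketch below. This is an illustration only, not the contents of the linked file: the environment name, the channel list, and the trimmed-down package selection are assumptions, and the version pins are simply copied from the build log earlier in this thread. It assumes the `cpuonly` metapackage is pulled from the `pytorch` channel.

```yaml
# Hypothetical .binder/environment.yml sketch (NOT the real file from PR #273).
# Binder reads a file at this path instead of the top-level environment.yml.
name: claymodel-binder        # assumed name, for illustration only
channels:
  - pytorch                   # assumption: cpuonly is taken from this channel
  - conda-forge
dependencies:
  - python~=3.11.0
  - pytorch~=2.1.0            # no build=*cuda12* pin, so __cuda is not required
  - cpuonly                   # steers the solver towards a CPU-only PyTorch build
  - jupyterlab~=4.0.7
  - lightning~=2.1.0
# No `platforms` section here: the build log above shows mamba ignoring it as invalid.
```

The CUDA-pinned top-level environment.yml can then stay as it is for developers training on GPUs, while Binder and other machines without CUDA use the CPU-only file.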