coiled / feedback

A place to provide Coiled feedback

Software environments with private pip packages don't have correct conda environment activated #105

Closed jrbourbeau closed 3 years ago

jrbourbeau commented 3 years ago

Recently we updated our software environment building process to install packages into a coiled conda environment (instead of the base conda environment). Everything works fine in the common case where we create a software environment with public conda / pip packages (see the example below).

Details:

```python
In [1]: import coiled

In [2]: coiled.__version__
Out[2]: '0.0.35'

In [3]: coiled.create_software_environment(name="test", conda=["dask"])
Updating software environment...
Solving conda environment...
Conda environment solved!
Building Docker image (this takes a few minutes)
STEP 1: FROM coiled/default:sha-9aa53a2
STEP 2: COPY environment.yml environment.yml
--> Using cache 053aec1d1acc0f309bf8ff9903298c4d3d84daa4d116efe5f39919b6e8e246e1
--> 053aec1d1ac
STEP 3: RUN conda env update -n coiled -f environment.yml && rm environment.yml && conda clean --all -y && echo "conda activate coiled" >> ~/.bashrc
--> Using cache 79ac3fbd72c763da7e7608bc4d93eb1e9befde6da7e3678b99e08fa9dad60938
--> 79ac3fbd72c
STEP 4: ENV PATH /opt/conda/envs/coiled/bin:$PATH
--> Using cache 1ea458694f515f98e31a692ed84fb1f0cce4a09cfaf285598b2555e1b15df00d
--> 1ea458694f5
STEP 5: SHELL ["conda", "run", "-n", "coiled", "/bin/bash", "-c"]
--> Using cache a6d1085e458bfd5c80d82e7056aa2967019750884e53aebb28ebc2b07afa4149
STEP 6: COMMIT 2071e20e-b35d-4a67-939a-544fb87e9ba3
--> a6d1085e458
a6d1085e458bfd5c80d82e7056aa2967019750884e53aebb28ebc2b07afa4149
Docker build succeeded: 2071e20e-b35d-4a67-939a-544fb87e9ba3
Uploading image
Getting image source signatures
Copying blob sha256:95d40c352268822ef7c75a1af5a2d4ff27ffb7fa22c986eaf13836e50bb63cbf
Copying blob sha256:b093dde676605ea0bb6441533cc0a26ae711549b8b137eab0486b7af2ae9bea3
Copying blob sha256:73571478965aec06a47d046298c659c0829276bb627684ad4dcb56547f09b619
Copying blob sha256:dfef8986f350d2efb7dd633410dceac30543be620085d9819f42db7067f55f64
Copying blob sha256:f5600c6330da7bb112776ba067a32a9c20842d6ecc8ee3289f1a713b644092f8
Copying blob sha256:0553ab4c463e8dd22931a5deb37e8014a18cde60d6be1337f4415de56649a947
Copying config sha256:a6d1085e458bfd5c80d82e7056aa2967019750884e53aebb28ebc2b07afa4149
Writing manifest to image destination
Storing signatures
Finished updating environment

In [4]: cluster = coiled.Cluster(software="test")
Creating Cluster. This takes about a minute ...Checking environment images
Valid environment image found

In [5]: from dask.distributed import Client

In [6]: client = Client(cluster)
/Users/james/projects/dask/distributed/distributed/client.py:1136: VersionMismatchWarning: Mismatched versions found

+---------+---------------+---------------+---------------+
| Package | client        | scheduler     | workers       |
+---------+---------------+---------------+---------------+
| blosc   | 1.10.1        | None          | None          |
| lz4     | 3.1.1         | None          | None          |
| msgpack | 1.0.0         | 1.0.1         | 1.0.1         |
| numpy   | 1.19.4        | 1.19.2        | 1.19.2        |
| python  | 3.8.6.final.0 | 3.8.5.final.0 | 3.8.5.final.0 |
| toolz   | 0.10.0        | 0.11.1        | 0.11.1        |
+---------+---------------+---------------+---------------+
Notes:
- msgpack: Variation is ok, as long as everything is above 0.6
  warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))

In [7]: client.submit(lambda x: x + 1, 123).result()
Out[7]: 124
```

However, if one uses a pip package from a private git repository

import coiled

coiled.create_software_environment(
    name="test",
    pip=["dask[complete] @ git+https://GIT_TOKEN@github.com/dask/dask.git"],
)
cluster = coiled.Cluster(software="test")

then, when we create a cluster that uses this software environment, the scheduler and workers fail to start with

/opt/conda/bin/python: Error while finding module specification for 'distributed.cli.dask_spec' (ModuleNotFoundError: No module named 'distributed')

as it appears the coiled conda environment (where all these packages are installed) isn't activated and the original base environment is still being used.
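
For anyone trying to confirm the same diagnosis inside a container, here is a minimal sketch of two checks. The helper names `conda_env_from_executable` and `module_is_importable` are hypothetical and not part of coiled or dask. The first assumes the conventional conda layout, where named environments live under `<prefix>/envs/<name>/bin/python` and anything else is treated as base. The second uses the standard library's `importlib.util.find_spec` to see whether the running interpreter can locate a module.

```python
import importlib.util
from pathlib import PurePosixPath


def conda_env_from_executable(executable: str) -> str:
    """Guess the conda environment name from an interpreter path.

    Assumption: named environments follow <prefix>/envs/<name>/bin/python;
    any other path is reported as the base environment.
    """
    parts = PurePosixPath(executable).parts
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return "base"


def module_is_importable(name: str) -> bool:
    """Return True if the current interpreter can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `distributed`) is not installed.
        return False
```

Calling `conda_env_from_executable("/opt/conda/bin/python")` on the path from the error message reports the base environment, which is consistent with the explanation above. Running `module_is_importable("distributed.cli.dask_spec")` with each interpreter in the image would show which one can actually see the installed packages.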

cc @FabioRosado @sandhujasmine if either of you get a moment to look at this

sandhujasmine commented 3 years ago

@jrbourbeau - I tested this in sandbox and beta and it works for me if I create a new software env and a subsequent cluster.

I also created a software env pointing to the container you pushed above, and that does give me the error you reported, so that image does have an issue. I don't know the cause, but it seems confined to that particular image rather than to the software environment creation process. I've also pulled the images locally, and the only difference I see so far is the version of dask, since I created mine today.

Could you try again to see if you can consistently reproduce it? Or @FabioRosado if you could try to reproduce?

jrbourbeau commented 3 years ago

@jose-moralez just a heads up that @FabioRosado has fixed this issue internally, and we'll close this issue out once the fix has been deployed

jrbourbeau commented 3 years ago

Closing as we've pushed out the fix -- you should be good to go @jose-moralez