coiled / feedback

A place to provide Coiled feedback
Software Env Created but does not have all the dependencies #170

Closed nilanjanroy1 closed 2 years ago

nilanjanroy1 commented 2 years ago

Hi Team, I have created a software environment with the dependencies listed below.

coiled.create_software_environment(
    name="gpu-test-ml4",
    container="gpuci/miniconda-cuda:11.2-runtime-ubuntu20.04",
    conda={
        "channels": [
            "rapidsai",
            "conda-forge",
            "defaults",
        ],
        "dependencies": [
            "dask",
            "dask-cuda",
            "dask-cudf",
            "cupy",
            "s3fs",
            "cudf",
            "pyarrow",
            "cudatoolkit=11.2",
            "mlforecast"
        ]
    }
)

The environment is created, but when I run the code below I get an error saying that cudf / dask-cudf cannot be found.

import dask_cudf
import cudf

def test_func():
    df = cudf.DataFrame({
        'a': list(range(200)),
        'b': list(reversed(range(200))),
        'c': list(range(200))
    })
    ddf = dask_cudf.from_cudf(df, npartitions=2)
    ddf.to_parquet('s3://nilanjan-test/out-coiled_test/cudf/')

f = client.submit(test_func)
f.result()

The sample from the website works:

import numpy as np
import cupy as cp

def test_gpu():
    x = cp.arange(25).reshape(5, 5).astype("f")
    return cp.asnumpy(x.sum())

f = client.submit(test_gpu)
f.result()

So cupy was installed, but cudf and dask-cudf were not. It seems there might be a version mismatch that prevents some of the dependencies from being installed. Can someone suggest what the issue might be? Thanks in advance.
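One way to narrow this down is to ask each worker directly which modules it can find, rather than waiting for a task to fail. The following is a minimal sketch, not something from the Coiled docs: `check_imports` is a hypothetical helper name, and the cluster call in the trailing comment assumes an already connected `client` like the one used above.

```python
import importlib.util


def check_imports(module_names):
    """Report, for each top-level module name, whether it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Local check with standard-library names, so this runs anywhere.
print(check_imports(["json", "module_that_does_not_exist_xyz"]))

# On a cluster (assumes an existing `client`):
# client.run(check_imports, ["cudf", "dask_cudf", "cupy"])
```

`Client.run` executes the function on every worker and returns a dictionary keyed by worker address, so a worker that is missing cudf should stand out in the result.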

phobson commented 2 years ago

Thanks for raising this issue. I hope you don't mind that I modified the issue to use text instead of images. Images can't be found through the search bar, people who use screen readers can't interpret them, and text makes it easier for an engineer who might be able to solve your issue to copy and paste your code into a script or notebook to test it out.

phobson commented 2 years ago

I have a couple of questions:

  1. Could you include a full traceback of the error that you saw?
  2. I assume you have these libraries installed locally. Is that the case?
  3. Can you show how you created your cluster and client?

nilanjanroy1 commented 2 years ago

Hi @phobson, thanks for checking this out. I was trying to reproduce the problem, but today I was able to import cudf and dask_cudf with the code below (using the same software environment that was built last Friday):

def test_func():
    import dask_cudf
    import cudf

    df = cudf.DataFrame({
        'a': list(range(2000000)),
        'b': list(reversed(range(2000000))),
        'c': list(range(2000000))
    })

    ddf = dask_cudf.from_cudf(df, npartitions=2)
    ddf.to_parquet('s3://nilanjan-test/out-coiled_test/df_2000000/')

f = client.submit(test_func)
f.result()

  1. Yesterday, I was getting an import error, i.e. module not found (cudf and dask_cudf). Sorry, I don't have the exact error message saved.
  2. No, the libraries were not installed locally.
  3. I created my cluster and client as below:

     cluster = coiled.Cluster(
         worker_gpu=1,
         worker_vm_types=['g4dn.xlarge'],
         software="gpu-test-ml4",
         account="nilanjan_roy1",
     )
     client = Client(cluster)

I tried the same steps today, and it ran fine. Not sure what is causing the inconsistency.
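To chase an inconsistency like this, it may help to record which package versions are actually present each time and compare the records. Below is a rough sketch using only the standard library; `installed_versions` and `differing` are hypothetical helper names, and it assumes the packages expose standard distribution metadata, which may not hold for every conda package.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def differing(first, second):
    """Return the entries whose versions differ between two records."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Record the current environment; run the same call on the workers with
# client.run(installed_versions, [...]) if a `client` is available.
record = installed_versions(["cudf", "dask-cudf", "cupy"])
print(record)
```

Saving the output of one run and passing it to `differing` together with a later run would show whether the environment changed between the failing and the working attempt.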

phobson commented 2 years ago

I'm not familiar with the nuances of working with GPUs in general or dask_cudf + cupy in particular, but I think you are going to need to install the libraries locally.

A full traceback would be very helpful here.
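If the error only shows up intermittently, one option is to wrap the task so that the traceback comes back as a string and can be saved. This is a minimal sketch under that assumption; `run_and_capture` and `failing_import` are hypothetical names used for illustration, and the failing import simply stands in for a missing cudf.

```python
import traceback


def run_and_capture(func, *args, **kwargs):
    """Call func and return its result, or the full traceback text on failure."""
    try:
        return {"ok": True, "result": func(*args, **kwargs), "traceback": None}
    except Exception:
        return {"ok": False, "result": None, "traceback": traceback.format_exc()}


def failing_import():
    # Stand-in for `import cudf` on a worker where the package is missing.
    import module_that_does_not_exist_xyz  # noqa: F401


outcome = run_and_capture(failing_import)
print(outcome["traceback"])

# With a cluster (assumes an existing `client` and the `test_func` from above):
# outcome = client.submit(run_and_capture, test_func).result()
```

The returned string can be pasted straight into the issue, which avoids having to reproduce the failure a second time.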

nilanjanroy1 commented 2 years ago

@phobson Sure, I will provide the full traceback if I hit the error again. Can we keep this issue open for the next few days, in case I encounter the same problem? Thanks in advance.

hayesgb commented 2 years ago

@nilanjanroy1 -- Following up on this issue. Have you encountered it again, or are you good with closing?

nilanjanroy1 commented 2 years ago

Hi @hayesgb , I will have the ticket closed. I haven't faced the same issue again. Thanks all for checking the issue out.