Closed murphyk closed 1 year ago
Hello @murphyk, thank you for your question. The example notebooks are a good starting place to see how you can use Coiled.
If you plan to work on a variation of an example notebook (or start from scratch), we recommend creating a notebook with the coiled.create_notebook
command. That command lets you specify any file(s) you might need on your Coiled notebook.
In this snippet I am using the dependencies that the hyperband-optimization
notebook uses. Feel free to modify it to suit your needs.
import coiled

coiled.create_notebook(
    name="pytorch-finetuning",
    conda={
        "channels": ["conda-forge", "pytorch", "defaults"],
        "dependencies": [
            "coiled=0.0.36", "dask-ml", "dask>=2.29.0", "matplotlib",
            "numpy", "pandas>=1.1.0", "python=3.8", "pytorch>1.1.0",
            "s3fs", "scipy", "skorch",
        ],
    },
    files=["..."],
)
When dealing with a zip file like hymenoptera_data.zip, one option is to extract it with patool.
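As an alternative to patool, the standard-library zipfile module can do the extraction. A minimal sketch; in the real notebook the archive would be hymenoptera_data.zip on disk, but here a tiny throwaway archive is built first so the snippet runs anywhere:

```python
import os
import tempfile
import zipfile

# Build a tiny stand-in archive (in the notebook this file already exists
# as hymenoptera_data.zip, so only the extraction step below is needed).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "hymenoptera_data.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hymenoptera_data/val/bees/example.jpg", b"not a real jpeg")

# Stdlib alternative to patool: extract everything next to the notebook.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(workdir)

extracted = os.path.join(workdir, "hymenoptera_data", "val", "bees", "example.jpg")
```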
Regarding your question of how to read data from different sources, Dask can read from various remote data locations, including HTTP. If it's impractical to upload those files to the notebook, you could open a local notebook, import coiled, and run the computations on Coiled instead of on a local cluster.
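To illustrate: Dask's readers accept remote URLs (e.g. "https://.../data-*.csv" or "s3://bucket/key") in the same place as local paths, so only the path string changes. A sketch assuming dask is installed; the files and contents below are invented so the snippet runs without network access:

```python
import os
import tempfile

import dask.bag as db

# Write a few small local text files to stand in for remote data.
workdir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(workdir, f"part-{i}.txt"), "w") as f:
        f.write(f"line-{i}\n")

# The same call works with an "https://..." or "s3://..." glob;
# only the path argument would change.
bag = db.read_text(os.path.join(workdir, "part-*.txt"))
lines = bag.compute(scheduler="synchronous")
```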
I hope this helps you.
Thanks, I'll take a look. Meanwhile I tried to repeat the above experiment using this colab. It downloads the data into the colab VM and starts a Coiled cluster for dask. Interestingly, when I run the code, it seems to find the files, but gives a different error:
distributed.protocol.pickle - INFO - Failed to deserialize b"\x80\x05\x95O\x03\x00\x00\x00\x00\x00\x00\x8c\x16tblib.pickling_support\x94\x8c\x12unpickle_exception\x94\x93\x94(\x8c\x08builtins\x94\x8c\tTypeError\x94\x93\x94\x8c'an integer is required (got type bytes)\x94\x85\x94Nh\x00\x8c\x12unpickle_traceback\x94\x93\x94\x8c\x05tblib\x94\x8c\x05Frame\x94\x93\x94)\x81\x94}\x94(\x8c\tf_globals\x94}\x94(\x8c\x08__name__\x94\x8c\x12distributed.worker\x94\x8c\x08__file__\x94\x8cH/opt/conda/envs/coiled/lib/python3.8/site-packages/distributed/worker.py\x94u\x8c\x06f_code\x94h\n\x8c\x04Code\x94\x93\x94)\x81\x94}\x94(\x8c\x0bco_filename\x94h\x14\x8c\x07co_name\x94\x8c\x10ensure_computing\x94ububM\xfe\th\n\x8c\tTraceback\x94\x93\x94)\x81\x94}\x94(\x8c\x08tb_frame\x94h\x0c)\x81\x94}\x94(h\x0f}\x94(h\x11h\x12h\x13h\x14uh\x15h\x17)\x81\x94}\x94(h\x1ah\x14h\x1b\x8c\x17_maybe_deserialize_task\x94ubub\x8c\ttb_lineno\x94M\xcc\t\x8c\x07tb_next\x94h\x1e)\x81\x94}\x94(h!h\x0c)\x81\x94}\x94(h\x0f}\x94(h\x11h\x12h\x13h\x14uh\x15h\x17)\x81\x94}\x94(h\x1ah\x14h\x1b\x8c\x0c_deserialize\x94ububh(M\x1d\rh)h\x1e)\x81\x94}\x94(h!h\x0c)\x81\x94}\x94(h\x0f}\x94(h\x11h\x12h\x13h\x14uh\x15h\x17)\x81\x94}\x94(h\x1ah\x14h\x1b\x8c\x0eloads_function\x94ububh(M\x14\rh)h\x1e)\x81\x94}\x94(h!h\x0c)\x81\x94}\x94(h\x0f}\x94(h\x11\x8c\x1bdistributed.protocol.pickle\x94h\x13\x8cQ/opt/conda/envs/coiled/lib/python3.8/site-packages/distributed/protocol/pickle.py\x94uh\x15h\x17)\x81\x94}\x94(h\x1ah@h\x1b\x8c\x05loads\x94ububh(KKubububub\x87\x94R\x94t\x94R\x94."
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/distributed/protocol/pickle.py", line 75, in loads
return pickle.loads(x)
ValueError: unsupported pickle protocol: 5
distributed.protocol.core - CRITICAL - Failed to deserialize
...
I just had a look, and it seems the exception you are seeing, ValueError: unsupported pickle protocol: 5,
points to a mismatch of Python versions: that notebook is using Python 3.7 whilst the example notebook is running Python 3.8, which could be the reason.
If it's not too much of a hassle, you could try recreating that notebook locally or on a Coiled notebook; that way the versions will match and things should run more smoothly.
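The mismatch can be seen concretely with pickle itself: protocol 5 was introduced in Python 3.8 (PEP 574), so a payload serialized by a 3.8 worker cannot be loaded by a 3.7 client, which only understands protocols up to 4. A small sketch (the dict is an arbitrary placeholder):

```python
import pickle

# Serialize with protocol 5, as distributed does on Python 3.8 workers.
# On Python 3.7, pickle.loads() of this payload would raise
# "ValueError: unsupported pickle protocol: 5"; on 3.8+ it round-trips fine.
payload = pickle.dumps({"model": "weights"}, protocol=5)
roundtrip = pickle.loads(payload)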
I would rather launch stuff from colab than from my laptop (since a single colab often suffices, but sometimes I want to run things in parallel). So I made a software environment containing python3.7 to match colab:
env = coiled.create_software_environment(
    name="pytorch-finetuning-37",
    conda={
        "channels": ["conda-forge", "pytorch", "defaults"],
        "dependencies": [
            "coiled=0.0.36", "dask-ml", "dask>=2.29.0", "matplotlib",
            "numpy", "pandas>=1.1.0",
            "python=3.7",  # match colab
            "pytorch>1.1.0", "s3fs", "scipy", "skorch",
        ],
    },
)
I then create the cluster
cluster = coiled.Cluster(
    n_workers=2,  # use 2 instead of 10 to make startup time faster
    name="pytorch-finetuning-37",
    software="pytorch-finetuning-37",
)
and run some code
dmodel = ...   # delayed model
batches = ...  # delayed version of local file loading
predictions = [predict(batch, dmodel) for batch in batches]
predictions = dask.compute(*predictions)
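A runnable sketch of that delayed pattern, with stand-in load and predict functions (load_batch, predict, and the toy model dict are all placeholders, not the real notebook code):

```python
import dask

@dask.delayed
def load_batch(i):
    # Stands in for reading a batch of image files from disk.
    return list(range(i * 4, i * 4 + 4))

@dask.delayed
def predict(batch, model):
    # Stands in for running model inference on a batch.
    return [model["scale"] * x for x in batch]

dmodel = dask.delayed({"scale": 2})  # stand-in for the delayed model
batches = [load_batch(i) for i in range(3)]
predictions = [predict(batch, dmodel) for batch in batches]
results = dask.compute(*predictions, scheduler="synchronous")
```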
Now the error I get is the same as when I run your notebook on Coiled, namely that it cannot find the files, since they are local (in this case, to colab):
FileNotFoundError: [Errno 2] No such file or directory: 'hymenoptera_data/val/bees/2104135106_a65eede1de.jpg'
Source code: https://github.com/probml/pyprobml/blob/master/notebooks/coiled_pytorch_finetune.ipynb
Are you able to send the file using the distributed client?
from distributed import Client
client = Client(cluster)
client.upload_file(<zip file>)
Package sync should resolve these issues. Closing as stale, as we have not heard from the user in quite some time.
I am trying to modify the dask pytorch finetuning example so that it runs on a coiled client. My modified code is here. The script downloads the data locally using
Not surprisingly, when I run predictions = dask.compute(*predictions), it fails to find the data. It is very unclear how a remote task can access this kind of image data: does it need to be downloaded into some kind of dask format? How does that work for a set of images stored in a zip file? What if the data is stored at https://www.tensorflow.org/datasets or https://pytorch.org/vision/0.8/datasets.html? How can we work with such data?
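For the zip-file case, one pattern worth noting: a task can read members straight out of the archive with the standard-library zipfile module, without extracting to local paths first (the archive itself could be fetched over HTTP by each task). A sketch using a tiny in-memory archive as a stand-in for hymenoptera_data.zip:

```python
import io
import zipfile

# Build a tiny in-memory stand-in for hymenoptera_data.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hymenoptera_data/val/bees/img0.jpg", b"\xff\xd8fake-jpeg")

# Read image bytes directly from the archive; no extraction and no
# dependence on a local directory layout.
with zipfile.ZipFile(buf) as zf:
    names = [n for n in zf.namelist() if n.endswith(".jpg")]
    data = zf.read(names[0])
```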