coiled / feedback

A place to provide Coiled feedback

Slow software environment builds #50

Closed mrocklin closed 1 year ago

mrocklin commented 4 years ago

There has been conversation about how to speed up our software environment builds. I decided to run a small experiment to see where the time was being spent.

Each time below shows when that stage finished, followed by the stage's duration.

  1. Start: 0:00
  2. Solving conda environment: 2:08, 2:08 duration
  3. conda env update in the Docker build process:
    • Start: 2:48, 0:40 duration
    • Collecting package metadata: 3:48, 1:00 duration
    • Downloading and extracting packages: 5:30 (some uncertainty), 2:00 duration
    • Verifying transaction: 6:00, 0:30 duration
    • Cleanup: 7:06, 1:05 duration
  4. Uploading image: 7:30, 0:25 duration

So it seems like, at least in this case, we're split between conda solve time (about three minutes total) and managing package time (also about three minutes total).
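As a cross-check on the arithmetic, here is a minimal sketch that recomputes each stage's duration as the difference between consecutive stop times from the list above. The helper names (`to_seconds`, `fmt`, `stage_durations`) are made up for illustration. Because the stop times were noted by hand, and the download stage is flagged as uncertain, the computed values can differ somewhat from the durations written in the list.

```python
def to_seconds(stamp):
    """Convert an 'M:SS' timestamp into a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(seconds):
    """Format a number of seconds back into 'M:SS'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# (stage name, time at which the stage stopped), copied from the list above
stages = [
    ("Solving conda environment", "2:08"),
    ("conda env update: start", "2:48"),
    ("Collecting package metadata", "3:48"),
    ("Downloading and extracting packages", "5:30"),
    ("Verifying transaction", "6:00"),
    ("Cleanup", "7:06"),
    ("Uploading image", "7:30"),
]


def stage_durations(stages, start="0:00"):
    """Return (name, seconds) pairs, each the gap since the previous stop time."""
    previous = to_seconds(start)
    result = []
    for name, stop in stages:
        current = to_seconds(stop)
        result.append((name, current - previous))
        previous = current
    return result


for name, seconds in stage_durations(stages):
    print(f"{name}: {fmt(seconds)}")
```

Summing the per-stage values should give back the final stop time, which is a quick way to confirm that no stage was dropped.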

necaris commented 4 years ago

@mrocklin which environment were you building here?

mrocklin commented 4 years ago

Oh right, I meant to add that

import coiled
coiled.create_software_environment(name="test", conda={
  "channels": [
    "conda-forge",
    "defaults"
  ],
  "dependencies": [
    "bokeh>=2.1.1",
    "bottleneck",
    "dask-image>=0.3.0",
    "dask-ml>=1.5.0",
    "dask=2.23.0",
    "h5py",
    "lz4",
    "numba",
    "numpy>=1.19.0",
    "pandas>=1.1.0",
    "pillow>=7.2.0",
    "pip",
    "pyarrow>=0.15.1",
    "python-blosc",
    "python-graphviz",
    "python=3.8",
    "requests",
    "s3fs",
    "scikit-learn>=0.23.1",
    "prefect",
  ]
})

mrocklin commented 4 years ago

On my laptop the conda solve takes 40s rather than 2m. Maybe our node is just slow?

import conda.api

# Same package specs as the software environment above
specs = [
    "bokeh>=2.1.1",
    "bottleneck",
    "dask-image>=0.3.0",
    "dask-ml>=1.5.0",
    "dask=2.23.0",
    "h5py",
    "lz4",
    "numba",
    "numpy>=1.19.0",
    "pandas>=1.1.0",
    "pillow>=7.2.0",
    "pip",
    "pyarrow>=0.15.1",
    "python-blosc",
    "python-graphviz",
    "python=3.8",
    "requests",
    "s3fs",
    "scikit-learn>=0.23.1",
    "prefect",
]

# "not-an-environment" is a placeholder prefix; only the solve is being timed
solver = conda.api.Solver(
    "not-an-environment",
    channels=["conda-forge", "defaults"],
    subdirs=["linux-64", "noarch"],
    specs_to_add=specs,
)
solver.solve_final_state()

shughes-uk commented 1 year ago

Built this with the new backend

1min 24s ± 6.06 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
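
The line above has the shape of IPython's `%timeit` output. To time a single step outside IPython, such as the `solver.solve_final_state()` call earlier in this thread, a minimal plain-Python sketch might look like this. `time_call` is a hypothetical helper, not part of conda or coiled, and it uses the sample standard deviation, so its spread may not match `%timeit` exactly.

```python
import statistics
import time


def time_call(func, repeats=7):
    """Call func() `repeats` times; return (mean, std dev) of wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    # stdev needs at least two samples; report 0.0 for a single run
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


# Stand-in workload so the sketch runs anywhere; swap in the real call to time it.
mean, spread = time_call(lambda: sum(range(100_000)), repeats=7)
print(f"{mean:.4f} s ± {spread:.4f} s per loop (mean ± std. dev. of 7 runs)")
```

For a slow call like a conda solve, something like `time_call(solver.solve_final_state, repeats=1)` keeps the total runtime manageable, at the cost of having no spread estimate.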

I think we can close this 😁