conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License

[ENH] - Investigate conda-store performance #645

Closed nkaretnikov closed 1 month ago

nkaretnikov commented 1 year ago

Feature description

See this talk from PackagingCon 2023: "Gotta Go Fast" by Kat Marchán, https://cfp.packaging-con.org/2023/talk/VZUZ9Y/

Slides: https://github.com/zkat/presentations/blob/881f8d085d30b1d6c89b666606570fa1a8d1a99f/presentation.md

Value and/or benefit

The talk is packed with tips on improving package manager perf. We need to look at the slides and see what we can improve in conda-store.

Anything else?

Just an idea to explore, not an immediate call to action.

trallard commented 4 months ago

Relevant: https://github.com/conda-incubator/conda-store/pull/840

trallard commented 1 month ago

Assigning to @peytondmurray since he is looking into perf

peytondmurray commented 1 month ago

I spent some time profiling environment creation, that is, what happens after sending a POST request to /specification/. For my profiling, I used the following environment:

channels:
  - conda-forge
  - bokeh
dependencies:
  - python=3.10
  - panel
  - ipykernel
  - ipywidgets
  - ipywidgets_bokeh
  - holoviews
  - openjdk=17.0.9
  - pyspark
  - findspark
  - jhsingle-native-proxy>=0.8.2
  - bokeh-root-cmd>=0.1.2
  - nbconvert
  - pip:
      - nrtk==0.3.0
      - xaitk-saliency==0.7.0
      - maite==0.5.0
      - daml==0.44.5
      - hypothesis >=6.61.0,<7.0.0
      - pytest >=7.2.0,<8.0
      - pytest-cov >=4.0.0,<5.0
      - pytest-mock >= 3.10.0,<4.0
      - pytest-snapshot >= 0.9.0
      - pytest-xdist >=3.3.1,<4.0.0
      - types-python-dateutil >=2.8.19,<3.0.0
      - tox >=4.6.4,<5.0.0
      - virtualenv-pyenv >=0.3.0,<1.0.0
      - jupytext >= 1.14.0
      - numpydoc >= 1.5.0
      - pyright >= 1.1.280
      - loguru
      - torch>=2.1
      - torchmetrics
      - torchvision
      - multiprocess
      - keras
      - yolov5
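
For anyone who wants to reproduce this, here is a rough sketch of how a specification could be submitted to the /specification/ endpoint from Python. The base URL, the namespace, the environment name, and the helper name build_specification_request are all hypothetical. The JSON field names ("specification", "namespace") are my understanding of the conda-store REST API and should be checked against the API docs for your deployment. The sketch only builds the request; sending it is left as a comment.

```python
import json
import urllib.request

# Assumption: base URL of a local conda-store server; adjust for your deployment.
BASE_URL = "http://localhost:8080/conda-store/api/v1"


def build_specification_request(spec_yaml, namespace=None, token=None, base_url=BASE_URL):
    """Build (but do not send) a POST request carrying an environment specification.

    Hypothetical helper, not part of conda-store.
    """
    # Assumed payload shape: the YAML spec as a string, plus an optional namespace.
    payload = {"specification": spec_yaml}
    if namespace is not None:
        payload["namespace"] = namespace

    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the server accepts a bearer token for authentication.
        headers["Authorization"] = f"Bearer {token}"

    return urllib.request.Request(
        url=f"{base_url}/specification/",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# A trimmed-down spec; the name "profiling-env" is made up for this example.
SPEC = """\
name: profiling-env
channels:
  - conda-forge
dependencies:
  - python=3.10
"""

request = build_specification_request(SPEC, namespace="default")
# To actually submit it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload, or to time the submission on its own.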

Using conda-store via docker compose up --build, the build took 27 minutes to finish on my local machine. For comparison, installing via conda directly (not through conda-store) took ~5 minutes, and running conda-lock took ~10 minutes.
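
To put those numbers side by side, here is the arithmetic as a tiny sketch. The conda and conda-lock timings are approximate (as noted above), so the ratios are rough too.

```python
# Timings in minutes, taken from the measurements above (two of them approximate).
conda_store_min = 27
conda_min = 5
conda_lock_min = 10

# How many times longer the conda-store build took than each baseline.
slowdown_vs_conda = conda_store_min / conda_min
slowdown_vs_conda_lock = conda_store_min / conda_lock_min

print(f"vs conda: ~{slowdown_vs_conda:.1f}x, vs conda-lock: ~{slowdown_vs_conda_lock:.1f}x")
```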

Using pyinstrument I was able to get both server and worker profiles. The server spent almost no time at all dispatching the worker to build the environment. There's no bottleneck here.
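
As an illustration of the general approach, here is a minimal profiling sketch. Note that it uses the standard library's cProfile (a deterministic profiler) as a stand-in so that it runs without extra dependencies; it is not the pyinstrument setup used for the measurements above, and fake_build_step is a hypothetical placeholder rather than conda-store code.

```python
import cProfile
import io
import pstats
import time


def fake_build_step():
    """Hypothetical stand-in for a slow worker task (not conda-store code)."""
    time.sleep(0.05)


def profile_call(func):
    """Run func under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_call(fake_build_step)
print(report)
```

Sorting by cumulative time surfaces the call paths that dominate the wall-clock time, which is the kind of view that helps separate server-side dispatch from worker-side build cost.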

The worker on the other hand has two major problematic parts:

These findings have informed future efforts that will be spent on conda-store, in addition to conda itself.

peytondmurray commented 1 month ago

Closing as completed now that we have planned action items that will address performance issues.