conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License

Investigate decoupling workers "work dir" from the environments dir #944

Closed soapy1 closed 2 weeks ago

soapy1 commented 3 weeks ago

Context

One of conda-store's main use cases is in Nebari. In that deployment, conda-store currently shares its volume with JupyterHub. This makes conda-store environments usable from Jupyter, which is good. But downloading, extracting, and packaging environments is I/O intensive, so having the "working directory" and the "environment directory" on the same volume (which is mounted by both the workers and Jupyter) leads to performance issues. An alternative is to decouple the two and put them on separate volumes. Workers would then do most of their work on ephemeral volumes and only install environments into their "environment directory", which would have the shared environment volume mounted on it.

Value and/or benefit

Decoupling the working directory and the environment install directories:

Anything else?

No response

soapy1 commented 2 weeks ago

TL;DR

Looking into this, I found that conda-store only accesses the environment directory when it installs the conda environment (and symlinks the active environment). This is already the minimum set of operations that can happen on the environment volume. So the work dir and the environment dir are already separated: the work dir is just the worker's local filesystem.
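To make the access pattern concrete, here is a minimal sketch of the split, not conda-store's actual code. The function name `build_and_publish`, the `<name>-<build_id>` layout, and the `copytree` call standing in for the real install step are all assumptions made for illustration. The point is only that scratch work happens in a local temporary directory, and the shared environment directory sees exactly two operations: the install and the symlink update.

```python
import os
import shutil
import tempfile
from pathlib import Path


def build_and_publish(env_name: str, build_id: str, environment_dir: Path) -> Path:
    """Hypothetical helper: scratch work stays local, only the install and
    the symlink update touch the (possibly shared) environment directory."""
    environment_dir.mkdir(parents=True, exist_ok=True)
    install_prefix = environment_dir / f"{env_name}-{build_id}"

    # Scratch work (downloads, extraction, packaging) stays on the worker's
    # local filesystem and is discarded when this block exits.
    with tempfile.TemporaryDirectory(prefix="worker-") as work_dir:
        staged = Path(work_dir) / "staged"
        (staged / "bin").mkdir(parents=True)
        (staged / "bin" / "python").write_text("placeholder for an installed file\n")

        # Operation 1 on the environment volume: install the environment.
        # (copytree is a stand-in for the real install step.)
        shutil.copytree(staged, install_prefix)

    # Operation 2 on the environment volume: repoint the "active" symlink.
    # Create it under a temporary name, then rename it over the old link.
    active_link = environment_dir / env_name
    tmp_link = environment_dir / f".{env_name}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(install_prefix.name, tmp_link)
    os.replace(tmp_link, active_link)
    return active_link
```

Calling `build_and_publish("analysis", "2", Path("/shared/envs"))` a second time with a new build id leaves the earlier build in place and only moves the `analysis` symlink; the path here is just an example.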

Another thing I found is that each worker has its own package cache. There may be an opportunity to have workers share a package cache, but there will probably be more helpful gains in performance from resolving these issues.
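For reference, one way a shared package cache could be wired up is through conda's `pkgs_dirs` setting, which can be overridden per process with the `CONDA_PKGS_DIRS` environment variable. The sketch below is an assumption about how a worker might be launched, not something conda-store does today; the helper name and the cache path are hypothetical, and whether several workers can safely write to one cache at the same time would still need to be checked.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def conda_env_with_shared_cache(
    shared_cache: Path, base_env: Optional[Mapping[str, str]] = None
) -> dict:
    """Hypothetical helper: return a copy of the process environment with
    conda's package cache pointed at a shared directory."""
    env = dict(os.environ if base_env is None else base_env)
    # CONDA_PKGS_DIRS overrides conda's `pkgs_dirs` setting for this process.
    env["CONDA_PKGS_DIRS"] = str(shared_cache)
    return env
```

A worker could then pass the result as `env=` to `subprocess.run` when invoking conda, so every worker launched this way reads and writes the same cache directory.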