Closed soapy1 closed 2 weeks ago
TL;DR
Looking into this, I found that conda-store only accesses the environment directory when it installs the conda environment (and symlinks the active environment). That is already the minimum set of operations that can happen on the environment volume. So the work dir and the environment dir are already separate: the work dir is just the worker's local filesystem.
Another thing I found is that each worker has its own package cache. There may be an opportunity to have workers share a package cache, but resolving these issues will probably yield more helpful performance gains.
Context
One of conda-store's main use cases is in nebari. In this implementation, conda-store currently shares its volume with jupyterhub. This makes conda-store environments usable in jupyter, which is good. But downloading, extracting, and packaging environments is I/O intensive, so having the "working directory" and the "environment directory" on the same volume (which is mounted by both the workers and jupyter) leads to performance issues. Another approach is to decouple the two and put them on separate volumes. Workers would then do almost all of their work on ephemeral volumes. The exception is installing environments, which they would do in their "environment directory" (where the shared environment volume is mounted).
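To make the proposed split concrete, here is a minimal sketch of the idea, not conda-store's actual code. The function `build_and_install`, the directory layout, and the placeholder "build" step are all hypothetical. It assumes a POSIX filesystem where symlinks can be created. The scratch work happens in a temporary directory on the worker's local disk, and the shared environment volume is touched only to install the finished environment and repoint the active symlink.

```python
import os
import shutil
import tempfile
from pathlib import Path


def build_and_install(env_name: str, build_id: str, environment_dir: Path) -> Path:
    """Do scratch work locally; write to the shared volume only to install.

    Hypothetical illustration: the "build" below is a placeholder for the
    I/O-heavy download/extract/package steps.
    """
    # Working directory: ephemeral, local to the worker.
    with tempfile.TemporaryDirectory(prefix="worker-") as work_dir:
        staged = Path(work_dir) / "staged-env"
        (staged / "bin").mkdir(parents=True)
        (staged / "bin" / "python").write_text("placeholder\n")

        # First write to the shared volume: install the finished environment.
        install_path = environment_dir / f"{env_name}-{build_id}"
        shutil.copytree(staged, install_path)

    # Second write: repoint the "active" symlink. Creating a temporary link
    # and renaming it over the old one avoids a window with no link at all.
    link = environment_dir / env_name
    tmp_link = environment_dir / f".{env_name}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(install_path.name, tmp_link)  # relative target inside environment_dir
    os.replace(tmp_link, link)
    return link
```

In a deployment like the one described, `environment_dir` would be the mount point of the shared volume, while the temporary directory would live on whatever ephemeral storage the worker has.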
Value and/or benefit
Decoupling the working directory and the environment install directories: