JuliaPy / CondaPkg.jl

Add Conda dependencies to your Julia project

CondaPkg + PythonCall does not behave nicely on read-only filesystems #142

Open kleinschmidt opened 1 month ago

kleinschmidt commented 1 month ago

We use read-only filesystems in our Docker containers deployed in k8s as a security measure (required by internal policies). I think this would probably be fine on its own (we bake our Python dependencies in at build time), but when combined with PythonCall we run into trouble. I think the root of it is that PythonCall calls envdir to get the location of the CondaPkg-managed executable, here:

https://github.com/JuliaPy/PythonCall.jl/blob/379f16c43933b5a7eed505adcdb70138a09c6b34/src/C/context.jl#L53-L71

That in turn calls resolve (not mentioned in the docstring!), which creates a pidlock file. There is no way to disable the lock or control where it gets written, other than always including CondaPkg as a top-level dependency in every project we containerize this way, even if it's just a PythonCall-using package many layers deep in the stack that needs it:

https://github.com/JuliaPy/CondaPkg.jl/blob/0c84aac0db2797b15e1a75ab7aa54f14dc6b9dd4/src/resolve.jl#L527-L532

I'd hoped that setting offline mode would disable this kind of thing but, no dice... that check isn't reached until after the lockfile has been acquired:

https://github.com/JuliaPy/CondaPkg.jl/blob/0c84aac0db2797b15e1a75ab7aa54f14dc6b9dd4/src/resolve.jl#L582

I'm not totally sure this is an issue with CondaPkg per se, but I can think of a few things that CondaPkg might be able to do to play more nicely with read-only filesystems.

  1. Allow the meta_dir location to be controlled by a preference (then you could use a writeable volume mount in k8s)
  2. Disable the pidlock file in offline mode if no writes are going to take place
  3. Refactor envdir + resolve so that envdir does not call resolve directly but instead updates the environment information/state itself.

I'll also note that, looking at the code for activate! (which PythonCall also calls when using the CondaPkg backend), envdir is called again there.

EDIT: I just noticed the STATE.frozen check, which provides a bail-out too. I think that might help, but envdir would still need to have some mechanism for auto-detecting the environment... https://github.com/JuliaPy/CondaPkg.jl/issues/115

cjdoris commented 1 month ago

You can already do something similar to Suggestion 1 with the env preference, which controls where the Conda environment gets put. However, it does not change where the meta_dir itself is put; we could add a similar option for that too.
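
As a rough sketch of that existing option, assuming a writable volume mounted at the hypothetical path /mnt/condapkg, the environment location and offline mode could be set through environment variables before CondaPkg resolves anything. The offline variable name is as I understand the CondaPkg documentation, and this moves the Conda environment only, not meta_dir:

```julia
# Assumption: /mnt/condapkg is a writable volume mount (hypothetical path).
# Set these before CondaPkg is asked to resolve, e.g. at the top of the entry script.
ENV["JULIA_CONDAPKG_ENV"] = "/mnt/condapkg/env"  # location of the Conda environment
ENV["JULIA_CONDAPKG_OFFLINE"] = "yes"            # offline mode: no installs at runtime

# using CondaPkg   # load CondaPkg (or PythonCall) only after the settings are in place
```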

I'd like to extend both options to treat # specially, similar to LOAD_PATH, so that e.g. JULIA_CONDAPKG_ENV=@# would expand to ~/.julia/conda_environments/{hash}, where {hash} is a unique hash of the full path of the Julia project we're in (top_env in the code). This would mean we still get separate Conda envs for each Julia project without having to store them within the project itself. (pypoetry does a very similar thing when creating virtual environments.)
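
To make the idea concrete, here is a minimal sketch of how such an expansion might work. Nothing here exists in CondaPkg: the function name expand_env_setting, the choice of SHA-256, and the truncation to 16 hex characters are all assumptions for illustration only.

```julia
using SHA  # standard library

# Hypothetical helper, not part of CondaPkg: expand an env setting of "@#"
# to a per-project directory under the depot, keyed by a hash of the project path.
function expand_env_setting(setting::AbstractString, top_env::AbstractString;
                            depot::AbstractString = joinpath(homedir(), ".julia"))
    # Any other value is treated as a literal path and returned unchanged.
    setting == "@#" || return String(setting)
    # Hash the absolute project path so each project maps to its own directory.
    # Truncating to 16 hex characters is an arbitrary choice for readability.
    project_hash = bytes2hex(sha256(abspath(top_env)))[1:16]
    return joinpath(depot, "conda_environments", project_hash)
end

# Two different projects map to two different environment directories.
a = expand_env_setting("@#", "/srv/app_a")
b = expand_env_setting("@#", "/srv/app_b")
println(a)
println(b)
```

Because the directory depends only on the project path, repeated runs of the same project would find the same environment, and the project directory itself could stay read-only.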

cjdoris commented 1 month ago

For Suggestion 2 - I'm reluctant to disable the pidlock in offline mode, because some other process might be running in online mode and write to it. However, we could add an option to explicitly disable it that you could use in tandem with offline mode.
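
A minimal sketch of what an explicit opt-out could look like, assuming Julia 1.9 or later (where Pidfile ships in the FileWatching standard library). The wrapper name with_optional_lock and the use_lock keyword are hypothetical; this is not CondaPkg's actual locking code.

```julia
using FileWatching.Pidfile: mkpidlock  # standard library in Julia 1.9 and later

# Hypothetical wrapper, not CondaPkg's actual code: run `f` under a pidfile lock
# unless the caller has explicitly opted out.
function with_optional_lock(f, lock_path::AbstractString; use_lock::Bool = true)
    # With the lock disabled nothing is written, so a read-only directory is fine.
    use_lock || return f()
    # mkpidlock creates the lock file, runs `f`, and releases the lock afterwards.
    return mkpidlock(f, String(lock_path))
end

dir = mktempdir()
lock_path = joinpath(dir, "lock")

# Default: the lock file exists while the body runs.
held = with_optional_lock(() -> isfile(lock_path), lock_path)

# Opt-out: the body runs and no lock file is created.
skipped = with_optional_lock(() -> isfile(lock_path), lock_path; use_lock = false)
```

Keeping the lock on by default preserves the safe behaviour when another process may be running in online mode; only a deployment that knows the environment is fixed would switch it off.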

cjdoris commented 1 month ago

For Suggestion 3 - envdir needs to call resolve because it guarantees that the env is resolved when it returns. However, I think the other suggestions should be sufficient to solve your issue.