Closed ngam closed 2 years ago
If two python packages (pysr, xbitinfo) try to create two different environments and activate both julia environments, those two python packages are fundamentally not compatible with each other and we should disallow installing both python packages to the same conda environment.
I think you are beginning to understand the developing situation now. Note that the `julia_project` can be configured:
https://github.com/MilesCranmer/PySR/blob/d09ade8628f541eb94028009bacdb9a55cb22ef5/pysr/sr.py#L508-L512
During `install()` it can also be defined as an argument:
https://github.com/MilesCranmer/PySR/blob/d09ade8628f541eb94028009bacdb9a55cb22ef5/pysr/julia_helpers.py#L9
Thus it's possible to have them in the same conda environment if we configure them correctly. The question is when can we configure them or do we let the end-user figure it out?
In the meantime, could you update the package you have on your channel, @ngam, before I start breaking things again?
done: https://anaconda.org/ngam/pysr/files
And introducing silently inconsistent conda envs is a blocker, so we have to ensure that's not the case or at least make the solver aware of conflicts somehow.
Just merged @mkitti's PR, so the latest PySR version (0.10.2) will now use `JULIA_PROJECT`.
Folks, most of the conversation here is kind of above my head, sorry for not contributing more. However, we do have tons of nice discussions here and it would be nice to summarize it, preferably as docs (even if in draft form), somewhere.
To clarify, what the PySR pull request allows us to do is tell PySR in which Julia environment to find PyCall to start pyjulia, as well as the rest of the components.
We can specify the `julia_project` argument to the `PySRRegressor` class or the `julia_project` argument to `install`. Previously, this would only affect the environment where PySR would locate SymbolicRegression.jl and ClusterManagers.jl. Now, it will also look for PyCall.jl in the same environment. This allows the PySR dependencies to be completely contained within a single environment rather than potentially being split between two environments.
Because of the stacking order defined in the julia-feedstock, this will automatically fall back to looking for PyCall.jl in the default "base" Julia environment named after the conda environment.

For now this means the environment can be entirely contained within `pysr-0.10.x`. Later, this allows us to potentially redirect PySR to use a common environment with another package.
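The environment stacking mentioned above can be sketched with an illustrative `JULIA_LOAD_PATH` (the names here are assumptions, not values copied from the julia-feedstock):

```shell
# Julia searches JULIA_LOAD_PATH entries in order: the package's own
# environment first, then a fallback environment ("@fallback" stands in
# for the conda environment's default "base" Julia environment).
export JULIA_LOAD_PATH="@pysr-0.10.2:@fallback:@stdlib"
echo "$JULIA_LOAD_PATH"
```

So a package such as PyCall.jl that is absent from `@pysr-0.10.2` can still be located in the fallback environment.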
> Folks, most of the conversation here is kind of above my head, sorry for not contributing more. However, we do have tons of nice discussions here and it would be nice to summarize it, preferably as docs (even if in draft form), somewhere.
@ocefpaf, sorry. There is a lot going on here and a lot more to debate.
The main innovation here is that we can embed a Julia depot within the conda package which allows us to "install" Julia packages for a specific project in a predefined Julia environment. Essentially this allows us to move the `[package].install()` step into the conda-forge build process.
The main question is how this will work when integrating multiple packages. For example, if someone wanted to install both `xbitinfo` and `pysr` in the same conda environment, how will that work?
If we figure out there is some utility to this approach, I'll send a pull request your way. It looks promising to me, but not everyone is convinced. For now, I need to unravel and simplify a few things in the feedstock here. Perhaps the main practical thing for you is examining the changes to `julia_helpers.py` in PySR: https://github.com/MilesCranmer/PySR/commit/b14e38ac6e7c719c26d1b936f2f960ab5363348f
In 8a4dcb3 I disabled manipulation of `JULIA_LOAD_PATH` and `JULIA_PROJECT`. However, I have not updated the PySR source.
My expectation is that this run will fail because Julia will not be able to locate PyCall.jl.
> @ocefpaf, sorry. There is a lot going on here and a lot more to debate.
Don't be sorry. The discussion here is important! I'm sorry I don't have much to contribute to it.
@mkitti - sorry, it looks like 0.10.2 didn't actually include your change. Released 0.10.3 just now, which does.
> @mkitti - sorry, it looks like 0.10.2 didn't actually include your change. Released 0.10.3 just now, which does.
Oh, hah, I thought you were waiting for me to test it before tagging the release.
I'm perplexed why I'm getting this error on osx only:
```
Traceback (most recent call last):
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 126, in init_julia
    from julia import Main as _Main
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 672, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 632, in _load_backward_compatible
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 247, in load_module
    JuliaMainModule(self, fullname))
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 149, in __init__
    self._julia = loader.julia
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 239, in julia
    self.__class__.julia = julia = Julia()
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 483, in __init__
    raise UnsupportedPythonError(jlinfo)
julia.core.UnsupportedPythonError: It seems your Julia and PyJulia setup are not supported.
```
If line 126 throws an `UnsupportedPythonError`, shouldn't line 129 catch it?
Yes, that error should be caught. Very very weird. Is there any way conda could be using a custom traceback mechanism here, that would ignore the try-except?
I figured it out. There was an additional error. I'm thinking the original error might have been that "some_env" did not resolve to a valid path on macos.
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 32, in install
    Main = init_julia(julia_project)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 133, in init_julia
    jl = Julia(compiled_modules=False)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 513, in __init__
    self._call("const PyCall = Base.require({0})".format(PYCALL_PKGID))
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 549, in _call
    self.check_exception(src)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 603, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'ArgumentError' occurred while calling julia code:
const PyCall = Base.require(Base.PkgId(Base.UUID("438e738f-606a-5dbb-bf0a-cddfbfd45ab0"), "PyCall"))
```
Could it be two different pyjulia installations - one raises the error, the other is used to check it, hence the incompatibility with the except?
Ah, sorry, the thread didn’t update. Nice going!
@conda-forge-admin, please rerender
Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.
This message was generated by GitHub actions workflow run https://github.com/conda-forge/pysr-feedstock/actions/runs/3006779929.
I'm at a stopping point on this. This package functions much as it did before this pull request. It still uses a versioned Julia environment `@pysr-0.10.4` by default.

- Added a `julia_project` argument to `pysr.install` and `pysr.sr.init_julia`. It defaults to `@pysr-$version`.
- Set `JULIA_DEPOT_PATH` with an activate script.
- Installed the Julia dependencies into the `@pysr-$version` environment.
- The user no longer needs to run `pysr.install()` before using the package.

Basically this conda package now installs the Julia components for you in a default Julia environment. For other Julia environments, `pysr.install("/path/to/env")` can still be used to install the needed Julia packages. For example, the user might want to do `pysr.install(os.environ["JULIA_PROJECT"])`. If one then specified the `julia_project` argument to `PySRRegressor`, then pysr can be used alongside other packages such as `xbitinfo`.
```python
import numpy as np
import os

X = 2 * np.random.randn(100, 5)
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

from pysr import PySRRegressor

model = PySRRegressor(
    model_selection="best",  # Result is mix of simplicity+accuracy
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=[
        "cos",
        "exp",
        "sin",
        "inv(x) = 1/x",
        # ^ Custom operator (julia syntax)
    ],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    # ^ Define operator for SymPy as well
    loss="loss(x, y) = (x - y)^2",
    # ^ Custom loss function (julia syntax)
    # Uncomment the line below if installed into the default Julia environment
    # julia_project=os.environ["JULIA_PROJECT"],
)

# Previously this would complain about the user needing to invoke pysr.install()
# Now it just works. No `pysr.install()` needed
model.fit(X, y)
```
As far as I understand, we are discouraged from doing `pysr.install(os.environ["JULIA_PROJECT"])` for the user via a post-link script. Thus, the user is responsible for completing that step if they want to integrate pysr with another Python package that also uses Julia, such as `xbitinfo`.
Using simple bash commands in a post-link script might allow us to parse the active Project.toml and insert a few lines.
The Project.toml for `@pysr-$version` looks like this:

```toml
[deps]
ClusterManagers = "34f1f09b-3a8b-5176-ab39-66d58a4d544e"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
```
We might be able to echo the last few lines into the user's Project.toml for them via a simple bash script?
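That idea could look something like this sketch (the path is a stand-in for the user's actual Project.toml, and a real post-link script would need to handle files where `[deps]` is absent or not the last section):

```shell
# Illustrative only: append PySR's [deps] entries to a Project.toml
# using nothing but simple shell tools, as a post-link script might.
proj=/tmp/Project.toml            # stand-in for the user's file
printf '[deps]\n' > "$proj"       # pretend an empty [deps] section exists
cat >> "$proj" <<'EOF'
ClusterManagers = "34f1f09b-3a8b-5176-ab39-66d58a4d544e"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
EOF
cat "$proj"
```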
> And introducing silently inconsistent conda envs is a blocker, so we have to ensure that's not the case or at least make the solver aware of conflicts somehow.
At the end of the day the user needs to do some final integration. In particular, they need to configure a common Julia environment if they want to combine different sets of Julia packages. By doing so, they will resolve any conflicts.
This situation was not introduced by this pull request. This pull request does make it easier to resolve by providing a Julia package cache that the combined Julia environment can draw upon. The packaged depot itself does not create any conflicts with other packages.
Thanks a lot for the great work and the time explaining to all of us the details.
Sounds good to me, but I disagree on the semantics of the last point. The difference is who introduces the inconsistency: us or the user? There are many ways users can mess up their environments, but currently conda-forge doesn't intentionally ship inconsistent environments.
No matter, I think to satisfy isuruf's main concern, we need to think about a way to establish a conflict between two julia-packages depending on two explicitly different versions of the same third julia-package. This could be done manually, e.g. with `run_constrained`, but we should automate it if we really want this to be adopted.
For now, this conflict concern is not practical because we only have two packages around. So @isuruf is it okay to experiment with this for now and see how things go and think of a solution as we go along? How do people feel about this as a temporary measure?
I'm up for trying this out. I also don't see potential conflicts as an issue right now.
With the current approach before this PR, I don't think you could have both xbitinfo and pysr running at the same time anyways - whereas with this PR, you might be able to. At the very least, this lets us test this.
> No matter, I think to satisfy the main concern of isuruf, we need to think about a way to establish a conflict between two julia-packages depending on two explicitly different versions of the same third julia-package. This could be done manually, e.g. `run_constrained`, but we should automate it if we really want this to be adopted.
To be clear, there is a solution on the Julia side between BitInformation.jl and SymbolicRegression.jl. The two can exist in the same Julia environment.
```
(jl_DhIDzb) pkg> add BitInformation
   Resolving package versions...
    Updating `/tmp/jl_DhIDzb/Project.toml`
  [de688a37] + BitInformation v0.6.0
    Updating `/tmp/jl_DhIDzb/Manifest.toml`
  ...
  [34da2185] + Compat v4.2.0
  ...
  [ffbed154] + DocStringExtensions v0.9.1

(jl_DhIDzb) pkg> add SymbolicRegression
   Resolving package versions...
    Updating `/tmp/jl_DhIDzb/Project.toml`
  [8254be44] + SymbolicRegression v0.10.2
    Updating `/tmp/jl_DhIDzb/Manifest.toml`
  ...
⌅ [34da2185] ↓ Compat v4.2.0 ⇒ v3.46.0
  ...
⌅ [ffbed154] ↓ DocStringExtensions v0.9.1 ⇒ v0.8.6
```
The problem is on the Python side between pysr and xbitinfo at the moment. The solution is that pysr and xbitinfo need to use the same Julia environment in order to be compatible, but they currently default to using different environments.

- `pysr` defaults to using the Julia environment `pysr-$version`
- `xbitinfo` defaults to using the Julia environment `xbitinfo-$version`

Solutions:

1. Use the `$JULIA_PROJECT` environment by default. The problem with this is that we do not know what `$JULIA_PROJECT` is at build time, so the user has to manually install the Julia packages into `$JULIA_PROJECT`.
2. Pin the `pysr-$version` and `xbitinfo-$version` Julia environments to be compatible. Based on the above we just need to pin Compat.jl to v3.46.0 and DocStringExtensions.jl to v0.8.6. Since we are shipping the projects in a depot, we can do this at build time. The problem is the user could then modify the environments to be incompatible.
3. Use a `@conda-forge` Julia environment by default. We then curate a common Julia environment for Python packages and ship that. The `@conda-forge` environment can exist in the environment stack below the user's environments. This may be a fully concrete environment that we pre-build, or it could be an environment that we manage for the user.

thank you for the explanation!!!
We are not trying to control the user; they're more than welcome to mess up their envs.
I really like your option 3!
Except, we should patch/submit PR upstream, not just do it here. Remember, when we started this we were talking about a standard/policy for packages going forward. This is precisely the point. Now when we have newer packages in staged-recipes that are similar, we can work with maintainers on this before being added to conda-forge. We should work with staged-recipes team to make a subteam for julia-packages, and that definitely should include you.
I am fully supportive of this to merge and move on. Let's give ocefpaf and isuruf a bit more time to respond.
@conda-forge-admin, please restart ci
So to be clear, the temporary plan, as I understand it, is thus:

- be on the lookout for additional packages in staged-recipes, try to satisfy option 3 above
- if there is no way to satisfy option 3, then set a `run_constrained` manually

> So to be clear, the temporary plan, as I understand it, is thus:
>
> - be on the lookout for additional packages in staged-recipes, try to satisfy option 3 above
> - if there is no way to satisfy option 3, then set a `run_constrained` manually
Yes. I admit to not fully understanding `run_constrained` though.
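For what it's worth, `run_constrained` is a `meta.yaml` field that adds a solver constraint without a hard dependency: the constrained package is not installed, but if the user installs it alongside, it must satisfy the pin. A hypothetical sketch (the spec itself is made up, not a real pin):

```yaml
requirements:
  run_constrained:
    # hypothetical: only allow an xbitinfo whose Julia environment we
    # have verified to be compatible with pysr's
    - xbitinfo >=0.0.3
```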
To make sure we satisfy option 3, we probably need to do part of option 4 internally. That is, we need to assemble a common Julia environment.

Additionally, we should prepare the Python code to make it easy to configure the default Julia environment they use. For example, I proposed a `PYSR_PROJECT` environment variable that controls the default pysr project if it exists. This way, we could implement options 2 or 4 later as needed.
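The proposed lookup could behave roughly like this (`PYSR_PROJECT` is the proposed, not an existing, variable, and the function is an illustrative sketch rather than PySR code):

```python
import os

def pick_default_project(version="0.10.4"):
    # Proposed behaviour: an explicitly configured PYSR_PROJECT wins,
    # otherwise fall back to the versioned default environment.
    return os.environ.get("PYSR_PROJECT", f"@pysr-{version}")
```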
I don't have any objections. The needed actions are all on the `xbitinfo-feedstock` side at the moment, where we should pin Compat.jl and DocStringExtensions.jl. For some reason SymbolicRegression.jl is not compatible with the latest versions of those. Perhaps we should investigate that later as well.

Basically all that we have done here is include a cache of Julia packages within the pysr conda-forge package, which happens to satisfy Julia's initial installation requirements. If someone really wanted to use the packages together, the easiest route would be to set the `julia_project` of `PySRRegressor` to `@xbitinfo-{version}` and install SymbolicRegression.jl there.
This is also good for me. Let's give @isuruf a few days. Any major objections?
Incidentally, pythoncall is being added to conda-forge: https://github.com/conda-forge/staged-recipes/pull/20380 https://github.com/conda-forge/staged-recipes/pull/20379
> Incidentally, pythoncall is being added to conda-forge:
Technically PythonCall.jl is a Julia package that calls Python code. It plays a similar role as PyCall.jl.
The two recipes linked above are actually the Python components. They are used to call Julia from Python, playing a similar role as pyjulia.
The conda version of PySR is a bit out-of-date, so I'll plan to merge on Monday (afternoon eastern) unless @isuruf has any objections.
Best, Miles
Can someone summarize what this PR does?
> Can someone summarize what this PR does?
See https://github.com/conda-forge/pysr-feedstock/pull/43#issuecomment-1239243732
We install a secondary Julia depot in `$CONDA_PREFIX/share/pysr/depot`. This contains a package cache containing all of PySR's dependencies. It also contains a shared environment `@pysr-0.10.4` that is ready for PySR to use by default without the user having to perform an explicit install step. The cache is calculated at build time, allowing conda-forge to precisely control Julia package versions, such as by pinning.

The cache can also be used by the user if they specify a `julia_project` environment to `pysr.install()`. Julia will not need to download additional packages if the versions contained in the package cache are correct. Otherwise, they can be ignored. Importantly, the package cache does not interfere with the user's package cache in `$CONDA_PREFIX/share/julia/packages`, nor does it interfere with potential other depots. It is simply a cache. This is because Julia packages are stored in hashed directories depending on their version and git commit.

If the user wants to use other Julia packages, either directly or via a Python package, they should explicitly specify an environment containing those other packages.

We can only supply a resolved Project.toml and Manifest.toml. We do have a mechanism to modify the Project.toml and Manifest.toml in the current Julia environment via post-link scripts. The required simplicity of what we can do in post-link prevents us from invoking the Julia package manager to do this for us.

In summary, we supply a Julia package cache and environment in a depot that makes PySR ready to use.
> Can someone summarize what this PR does?
In very simple terms, it installs the julia artifacts into a predetermined location inside $PREFIX to get them packaged and uploaded to anaconda.org, so that pysr works without having to (re)install the julia stuff. For now, we are going to handle conflict issues manually.

Resolving conflicts manually: whenever a package that does this sort of thing gets added to conda-forge, it will either have to not conflict with others or a `run_constrained` will be placed manually. This is okay because we are just starting and we really only have one package so far following this new strategy, and it is this package. A long-term strategy is TBD.
Why can this not be done in an activate script?
> Why can this not be done in an activate script?
You want to run the Julia package manager every time someone activates an environment?
Perhaps you mean a post-link script?
In that case, I'm taking the advice from here: https://docs.conda.io/projects/conda-build/en/latest/resources/link-scripts.html
Post-link and pre-unlink scripts should:
- Be avoided whenever possible.
- Not touch anything other than the files being installed.
- Not write anything to stdout or stderr, unless an error occurs.
- Not depend on any installed or to-be-installed conda packages.
- Depend only on simple system tools such as rm, cp, mv, and ln.
If this doesn't apply, we could certainly take that approach as we started to in #41.
> You want to run the Julia package manager every time someone activates an environment?
If the package manager has not been run yet in that environment. Post-link scripts are scoped to only one conda package and don't give an environment-wide overview.
What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?
Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.
Part of the point here is to deliver the PySR software as ready-to-use in a solution that has been fully tested by conda-forge. By invoking the Julia package manager during conda package build time we can control the versions included, pinning the dependencies as needed. The status quo that exists before this pull request is that the user has to take additional deliberate steps to make this package function. As such, Julia package management lies completely outside of conda-forge's control. The specific configuration that the user ends up with may never have been tested together.
I concur that packaging individual Julia packages would be the ideal. To support PySR, this would require adding ~100 individual packages. That's a tremendous barrier to starting though. What I'm trying to sort out here is a way to get started while delivering a practical solution.
> What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?
We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR and it is "installed" after conda finishes. As time goes on this package will be rebuilt with new Julia dependencies. The point of bundling the dependencies in the first place is that conda and conda-forge are now managing those files. The bundled solution has been tested as part of the CI here. There is some appeal in that. In my understanding, this is what conda-forge is about and what separates from just using conda and any other channel. conda-forge aims to deliver a completely tested solution built from source.
> Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.
I discussed this above. We are currently addressing an `N = 2` situation. The other conda package is https://github.com/conda-forge/xbitinfo-feedstock. By bundling the packages at build time, we have a chance to reconcile the package versions. I've done the analysis in this case and we only need to pin two package versions, Compat.jl and DocStringExtensions.jl. All other packages are compatible at the latest versions.
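That reconciliation could be expressed as exact pins in the shipped environments' Project.toml compat sections, e.g. this sketch using Julia Pkg's equality specifier (versions taken from the analysis above; the fragment itself is not copied from either feedstock):

```toml
[compat]
Compat = "=3.46.0"
DocStringExtensions = "=0.8.6"
```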
The next targets for packaging are those at the intersection of these two packages.
Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use `pip` in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.
@ngam I bumped this to version v0.11.0 and build 1, anticipating that #47 will be merged before this one. Please update and resolve conflicts as needed.
Should we rebase and start a new PR, @isuruf and @cjdoris?
I just want to add one point to the discussion. In my mind, a super important part of conda-forge is relocatability. Let me illustrate this point. Both @MilesCranmer and I are actually affiliated with the same academic institution where our HPC compute nodes have no internet access (login nodes do). Say I want to use this great and powerful package (pysr). I get it from conda-forge on my login node and then I get an allocation to run some interesting code on the compute nodes (our HPC uses slurm). But, it will give me a silent error about not having the necessary deps, therefore wasting my time and causing confusion (now, think of an average user who thinks it will just work, not us maintainers and developers).
For me personally, this is my main motivation here. Every time `conda create -n some_env some_pkg ...` gets triggered, I want `some_env` to be able to just work without any further modification. Not only that, I also want it to work across login nodes and compute nodes. I think this is one of the greatest and best things about conda-forge. Without this PR, this package (and others like it) requires the user to do an extra step. I simply don't think that extra step should exist.
If I understand correctly, the user can simply do another round of `pysr.install()` if they want, right? Or they can just do whatever they want using `Pkg` as well. Here, we simply want to give the user something that just works out-of-the-box, because we could and should.
Edit: this relocatability argument extends to making containers as well.
> Why can this not be done in an activate script?
An activate script only moves the problem around by a tiny bit and doesn't seem to be recommended in this case; moreover, see my long-running obsession with login vs compute nodes: https://github.com/conda-forge/pysr-feedstock/pull/43#issuecomment-1244804214
> Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use `pip` in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.
Exactly. We have really good tools and we should use them to great effect. Otherwise, just use pip or clone the repo, etc. --- we are not just another pip because there is no need; pip is great at its thing
To clear up any confusion, I am already pro this proposal. Having one package manager be able to resolve both Python and Julia (and other) dependencies for us sounds great. I'm just trying to figure out how it works in practice and if it is scalable to many packages.
> We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR and it is "installed" after conda finishes. As time goes on this package will be rebuilt with new Julia dependencies.
Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?
A few smaller questions:
Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.
Some pros of JuliaPkg compared to the conda-forge proposal:

Some cons:

- It requires extra Python packages such as `juliapkg-SymbolicRegression` which just declare a JuliaPkg dependency on SymbolicRegression at a particular version. Then the Python package manager can ensure all the Julia packages are compatible too. This is a similar complexity of effort to creating a Conda package for every Julia package.
- There is a post-install step (`juliapkg.resolve()`) to resolve/install Julia dependencies, whereas the conda-forge way has everything already installed.

So if we instead just used JuliaPkg everywhere, with the addition of `juliapkg-X` Python packages for every Julia package X to declare a dependency on X at a specific version, then we get something very similar to the current proposal. This has one key down-side, namely there is a post-install step (`juliapkg.resolve()`) required before severing the internet connection. But a large up-side is that this works in any Python package manager (Pip/Poetry/Python/etc).
@conda-forge-admin, please rerender
> Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?
Yes, this is the manifest we used to build. We put it in a dedicated environment called `@pysr-{version}`, which is the default environment that PySR will try to use. The other Python package also creates its own environment. Thus we don't actually merge them on the user's machine.

The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages such that the Julia packages are at the same version. At the moment, this only requires pinning two packages.
> - As discussed, all Julia dependencies will ultimately need to be added, namely 100s or 1000s of packages. Are there plans/ambitions to automate this, otherwise it is a huge undertaking.
Ultimately, I think automation is the key. conda-forge seems to have a tradition of automating as much as possible with GitHub bots, so there is precedent. I'm just not sure if conda-forge is ready for me to stage 100 packages at once. That's why I'm taking a more evolutionary approach where we start with a small number of packages with larger bundles and then focus on the intersection of the current packages. PyCall.jl and PythonCall.jl would definitely be among the first Julia packages, I think.
> - Is there a reason you don't install the packages/artifacts into the existing depot? The Julia distro in conda-forge already creates its own depot in the Conda environment, and these packages/artifacts will have distinct paths, so this should work fine I believe. In a world where every Julia package is a Conda package, we probably don't want 100s of depots stacked up. Thinking about it, separate depots is probably necessary in the current setup just because there are a few common dependencies, so there would indeed be a few file clashes, but this problem would go away in a world where every Julia package is a Conda package.
The existing depot, which we moved from `~/.julia` to `$CONDA_PREFIX/share/julia`, is still under the user's control. Given the lack of Julia packages in conda-forge, I would expect the user to have installed some Julia packages directly rather than through conda. Rather than deal with a potential collision of files, Julia's depot stacking allows us to bypass this issue entirely. Separation of "user" and "system" installed packages is the intended use case of depot stacking. In this case, conda-forge is playing the role of the "system".
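That stacking can be sketched with illustrative paths (the real values are set by the feedstock's activate scripts, not by the user):

```shell
# Earlier JULIA_DEPOT_PATH entries take precedence and are user-writable;
# the packaged pysr depot sits below as a read-only "system" layer.
export CONDA_PREFIX=/opt/conda/envs/myenv   # illustrative prefix
export JULIA_DEPOT_PATH="$CONDA_PREFIX/share/julia:$CONDA_PREFIX/share/pysr/depot"
echo "$JULIA_DEPOT_PATH"
```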
Earlier I had proposed moving towards a common `conda-forge`-managed depot, but this requires us to eliminate the overlap in packages. In principle this is possible, but it will require additional conda packages to implement.
> Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.
This is a nice approach. Conda-forge could use JuliaPkg at build time to deliver the initial package state. Thus, JuliaPkg is complementary to the approach taken here.
Thanks this is all really helpful.
> The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages such that the Julia packages are at the same version. At the moment, this only requires pinning two packages.
OK so you install one environment per Conda package, giving you a consistent set of Julia packages. Is there a mechanism to merge these into a single environment, so that it's possible to access a bunch of unrelated Julia packages in a single session? I think maybe that's what you meant in the quote above but I'm not sure. How does it work?
fixes #38
Checklist
- [ ] Reset the build number to `0` (if the version changed)
- [ ] Re-rendered with the latest `conda-smithy` (Use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)