ngam commented 2 years ago

fixes #38

Checklist

[ ] Used a personal fork of the feedstock to propose changes
[ ] Bumped the build number (if the version is unchanged)
[ ] Reset the build number to 0 (if the version changed)
[ ] Re-rendered with the latest conda-smithy (Use the phrase code>@<space/conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
[ ] Ensured the license file is being packaged.

mkitti commented 2 years ago

If two python packages (pysr, xbitinfo) try to create two different environments and activate both julia environments, those two python packages are fundamentally not compatible with each other and we should disallow installing both python packages to the same conda environment.

I think you are beginning to the understand the developing situation now. Note that the julia_project can be configured. https://github.com/MilesCranmer/PySR/blob/d09ade8628f541eb94028009bacdb9a55cb22ef5/pysr/sr.py#L508-L512

During install() it can also be defined as an argument. https://github.com/MilesCranmer/PySR/blob/d09ade8628f541eb94028009bacdb9a55cb22ef5/pysr/julia_helpers.py#L9

Thus it's possible to have them in the same conda environment if we configure them correctly. The question is when can we configure them or do we let the end-user figure it out?

ngam commented 2 years ago

In the meantime, could you update the package on you have on your channel, @ngam before I start breaking things again?

done: https://anaconda.org/ngam/pysr/files

And introducing silently inconsistent conda envs is a blocker, so we have to ensure that's not the case or at least make the solver aware of conflicts somehow.

MilesCranmer commented 2 years ago

Just merged @mkitti's PR, so the latest PySR version (0.10.2) will now use JULIA_PROJECT.

ocefpaf commented 2 years ago

Folks, most of the conversation here is kind of above my head, sorry for not contributing more. However, we do have tons of nice discussions here and it would be nice to summary it, preferably as docs (even if in draft form), somewhere.

mkitti commented 2 years ago

To clarify, what the PySR pull request allows us to do is tell PySR which Julia environment to find PyCall to start pyjulia as well as the rest of the components.

We can specify the julia_project argument to the PySRRegressor class or the julia_project argument to install. Previously, this would only affect the environment where PySR would locate SymbolicRegression.jl and ClusterManagers.jl. Now, it will also look for PyCall.jl in the same environment. This allows the PySR dependencies to be completely contained within a single environment rather than potentially being split among two environments.

Because of the stacking order defined in the julia-feedstock this automatically will fallback to looking for PyCall.jl in the default "base" Julia environment named after the conda environment.

For now this means the environment can be entirely contained with pysr-0.10.x. For later this allows us to potentially redirect PySR to use a common environment with another package.

mkitti commented 2 years ago

Folks, most of the conversation here is kind of above my head, sorry for not contributing more. However, we do have tons of nice discussions here and it would be nice to summary it, preferably as docs (even if in draft form), somewhere.

@ocefpaf, sorry. There is a lot going on here and a lot more to debate.

The main innovation here is that we can embed a Julia depot within the conda package which allows us to "install" Julia packages for a specific project in a predefined Julia environment. Essentially this allows us to move the [package].install() step into the conda-forge build process.

The main question is how this will work when integrating multiple packages. For example, if someone wanted to install both xbitinfo and pysr in the same conda environment, how will that work?

If we figure out there is some utility to this approach, I'll send a pull request your way. It looks promising to me, but not everyone is convinced. For now, I need to unravel and simplify a few things in the feedstock here. Perhaps the main practical thing for you is examining the changes to the julia_helpers.py in PySR: https://github.com/MilesCranmer/PySR/commit/b14e38ac6e7c719c26d1b936f2f960ab5363348f

mkitti commented 2 years ago

In 8a4dcb3 I disabled manipulation of JULIA_LOAD_PATH and JULIA_PROJECT. However, I have not updated the PySR source.

My expectation is that this run will fail because Julia will not be able to locate PyCall.jl.

ocefpaf commented 2 years ago

@ocefpaf, sorry. There is a lot going on here and a lot more to debate.

Don't be sorry. The discussion here is important! I'm sorry I don't have much to contribute to it.

MilesCranmer commented 2 years ago

@mkitti - sorry, it looks like 0.10.2 didn't actually include your change. Released 0.10.3 just now, which does.

mkitti commented 2 years ago

@mkitti - sorry, it looks like 0.10.2 didn't actually include your change. Released 0.10.3 just now, which does.

Oh, hah, I thought you were waiting for me to test it before tagging the release.

mkitti commented 2 years ago

I'm perplexed why I'm getting this error on osx only:

Traceback (most recent call last):
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 126, in init_julia
    from julia import Main as _Main
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 672, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 632, in _load_backward_compatible
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 247, in load_module
    JuliaMainModule(self, fullname))
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 149, in __init__
    self._julia = loader.julia
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 239, in julia
    self.__class__.julia = julia = Julia()
  File "/Users/runner/miniforge3/conda-bld/pysr_1662508039913/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 483, in __init__
    raise UnsupportedPythonError(jlinfo)
julia.core.UnsupportedPythonError: It seems your Julia and PyJulia setup are not supported.

If line 126 throws an UnsupportedPythonError, shouldn't line 129 catch it?

https://github.com/MilesCranmer/PySR/blob/b521bff08ed6553f9464172d2687a0f42abff39c/pysr/julia_helpers.py#L125-L138

MilesCranmer commented 2 years ago

Yes, that error should be caught. Very very weird. Is there any way conda could be using a custom traceback mechanism here, that would ignore the try-except?

mkitti commented 2 years ago

I figured it out. There was an additional error. I'm thinking the original error might have been that "some_env" did not resolve to a valid path on macos.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 32, in install
    Main = init_julia(julia_project)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/pysr/julia_helpers.py", line 133, in init_julia
    jl = Julia(compiled_modules=False)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 513, in __init__
    self._call("const PyCall = Base.require({0})".format(PYCALL_PKGID))
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 549, in _call
    self.check_exception(src)
  File "/Users/runner/miniforge3/conda-bld/pysr_1662506003093/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.10/site-packages/julia/core.py", line 603, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'ArgumentError' occurred while calling julia code:
const PyCall = Base.require(Base.PkgId(Base.UUID("438e738f-606a-5dbb-bf0a-cddfbfd45ab0"), "PyCall"))

MilesCranmer commented 2 years ago

Could it be two different pyjulia installations - one raises the error, the other is used to check it, hence the incompatibility with the except?

MilesCranmer commented 2 years ago

Ah, sorry, the thread didn’t update. Nice going!

mkitti commented 2 years ago

@conda-forge-admin, please rerender

github-actions[bot] commented 2 years ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/pysr-feedstock/actions/runs/3006779929.

mkitti commented 2 years ago

I'm at a stopping point on this. This package functions much as it did before this pull request. It still uses a versioned Julia environment @pysr-0.10.4 by default.

What's new

The Julia environment used can be specified by passing the julia_project argument to pysr.install and pysr.sr.init_julia. It defaults to @psr-$version.
PyCall is installed into the Julia environment specified above.
We package a Julia depot containing the needed Julia packages and their dependencies. We put it on the depot stack by modifying the JULIA_DEPOT_PATH with an activate script.
Within the depot we include a configured, resolved, and ready to use @psr-$version environment.
The user does not need to invoke pysr.install() before using the package.

Basically this conda package now installs the Julia components for you in a default Julia environment. For other Julia environments, pysr.install("/path/to/env") can still be used to install the needed Julia packages. For example, the user might want to do pysr.install(os.environ["JULIA_PROJECT"]). If one then specified the julia_project argument to PySRRegressor, then pysr can used along side other packages such as xbitinfo.

Example use

import numpy as np
import os

X = 2 * np.random.randn(100, 5)
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

from pysr import PySRRegressor

model = PySRRegressor(
    model_selection="best",  # Result is mix of simplicity+accuracy
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=[
        "cos",
        "exp",
        "sin",
        "inv(x) = 1/x",
    # ^ Custom operator (julia syntax)
    ],
    extra_sympy_mappings={"inv": lambda x: 1 / x},
    # ^ Define operator for SymPy as well
    loss="loss(x, y) = (x - y)^2",
    # ^ Custom loss function (julia syntax)
    # Uncomment the line below if installed into the default Julia environment
    # julia_project=os.environ["JULIA_PROJECT"]),
)

# Previously this would complain about the user needing to invoke pysr.install()
# Now it just works. No `pysr.install()` needed
model.fit(X, y)

No post-link script

As far as I understand, we are discouraged from doing pysr.install(os.environ["JULIA_PROJECT"]) for the user via a post-link script. Thus, the user is responsible for completing that step if they want to integrate pysr with another Python package that also uses Julia such as xbitinfo.

Potential post-link script for another pull request

Using simple bash commands in a post-link script might allow us to parse the active Project.toml and insert a few lines.

The Project.toml for @pysr-$version looks like this:

[deps]
ClusterManagers = "34f1f09b-3a8b-5176-ab39-66d58a4d544e"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"

We might be able to echo the last few lines into the user's Project.toml for them via a simple bash script?

Concerns

And introducing silently inconsistent conda envs is a blocker, so we have to ensure that's not the case or at least make the solver aware of conflicts somehow.

At the end of the day the user needs to do some final integration. In particular, they need to configure a common Julia environment if they want to combine different sets of Julia packages. By doing so, they will resolve any conflicts.

This situation was not introduced by this pull request. This pull request does make it easier to resolve by providing a Julia package cache that the combined Julia environment can draw upon. The packaged depot itself does not create any conflicts with other packages.

ngam commented 2 years ago

Thanks a lot for the great work and the time explaining to all of us the details.

Sounds good to me, but I disagree on the semantics of the last point. The difference is who introduces the inconsistency: us or the user? There are many ways users can mess up their environments, but currently conda-forge doesn't intentionally ship inconsistent environments.

No matter, I think to satisfy the main concern of isuruf, we need to think about a way to establish a conflict between two julia-packages depending on two explicitly different versions of the same third julia-package. This could be done manually, e.g. run_constrained, but we should automate it if we really want this to be adopted.

ngam commented 2 years ago

For now, this conflict concern is not practical because we only have two packages around. So @isuruf is it okay to experiment with this for now and see how things go and think of a solution as we go along? How do people feel about this as a temporary measure?

MilesCranmer commented 2 years ago

I'm up for trying this out. I also don't see potential conflicts as an issue right now.

With the current approach before this PR, I don't think you could have both xbitinfo and pysr running at the same time anyways - whereas with this PR, you might be able to. At the very least, this lets us test this.

mkitti commented 2 years ago

No matter, I think to satisfy the main concern of isuruf, we need to think about a way to establish a conflict between two julia-packages depending on two explicitly different versions of the same third julia-package. This could be done manually, e.g. run_constrained, but we should automate it if we really want this to be adopted.

To be clear, there is a solution on the Julia side between BitInformation.jl and SymbolicRegression.jl. The two can exist in the same Julia environment.

(jl_DhIDzb) pkg> add BitInformation
   Resolving package versions...
    Updating `/tmp/jl_DhIDzb/Project.toml`
  [de688a37] + BitInformation v0.6.0
    Updating `/tmp/jl_DhIDzb/Manifest.toml`
...
  [34da2185] + Compat v4.2.0
...
  [ffbed154] + DocStringExtensions v0.9.1

(jl_DhIDzb) pkg> add SymbolicRegression
   Resolving package versions...
    Updating `/tmp/jl_DhIDzb/Project.toml`
  [8254be44] + SymbolicRegression v0.10.2
    Updating `/tmp/jl_DhIDzb/Manifest.toml`
...
⌅ [34da2185] ↓ Compat v4.2.0 ⇒ v3.46.0
...
⌅ [ffbed154] ↓ DocStringExtensions v0.9.1 ⇒ v0.8.6

The problem is on the Python side between pysr and xbitinfo at the moment. The solution is that pysr and xbitinfo need to use the same Julia environment in order to be compatible, but currently default to using different environments.

pysr defaults to using the Julia environment pysr-$version
xbitinfo defaults to using the Julia environment xbitinfo-$version

Solutions:

The end user configures both packages to use the same Julia environment
We, the conda-forge packagers, configure both packages to use $JULIA_PROJECT environment by default. The problem with this is that we do not know what $JULIA_PROJECT is at build time, so the user has to manually install the Julia packages into $JULIA_PROJECT.
We, the conda-forge packagers, preconfigure pysr-$version and xbitinfo-$version Julia environments to be compatible. Based on the above we just need to pin Compat.jl to v3.46.0 and DocStringExtensions.jl to v0.8.6. Since we are shipping the projects in a depot, we can do this at build time. The problem is the user could then modify the environments to be incompatible.
We, the conda-forge packagers, patch both packages to use the @conda-forge Julia environment by default. We then curate a common Julia environment for Python packages and ship that. The @conda-forge environment can exist in the environment stack below the user's environments. This may be a fully concrete environment that we pre-build, or it could be an environment that we manage for the user.

ngam commented 2 years ago

thank you for the explanation!!!

We are not trying to control the user, they're more than welcome to mess their envs.

I really like your option 3!

Except, we should patch/submit PR upstream, not just do it here. Remember, when we started this we were talking about a standard/policy for packages going forward. This is precisely the point. Now when we have newer packages in staged-recipes that are similar, we can work with maintainers on this before being added to conda-forge. We should work with staged-recipes team to make a subteam for julia-packages, and that definitely should include you.

I am fully supportive of this to merge and move on. Let's give ocefpaf and isuruf a bit more time to respond.

MilesCranmer commented 2 years ago

sounds good to me as well.

mkitti commented 2 years ago

@conda-forge-admin, please restart ci

ngam commented 2 years ago

So to be clear, the temporary plan, as I understand it, is thus:

be on the lookout for additional packages in staged-recipes, try to satisfy option 3 above
if there is no way to satisfy option 3, then set a run_constrained manually

mkitti commented 2 years ago

So to be clear, the temporary plan, as I understand it, is thus:

be on the lookout for additional packages in staged-recipes, try to satisfy option 3 above

if there is no way to satisfy option 3, then set a run_constrained manually

Yes. I admit to not fully understanding run_constrained though.

To make sure to satisfy option 3, we probably need to do part of option 4 internally. That is we need to assemble a common Julia environment.

Additionally, we should prepare the Python code to make it easy to configure the default Julia environment they use. For example, I proposed a PYSR_PROJECT environment variable that controls the default pysr project if it exists. This way, we could implement options 2 or 4 later as needed.

mkitti commented 2 years ago

I don't have any objections. The needed actions are all on the xbitinfo-feedstock side at the moment where we should pin Compat.jl and DocStringExtensions.jl. For some reason SymbolicRegressions.jl is not compatible with the latest versions of those. Perhaps we should investigate that later as well.

Basically all that we have done here is included a cache of Julia packages within the pysr conda-forge package, which happens to satisfy Julia's initial installation requirements. If someone really wanted to use the packages together, the easiest route would be to set the julia_project of PySRRegressor to @xbitinfo-{version} and install SymbolicRegressions.jl there.

ngam commented 2 years ago

This is also good for me. Let's give @isuruf a few days. Any major objections?

ngam commented 2 years ago

Incidentally, pythoncall is being added to conda-forge: https://github.com/conda-forge/staged-recipes/pull/20380 https://github.com/conda-forge/staged-recipes/pull/20379

mkitti commented 2 years ago

Incidentally, pythoncall is being added to conda-forge:

Technically PythonCall.jl is a Julia package that calls Python code. It plays a similar role as PyCall.jl.

The two recipes linked above are actually the Python components. They are used to call Julia from Python, playing a similar roll as pyjulia.

MilesCranmer commented 2 years ago

The conda version of PySR a bit out-of-date so I'll plan to merge on Monday (afternoon eastern) unless @isuruf has any objections.

Best, Miles

isuruf commented 2 years ago

Can someone summarize what this PR does?

mkitti commented 2 years ago

Can someone summarize what this PR does?

See https://github.com/conda-forge/pysr-feedstock/pull/43#issuecomment-1239243732

We install a secondary Julia depot in $CONDA_PREFIX/share/pysr/depot. This contains a package cache containing all of PySR's dependencies. It also contains a shared environment @pysr-0.10.4 that is ready to use by PySR by default without the user having to perform an explicit install step. The cache is calculated during build time allowing conda-forge the ability precisely control Julia package versions such as by pinning.

The cache can also be used by the user if they specify a julia_project environment to pysr.install(). Julia will not need to download additional packages if the versions contained in the package cache are correct. Othereise, they can be ignored. Importantly, the package cache does interfere with the user's package cache in $CONDA_PREFIX/share/julia/packages nor does it interfere with potential other depots. It is simply a cache. This is because Julia packages are stored in hashed directories depending on their version and and git commit.

If the user wants to use other Julia packages either directly or via a Python package, they should explicitly specify an environment containing those other packages.

We can only supply a resolved Project.toml and Manifest.toml. We do have a mechanism to modify the Project.toml and Manifest.toml in the current Julia environment via post-link scripts. Thr required simplicity of what we can do in post-link prevents use from invoking the Julia package manager to do this for us.

In summary, we supply a Julia package cache and environment in a depot that makes PySR ready to use.

ngam commented 2 years ago

Can someone summarize what this PR does?

In very simple terms, it installs the julia artifacts into a predetermined location inside $PREFIX to get them packaged and uploaded to anaconda.org, so that pysr works without having to (re)install the julia stuff. For now, we are going to enforce conflicts issues manually

Resolving conflicts manually: whenever a package that does this sort of thing gets added to conda-forge, it will either have to not conflict with others or a run_constrained will be placed manually --- this is okay because we are just starting and we really only have one package so far following this new strategy, and it is this package. A long-term strategy is TBD

isuruf commented 2 years ago

Why can this not be done in an activate script?

mkitti commented 2 years ago

Why can this not be done in an activate script?

You want to run the Julia package manager every time someone activates an environment?

Perhaps you mean a post-link script?

In that case, I'm taking the advice here from here: https://docs.conda.io/projects/conda-build/en/latest/resources/link-scripts.html

Post-link and pre-unlink scripts should:

Be avoided whenever possible.

Not touch anything other than the files being installed.

Not write anything to stdout or stderr, unless an error occurs.

Not depend on any installed or to-be-installed conda packages.

Depend only on simple system tools such as rm, cp, mv, and ln.

If this doesn't apply, we could certainly take that approach as we started to in #41.

isuruf commented 2 years ago

You want to run the Julia package manager every time someone activates an environment?

If the package manager has not been run yet in that environment. Post-link scripts are for only one conda package and it doesn't give an environment overview.

cjdoris commented 2 years ago

What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?

cjdoris commented 2 years ago

Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.

mkitti commented 2 years ago

Part of the point here is to deliver the PySR software as ready-to-use in a solution that has been fully tested by conda-forge. By invoking the Julia package manager during conda package build time we can control the versions included, pinning the dependencies as needed. The status quo that exists before this pull request is that the user has to take additional deliberate steps to make this package function. As such, Julia package management is lies completely out side of conda-forge's control. The specific configuration that the user ends up with may never has been tested together.

I concur that packaging individual Julia packages would be the ideal. To support PySR, this would require adding ~100 individual packages. That's a tremendous barrier to starting though. What I'm trying to sort out here is a way to get started while delivering a practical solution.

What guarantees that Pkg will use bundled versions of the packages? If one of the dependencies gets updated in the future, then Julia will end up installing it from the internet instead of from the stacked depot, right? And if you're OK with that, then what was the point of bundling the dependencies in the first place?

We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR and it is "installed" after conda finishes. As time goes on this package will be rebuilt with new Julia dependencies. The point of bundling the dependencies in the first place is that conda and conda-forge are now managing those files. The bundled solution has been tested as part of the CI here. There is some appeal in that. In my understanding, this is what conda-forge is about and what separates from just using conda and any other channel. conda-forge aims to deliver a completely tested solution built from source.

Also how do you deal with packages that may have incompatible mutual dependencies? e.g. if you release packages A and B, which depend on package C, how do you ensure that C is compatible with both versions of A and B that are installed? The logical answer seems to be that you need C to be a Conda package too. The conclusion of which is that you need to release all Julia packages you depend on as individual Conda packages, if you want to ensure compatible versions. This seems like a huge undertaking, similar in scope to conda-forge itself.

I discussed this above. We are currently addressing a N = 2 situation. The other conda package is https://github.com/conda-forge/xbitinfo-feedstock . By bundling the packages at build time, we have a chance to reconcile the package versions. I've done the analysis in this case and we only need to pin two package versions, Compat.jl and DocStringExtensions.jl. All other packages are compatible at the latest versions.

The next targets for packaging are those at the intersection of these two packages.

Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use pip in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.

mkitti commented 2 years ago

@ngam I bumped this to version v0.11.0 and build 1, anticipating that #47 will be merged before this one. Please update and resolve conflicts as needed.

ngam commented 2 years ago

Should we rebase and start a new PR, @isuruf and @cjdoris?

I just want to add one point to the discussion. In my mind, a super important part of conda-forge is relocatability. Let me illustrate this point. Both @MilesCranmer and I are actually affiliated with the same academic institution where our HPC compute nodes have no internet access (login nodes do). Say I want to use this great and powerful package (pysr). I get it from conda-forge on my login node and then I get an allocation to run some interesting code on the compute nodes (our HPC uses slurm). But, it will give me a silent error about not having the necessary deps, therefore wasting my time and causing confusion (now, think of an average user who thinks it will just work, not us maintainers and developers).

For me personally, this is my main motivation here. Every time conda create -n some_env some_pkg ... gets triggered, I want some_env to be able to just work without any further modification. Not only that, I also want it to work across login nodes and compute nodes. I think this is one of the greatest and best things about conda-forge. Without this PR, this package (and others like it) require the user to do an extra step. I simply don't think that extra step should exist.

If I understand correctly, the user can simply do another round of pysr.install() if they want right? Or they can just do whatever they want using Pkg as well. Here, we simply want to get the user something that just works out-of-the-box because we could and should.

Edit: this relocatability argument extends to making containers as well.

ngam commented 2 years ago

Why can this not be done in an activate script?

An activate script only moves the problem around by a tiny bit and doesn't seem to be recommended in this case; moreover, see my long-running obsession with login vs compute nodes: https://github.com/conda-forge/pysr-feedstock/pull/43#issuecomment-1244804214

ngam commented 2 years ago

Let's consider another question for a second. Why should conda-forge package pure Python or R packages? Why not leave that to pip or poetry? I think this is exactly analogous to the Julia package situation. If I can just use the Julia package manager in an activate script, why not use pip in an activate script? That might work in another conda channel, but my understanding is that is not how conda-forge works. Maybe I'm wrong.

Exactly. We have really good tools and we should use them to great effect. Otherwise, just use pip or clone the repo, etc. --- we are not just another pip because there is no need; pip is great at its thing

cjdoris commented 2 years ago

To clear up any confusion, I am already pro this proposal. Having one package manager be able to resolve both Python and Julia (and other) dependencies for us sounds great. I'm just trying to figure out how it works in practice and if it is scalable to many packages.

We package a complete environment with a Project.toml and Manifest.toml, so that specifies the exact package versions to use. That complete environment is the default used by PySR and it is "installed" after conda finishes. As time goes on this package will be rebuilt with new Julia dependencies.

Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?

A few smaller questions:

As discussed, all Julia dependencies will ultimately need to be added, namely 100s or 1000s of packages. Are there plans/ambitions to automate this, otherwise it is a huge undertaking.
Currently you package all packages/artifacts in the depot. But if (in the future) all Julia dependencies are their own Conda packages, you only actually need whichever packages/artifacts are new in the depot.
Is there a reason you don't install the packages/artifacts into the existing depot? The Julia distro in conda-forge already creates its own depot in the Conda environment, and these packages/artifacts will have distinct paths, so this should work fine I believe. In a world where every Julia package is a Conda package, we probably don't want 100s of depots stacked up. Thinking about it, separate depots is probably necessary in the current setup just because there are a few common dependencies, so there would indeed be a few file clashes, but this problem would go away in a world where every Julia package is a Conda package.

cjdoris commented 2 years ago

Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.

Some pros of JuliaPkg compared to the conda-forge proposal:

It's very simple, just a pure-python package, which doesn't require Conda.
It is based on Pkg, Julia's package manager, so should be very reliable and non-surprising.
There is no separate registry of packages required (whereas the conda-forge way requires registering all Julia packages again).

Some cons:

It IS possible to get into an inconsistent state using JuliaPkg, because two Python packages may declare incompatible Julia dependencies. But this will raise an error when dependencies are resolved, leaving it to the user to fix things (unlike say Pip which can just install an inconsistent set of packages) so at least it is not a silent error.
- However I don't anticipate this will be a problem very often in practice because JuliaPkg/JuliaCall is mostly intended for writing wrapper packages, which will only normally have one dependency (the Julia package being wrapped).
- It can also be solved by creating Python packages such as juliapkg-SymbolicRegression which just declare a JuliaPkg dependency on SymbolicRegression at a particular version. Then the Python package manager can ensure all the Julia packages are compatible too. This is a similar complexity of effort to creating a Conda package for every Julia package.
There is a post-install step (juliapkg.resolve()) to resolve/install Julia dependencies, whereas the conda-forge way has everything already installed.
If a Python package using JuliaPkg depends on some Julia package which itself depends on some Python package, then the Python->Python dependency is not visible to the Python package manager, so will be missed, which will cause some runtime error. Similarly for Julia->Python->Julia dependency chains. The Conda way does not have this issue.
- Again, it can be solved by having Python packages for each Julia package, because then these can also declare their own Python dependencies.

So if we instead just used JuliaPkg everywhere, with the addition of juliapkg-X Python packages for every Julia package X to declare a dependency on X at a specific version, then we get something very similar to the current proposal. This has one key down-side, namely there is a post-install step (juliapkg.resolve()) required before severing the internet connection. But a large up-side is that this works in any Python package manager (Pip/Poetry/Python/etc).

mkitti commented 2 years ago

@conda-forge-admin, please rerender

mkitti commented 2 years ago

Is this literally the Manifest created when you build PySR? What happens when we use Conda to install another Julia package with its own Manifest? It's not possible to instantiate two Manifests, do you merge them somehow?

Yes, this is the manifest we used to build. We put it in a dedicated environment called @pysr-{version} which is the default environment that PySR will try to use. The other Python package also creates it's own environment. Thus we don't actually merge them on the user's machine.

The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages such that the Julia packages are at the same version. At the moment, this is only requires pinning two packages.

As discussed, all Julia dependencies will ultimately need to be added, namely 100s or 1000s of packages. Are there plans/ambitions to automate this, otherwise it is a huge undertaking.

Ultimately, I think automation is the key. conda-forge seems to have a tradition of automating as much as possible with Github bots, so there is precedence. I'm just not sure if conda-forge is ready for me to stage 100 packages at once. That's why I'm taking a more evolutionary approach where we start with a small number of packages with larger bundles and then focus on the intersection of the current packages. PyCall.jl and PythonCall.jl would definitely be one of the first Julia packages I think.

Is there a reason you don't install the packages/artifacts into the existing depot? The Julia distro in conda-forge already creates its own depot in the Conda environment, and these packages/artifacts will have distinct paths, so this should work fine I believe. In a world where every Julia package is a Conda package, we probably don't want 100s of depots stacked up. Thinking about it, separate depots is probably necessary in the current setup just because there are a few common dependencies, so there would indeed be a few file clashes, but this problem would go away in a world where every Julia package is a Conda package.

The existing depot, which we moved from ~/.julia to $CONDA_PREFIX/share/julia is still under the user's control. Given the lack of Julia packages in conda-forge, I would expect the user to have installed some Julia packages directly rather than through conda. Rather than deal with a potential collision of files, Julia's depot stacking allows us to bypass this issue entirely. Separation of "user" and "system" installed packages is the intended use case of depot stacking. In this case, conda-forge is playing the role of the "system".

Earlier I had proposed moving towards a common conda-forge managed depot, but this requires us to eliminate the overlap in packages. In principle this is possible, but it will require additional conda packages to implement.

Thinking out loud, just as a point of comparison, there is also my JuliaPkg Python package for managing Julia dependencies. This is the default way that JuliaCall manages its dependencies. It basically provides a way for a Python package to declare any Julia dependencies it needs, and then JuliaPkg will ensure those dependencies are met.

This is a nice approach. Conda-forge could use JuliaPkg at build time to deliver the initial package state. Thus, JuliaPkg is complementary to the approach taken here.

cjdoris commented 2 years ago

Thanks this is all really helpful.

The current plan is to reconcile the two environments by pinning packages while building the conda-forge packages such that the Julia packages are at the same version. At the moment, this is only requires pinning two packages.

OK so you install one environment per Conda package, giving you a consistent set of Julia packages. Is there a mechanism to merge these into a single environment, so that it's possible to access a bunch of unrelated Julia packages in a single session? I think maybe that's what you meant in the quote above but I'm not sure. How does it work?

conda-forge / pysr-feedstock

v0.10.4 with new build strategy #43

What's new

Example use

No post-link script

Potential post-link script for another pull request

Concerns