Use julia_project to manage Julia dependency

SciML / diffeqpy

Solving differential equations in Python using DifferentialEquations.jl and the SciML Scientific Machine Learning organization

MIT License


Use julia_project to manage Julia dependency #100

Closed jlapeyre closed 9 months ago

jlapeyre commented 2 years ago

This PR uses julia_project and find_julia to handle installing and managing Julia and Julia packages.

ChrisRackauckas commented 2 years ago

Hmm, this project is still set up for Travis. @tkf do you have a suggested CI setup to test this?

jlapeyre commented 2 years ago

You could use julia_project for other projects such as https://github.com/JuliaPOMDP/quickpomdps/issues/7 referenced above. But, you could not use both diffeqpy and quickpomdps together in one python runtime. julia_project should be modified to accommodate this. I'm not sure how to do it.

I think you can use these two projects together as they are, without julia_project. But, any messiness, conflicting libpython (which will happen often under windows IIUC) , creating a new Julia project, etc. would have to be handled manually. One goal of julia_project is to insulate the Python user from Julia (at least they have the choice to ignore Julia). Python modules written in Rust don't require the Python user to touch rust in any way. I see this as a way to drive Julia adoption.

I should probably fork and modify quickpomdps just so I can experiment with ways to get the two projects to work together. One clue might be in something David Anthoff wrote: The Project.toml (and Manifest.toml) serves two purposes, to define packages and to define environments. The uses are separate and the file Project.toml could (should) have been given two different names. I think in julia_project it's not clear which role it is playing. We might want to manage two Project.tomls. One for each Python package, eg diffeqpy. This is like a package Project.toml. It can include a compat section. But julia_project would also manage a Python module-level Project.toml that represents the environment. pyjulia already stores things at the module level, i.e. managing Julia is not all encapsulated in a class. This is more complicated than I like, but we may be forced into something like this.

EDIT: PythonCall.jl manages Julia dependencies from Python by using Pkg at a lower level https://github.com/cjdoris/PythonCall.jl/blob/main/juliacall/deps.py

But, PythonCall.jl is not flexible enough. In pyjulia you have entry points in the initialization process to manage your own system image, etc. But, PythonCall.jl is hermetically sealed. The author is interested in splitting some stuff out.

jlapeyre commented 2 years ago

I think there should be a language-agnostic way to handle julia installations with a single transparent user interface. It'd be bad if each language/framework handles Julia installations in its own way.

Here are several thoughts on choosing an installer. The bottom line is that using juliaup would be much more difficult for my purposes, which is to make installing a python module that depends on Julia as easy as installing a python module that depends on a rust or c++ library.

I agree it would be best to have a single interface. But, I agree with the comments in JuliaPy/pyjulia/#473 in that it is premature to pick a winner.
jill.py is a library, but also a cross-platform command line application. It's not Python centric. It's an application written in Python.
Python is a great language for writing an installer. juliaup is written in Rust, a high-performance systems language. It is much harder to develop, and has a much smaller base of possible contributors and eyeballs. This is worth it if you need high performance. But, that's not the case here.
When writing juliaup David Anthoff went C++ -> Julia -> Rust. I'm not sure why, except that clearly the last choice is the best of the three. It could well be that it is easier to deploy a self-contained rust program than a python program. I don't really know. But that would be a great argument for using rust over python. There are probably other good reasons to choose rust that I don't know about. But, the advantages of jill.py are transparent to me.
The location of Julia installations, and links to them is clearly documented for all platforms in jill.py. It is not documented at all for juliaup.
juliaup is more complicated than jill.py and the organization of the installed files and auxiliary binaries is a bit more complicated. The whole thing includes shell scripts, windows cmd scripts, xml files, rust source and compiled binary. There may be some benefit to the complexity.
juliaup for osx and linux is labeled experimental with known bugs. (In practice, it worked for the basic functionality on my system)
I am looking for something that allows a Python user to do pip install thepackage, and then import thepackage. So, I would have to detect which platform is being used and then download the appropriate juliaup installer (there are seven of them) (unless it's already installed) and then run the installer.
David Anthoff currently wants to hide the installation location of julia as an implementation detail and automatically add it to the user's path. This is a reasonable choice. But, it conflicts with other choices. For example, I like to make julia a script that runs julia with a particular system image. Johnny Chen already agreed to make returning a list of installed versions and their paths part of the jill.py library API (I did not finish the PR yet, but I copied the code to find_julia). Looking at my juliaup installation, I see that getting this information would be more complicated.
jill.py is both a library and an application. juliaup is only an application. So, find_julia and julia_project can be optionally controlled by environment variables, rather than interactive questions. This makes using containers, and test frameworks easier. As far as I can tell, this is not possible with juliaup.

jlapeyre commented 2 years ago

One problem is that tox currently takes 15 minutes to run locally. It installs packages and builds a system image in both the source dir and the environment created for tox. I'm not sure, but I suspect this may be because of what @tkf mentioned. You should be able to import the module without doing any work that has side effects.

tkf commented 2 years ago

The bottom line is that using juliaup would be much more difficult for my purposes, which is to make installing a python module that depends on Julia as easy as installing a python module that depends on a rust or c++ library.

Why not install juliaup on the fly and then use it to install Julia? The point is that the user has access to the application storage from a CLI and it's shared across all languages and frameworks.

So, I would have to detect which platform is being used and then download the appropriate juliaup installer (there are seven of them) (unless it's already installed) and then run the installer.

I'm not sure if that's the downside. You only have to validate at least one binary for each platform. You can then use the cryptographic verification for all possible Julia binaries (and possibly new juliaup binary with self-update) implemented in juliaup. It seems like a big upside, given that other user-facing installation management interfaces come for free as well.

Python is a great language for writing an installer. juliaup is written in Rust, a high-performance systems language.

While I love Python as an excellent language for writing scripts easily, I disagree that it's a good language for creating simple-to-distribute self-contained CLI. Of course, there are various ways to create a self-contained Python application but it's not as straightforward as using a language with an AOT compiler. While I respect the effort and passion that went into jill.py and your julia_project, I don't think the argument "jill.py is a language-agnostic application" works for non-Python users if it requires them to understand how to use pip or Python to be already installed.

All that said, let me note again that I'm not working on this and I have no intention to be a blocker. Chris seems to like how things are handled in R which is similar to what is suggested in this PR, IIUC. So, I think there's a good chance this gets in.

jlapeyre commented 2 years ago

Why not install juliaup on the fly and then use it to install Julia? The point is that the user has access to the application storage from a CLI and it's shared across all languages and frameworks.

As I said above, jill.py is a cross-platform application. It's a CLI application:

shell> jill --help | cat
INFO: Showing help with the command 'jill -- --help'.

NAME
    jill

SYNOPSIS
    jill COMMAND

COMMANDS
    COMMAND is one of the following:

     download
       download julia release from nearest servers

     install
       Install the Julia programming language for your current system

     upstream
       print all registered upstream servers

     mirror
       Download/sync all Julia releases

     list
       List all Julia executable versions in symlink dir

     switch
       Switch the julia target version or path.

If someone already has python installed or is willing to install it, then jill.py is clearly a far easier solution than juliaup. @sibyjackgrove tested julia_project on windows even though I never tested it on windows, and it works. I could not do that with juliaup. (There were some install problems with julia_project in that case, not due to windows, but rather to cross-platform install issues that I corrected.) For julia_project the user always has Python installed already, so jill.py is a clear winner. I wanted to have find_julia look for julia where it is installed by juliaup. But, unlike jill.py, that's not documented. I could probably even use jill.py to install to the juliaup locations if I knew where they are. But, that would require me to test on several platforms or try to find someone to do it for me. Some or all of the installation path is meant to be hidden from the API. On linux, I can install several versions to find how the links in ~/.juliaup/bin are done, and hope that that part is stable (the links point to a tree inside ~/.julia). If I knew what I could count on for all platforms, I would strongly consider being at least compatible with juliaup.

disagree that it's a good language for creating simple-to-distribute self-contained CLI. Of course, there are various ways to create a self-contained Python application but it's not as straightforward as using a language with an AOT compiler.

I strongly suspected this from the beginning. Why else would someone use an AOT compiler for this? I looked briefly yesterday for how to package a simple-to-distribute self-contained python application. I think I saw dead projects, old ill-maintained projects. It did not look encouraging at all. So yeah, using jill.py for people who don't want, or can't, install python would be tough.

Python rules the world. In the spaces I am targeting, nothing else matters. I have to be practical given my environment and very scarce resources (mainly time). They have no incentive to accommodate us. I have to accommodate them. I want to maximize the probability that the Python world accepts things like this. The more Python they see, the happier they are. (Of course, there is a small minority that has a broader view). I also have to do all of this myself, including the project that I originally wanted to do. If I get time in the future to try to support juliaup, I think it would be a good idea. By the way, find_julia does the searching and downloading, and julia_project depends only on the find_julia API. So, juliaup could be used in find_julia in the future. I have every incentive to try to support juliaup (I could also support the shell jill). If I can avoid downloading Julia, so much the better.

All that said, let me note again that I'm not working on this and I have no intention to be a blocker.

Well you have by far the most experience in designing things to call Julia from Python. So, it is very useful to hear your opinions. For instance, not doing work with side-effects when importing. So thanks for taking the time to weigh in! (By the way, can you explain a bit more the situations in which side-effects on import are a problem?)

Chris seems to like how things are handled in R which is similar to what is suggested in this PR, IIUC.

Oh, I need to check that out.

ChrisRackauckas commented 2 years ago

Rebase onto master for CI?

tkf commented 2 years ago

Python rules the world. In the spaces I am targeting, nothing else matters.

Yeah, I support the idea even though the implementation is not to my taste. It'd be great to see more Julia-based packages in PyPI. Anyway, now that we have POC jill integration merged #86, I'll stop complaining about this :slightly_smiling_face:

BTW, consider #86 as a sketch of an implementation and feel free to tweak the CI setup if you have something else based on julia_project

(By the way, can you explain a bit more the situations in which side-effects on import are a problem?)

I'll comment on https://github.com/SciML/diffeqpy/pull/100#discussion_r782511903 to keep the conversation linear

jlapeyre commented 2 years ago

I'll stop complaining about this

It's important to think about the options and defend your choice. I plan to ask the juliaup people some questions on paths and so forth to see what's possible.

I'm fairly sure the combination of the CI in #86 and tox.ini will not work with julia_project without tweaking.

jlapeyre commented 2 years ago

Rebase onto master for CI?

I tried to do that. It was a bit of a mysterious process. I think what I pushed now is correct.

jlapeyre commented 2 years ago

Hm, CI found a code path with a bug. But it shouldn't have taken that path anyway. Um, I'll fix the bug first. EDIT: So the bug is fixed. But it occurred in this path: The consumer (diffeqpy) gave find_julia a list of preferred versions, including 1.7, which jill.py installs by default. Then a jill.py-installed Julia was found, but it's not 1.7. I've never seen this in any tests.
tox on my local machine seems pretty slow even accounting for installing and building twice.
Maybe for another PR: There are some scripts that exercise DifferentialEquations during the building of the system image. But, they don't actually seem to speed up examples. I'm not sure if the ratio of run time to compilation time in the examples is really high, or if there is some subtlety that I'm missing.

jlapeyre commented 2 years ago

tox succeeds locally, even on a machine with no Julia executable or packages installed.

EDIT: no wait, ignore below.

~The path that is failing seems to imply a dict that evaluates to logical True in a conditional, yet iterating over values iterates zero times. That is the dict is apparently empty, but is True. Makes no sense. Also the dict being empty is correct, if there is no Julia installed which is the case.~

tkf commented 2 years ago

https://github.com/SciML/diffeqpy/runs/4811765599?check_suite_focus=true#step:5:29

return next(iter(self.results.jill_julia_bin_paths.values())) # Take the first one

I'd write something like

for x in self.results.jill_julia_bin_paths.values():
    return x
return ??default???
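For reference, the original line fails on an empty dict because next() on an exhausted iterator raises StopIteration. Here is a minimal sketch of the loop form next to the two-argument form of next(), which should behave the same way. The function names and the sample path are made up for illustration and are not part of find_julia.

```python
def first_value_loop(mapping, default=None):
    """Return the first value of `mapping`, or `default` if it is empty."""
    for value in mapping.values():
        return value
    return default


def first_value_next(mapping, default=None):
    """Same behavior, using the two-argument form of next()."""
    return next(iter(mapping.values()), default)


# Hypothetical data: version string -> path to a julia executable.
paths = {"1.6": "/opt/julias/julia-1.6/bin/julia"}
empty = {}

loop_result = first_value_loop(paths)
next_result_empty = first_value_next(empty)
```

Either form returns the fallback instead of raising when no Julia installation is present.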

jlapeyre commented 2 years ago

That would probably be more clear, but the logic would be slightly different. In any case, the bug is because this is macos. The directory that jill installs to is always present, but has no julia installations. I did not exercise this path till now. EDIT: it now reads as follows with the first line catching both None for non-existing directory and an empty dict for a directory with no julias in it. Might not be a bad idea to change the last line anyway.

        if not self.results.jill_julia_bin_paths:
            return None
        for pref in self.preferred_julia_versions:
            bin_path = self.results.jill_julia_bin_paths.get(pref)
            if bin_path:
                return bin_path
        if self._strict_preferred_julia_versions:
            return None
        return next(iter(self.results.jill_julia_bin_paths.values())) # Take the first one
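To make the edge cases easier to check, the same selection logic can be pulled out into a standalone function. This is only a sketch: `choose_julia` and its parameter names are hypothetical stand-ins for the attributes used in the method above.

```python
def choose_julia(bin_paths, preferred_versions, strict=False):
    """Pick a Julia executable from `bin_paths` (version -> path).

    `bin_paths` may be None (install directory missing) or an empty
    dict (directory present but no Julia in it); both yield None.
    """
    if not bin_paths:
        return None
    # Honor the caller's preference order first.
    for version in preferred_versions:
        path = bin_paths.get(version)
        if path:
            return path
    # In strict mode, refuse to fall back to a non-preferred version.
    if strict:
        return None
    # Otherwise take whichever installation comes first.
    return next(iter(bin_paths.values()))


found = {"1.6": "/julias/1.6/bin/julia"}
fallback = choose_julia(found, ["1.7"])
strict_result = choose_julia(found, ["1.7"], strict=True)
```

The final next() is safe here only because the first check already ruled out an empty dict.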

tkf commented 2 years ago

So, I think there is still too much book keeping logic inside of diffeqpy. I think most of the stuff should go into julia_project (mainly so that you can improve things without bothering Chris or me). I suggest the following design.

(1) We have diffeqpy/_julia_project.py that "declares" but does not execute julia_project.JuliaProject:

from julia_project import JuliaProject

project = JuliaProject(
    name="diffeqpy",
    package_path=__file__,
    # ... other things ...
)
# end of file

(2) diffeqpy/__init__.py directly exports julia_project.JuliaProject API:

from ._julia_project import project

This way, a user can run diffeqpy.project.update() etc. to manage the Julia project from Python. Crucially, each Python package does not need to define its own bookkeeping logic. That is to say, we have $PYTHON_PACKAGE.project.$MANAGING_COMMAND() as a consistent UI/API across all Julia-Python bridging packages.

(3) Invoke the magic command in diffeqpy/de.py:

from . import project
project.ensure_init()

Ideally, julia_project can provide an API like project.disable() or even julia_project.disable_all() so that project.ensure_init() is a no-op. This is useful for users who know and want to control the exact versions of Julia packages.

Looking at julia_project README, it sounds like project.run() activates diffeqpy/Project.toml. This would be problematic when there are multiple Julia-based Python packages. Instead, I suggest the following:

If it does not already exist, copy diffeqpy/Project.toml to ~/.julia/environments/__python_julia_project_$VERSION_$SLUG/Project.toml, where $VERSION is the Julia version and $SLUG is the hash of the realpath of sys.executable. Let us call ~/.julia/environments/__python_julia_project_$VERSION_$SLUG a $LOCAL_ENV.
Instantiate $LOCAL_ENV/Manifest.toml if it does not exist.
Push $LOCAL_ENV to the end of Base.LOAD_PATH (if it does not exist). Or to the beginning, if you want to make it more magical (i.e., ignore some stale package that exist in user's default environment).
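The first step of this list can be sketched in Python as below. This is an illustration under assumptions, not julia_project code: the function names, the choice of SHA-1, and the 8-character truncation of the slug are all made up here, and the demo uses a temporary directory in place of ~/.julia. Instantiating the manifest and pushing to Base.LOAD_PATH need a running Julia and are left out.

```python
import hashlib
import os
import shutil
import sys
import tempfile


def local_env_dir(julia_version, depot):
    """Build the $LOCAL_ENV path for this Python interpreter."""
    # $SLUG: hash of the real path of the running Python executable.
    real_python = os.path.realpath(sys.executable)
    slug = hashlib.sha1(real_python.encode("utf-8")).hexdigest()[:8]
    name = f"__python_julia_project_{julia_version}_{slug}"
    return os.path.join(depot, "environments", name)


def ensure_local_env(package_project_toml, julia_version, depot):
    """Copy the package's Project.toml into $LOCAL_ENV unless it exists."""
    env_dir = local_env_dir(julia_version, depot)
    target = os.path.join(env_dir, "Project.toml")
    if not os.path.exists(target):
        os.makedirs(env_dir, exist_ok=True)
        shutil.copyfile(package_project_toml, target)
    return env_dir


# Demo against a throwaway depot instead of the real ~/.julia.
with tempfile.TemporaryDirectory() as depot:
    source = os.path.join(depot, "Project.toml")
    with open(source, "w") as f:
        f.write("[deps]\n")
    env_dir = ensure_local_env(source, "1.7", depot)
    copied = os.path.exists(os.path.join(env_dir, "Project.toml"))
    same_dir_again = ensure_local_env(source, "1.7", depot) == env_dir
```

Because the slug depends on the interpreter path, each virtualenv gets its own Julia environment, while repeated calls from the same interpreter reuse it.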

This way, julia_project should be usable from multiple Python projects. Of course, this is still rather wacky since Pkg cannot ensure all the packages in the "stacked environment" Base.LOAD_PATH are of consistent versions. But that's an inherent problem for using an automagic approach like julia_project. For a sane behavior, users need to use the Pkg API as in julia.Pkg.activate(PATH). (Of course, julia_project can do more magic like keeping the entire stacked environment consistent.)

jlapeyre commented 2 years ago

This would be problematic when there are multiple Julia-based Python packages.

I anticipated this in a comment above. I had planned to tackle this later because I did not have a clear idea of what to do. As a first step, I planned to provide a way to avoid activating the Project.toml so that the user would have a chance to manage the packages manually; something like disabling ensure_init.

$PYTHON_PACKAGE.project.$MANAGING_COMMAND()

In the end, I think this is better. I did it the other way because I wanted to hide more of the JuliaProject stuff. But, I agree the advantage of having a uniform UI is more important.

I thought of using "stacked environments", but I never managed to make that work for myself, so I shied away. I imagined I might have to do something more low-level, like parsing Project.toml and calling lower-level Pkg functions. But, maybe a stacked environment is fine.

Pkg cannot ensure all the packages in the "stacked environment" Base.LOAD_PATH are of consistent versions.

Isn't this problem inherent to using stacked environments? I mean, is this peculiar to the "automagic" approach?

For a sane behavior, users need to use the Pkg API as in julia.Pkg.activate(PATH)

You mean, if the user wants to use two python packages that depend on Julia, then activate a Julia project and add the necessary Julia packages for each? We could make something like this possible, but I would not want to require it.

Then there is the question of building system images. This is important because I want to reduce latency. If you use only a single Python package that uses julia_project, then this is not difficult. So I want to preserve this option. For two or more Python packages, I suppose you would load a system image (or not) for the first package. The remaining packages will have to be loaded and compiled.

tkf commented 2 years ago

Isn't this problem inherent to using stacked environments? I mean, is this peculiar to the "automagic" approach?

Yes, you are right. I was sloppy. The problem is inherent to how Julia itself handles LOAD_PATH. I wanted to emphasize that the approach I was proposing was wacky since bad things can happen behind the user's back.

You mean, if the user wants to use two python packages that depend on Julia, then activate a Julia project and add the necessary Julia packages for each? We could make something like this possible, but I would not want to require it.

Yeah, I get that this PR is about automation. I just wanted to point out something like julia_project.disable_all() provides a solution for users who want strong reproducibility. For example, you can check in Project.toml and Manifest.toml for Julia projects and something similar, say, pyproject.toml and poetry.lock for Python. You can then write a small activation script to set JULIA_PROJECT environment variable and start a program via poetry.
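A small activation script of the kind described here might look like the following sketch. JULIA_PROJECT is the environment variable Julia reads to select a project, and `poetry run` executes a command inside the Poetry-managed environment; the function names, the "julia_env" directory and "main.py" are placeholders chosen for illustration.

```python
import os
import subprocess


def build_launch(project_dir, program, base_env=None):
    """Return the command and environment for a reproducible launch."""
    env = dict(os.environ if base_env is None else base_env)
    # Point Julia at the checked-in Project.toml / Manifest.toml.
    env["JULIA_PROJECT"] = os.path.abspath(project_dir)
    # Run the program inside the environment pinned by poetry.lock.
    command = ["poetry", "run", "python", program]
    return command, env


def launch(project_dir, program):
    """Start the program; raises CalledProcessError on a nonzero exit."""
    command, env = build_launch(project_dir, program)
    return subprocess.run(command, env=env, check=True)


# Only build the launch description here; launch() would start it.
command, env = build_launch("julia_env", "main.py", base_env={})
```

This only pins versions if the automatic project handling is switched off, which is what something like julia_project.disable_all() would be for.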

Then there is the question of building system images.

This is where "no magic init" principle is useful. If all Julia-based Python packages follow this principle and then don't initialize PyJulia on import, you can create a sysimage for each combination (in principle):

import diffeqpy
import makie  # hypothetical

import julia_project
julia_project.compileall()  # also initialize PyJulia (maybe not a good name)

from diffeqpy import de  # loaded from sysimage

where julia_project.compileall() combines and compiles all projects into a sysimage and then initializes PyJulia. But it's rather challenging, and I can see that sysimage-per-project covers a lot of use cases.

jlapeyre commented 2 years ago

I can't afford to make something really robust at once. If I can get something that works well enough, my company (or others) might be more interested in allocating resources. But, it's probably a good idea to try to anticipate so that the interface doesn't change too quickly. No auto-init is one item to start with. I can spend some time redesigning; but I have less time for this in the near future, I did a lot of it over holidays.

Your system image idea is nice. What I have currently is simple, it just uses the API that PackageCompiler offers, and it is on the packager (me, or you or Chris) to include compile_julia_project.jl etc. Very easy. But for two projects, I will instead have to invent a system to record the packages and code that is passed via keyword argument compile_execution_file in say a toml file. Then read this from each project and combine it. And a system for storing the images. It has to be something that is somehow cached or not retriggered. I can't have it happen every time a Python user starts a new Jupyter notebook. Maybe have a system image for each combination of Julia-based Python packages.
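One way to get "a system image for each combination" without rebuilding on every notebook start is to derive the image's file name from the set of packages. The sketch below is purely an assumption about how that could work; `sysimage_cache_path`, the version strings, and the ".so" suffix (which is platform dependent) are invented for illustration.

```python
import hashlib
import os


def sysimage_cache_path(packages, julia_version, cache_dir):
    """Map a combination of packages to a stable sysimage file path.

    `packages` maps each Julia-based Python package name to its version.
    """
    parts = [f"julia={julia_version}"]
    # Sort so the key does not depend on import order.
    parts += [f"{name}={version}" for name, version in sorted(packages.items())]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]
    # ".so" is the Linux suffix; other platforms use a different one.
    return os.path.join(cache_dir, f"sys_{digest}.so")


a = sysimage_cache_path({"diffeqpy": "1.2", "makie": "0.1"}, "1.7", "cache")
b = sysimage_cache_path({"makie": "0.1", "diffeqpy": "1.2"}, "1.7", "cache")
c = sysimage_cache_path({"diffeqpy": "1.2"}, "1.7", "cache")
```

A compile step would then run only when the file at that path does not exist yet.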

Currently the density of Julia-based packages in use is very low, so package-package interactions are negligible. It would be great to be in a situation where we're forced to deal with interactions.

jlapeyre commented 2 years ago

still too much book keeping logic inside of diffeqpy.

I don't understand what you are referring to here. I don't see any book keeping. All I see that could be removed is

def compile_diffeqpy():
    """
    Compile a system image for `diffeqpy` in the subdirectory `./sys_image/`. This
    system image will be loaded the next time you import `diffeqpy`.
    """
    julia_project.compile_julia_project()

which I made as an obvious convenience. I did it this way so the user does not have to know that julia_project exists (except for some intrusions during installation). Of course, there is a good argument for removing it and exporting project so that the user must do diffeqpy.project.compile_julia_project() (maybe rename these). It's slightly more robust. But, you do lose something in that the docstring explaining what it does now has to be put somewhere discoverable. Of course we should remove the string "julia" everywhere.

tkf commented 2 years ago

I can't afford to make something really robust at once.

Of course, it's not like everything has to be implemented in one go. But I thought the basic design (1)--(3) I commented https://github.com/SciML/diffeqpy/pull/100#issuecomment-1012732841 (without the future/ideal improvements I discussed) can be done with a very small effort. Essentially everything is in this PR. So, isn't it "just" removing compile_diffeqpy and update_diffeqpy and adding something like JuliaProject.ensure_init method like this?

class JuliaProject:
    ...

    initialized = False

    def ensure_init(self):
        # Run the (expensive) project setup at most once.
        if not self.initialized:
            self.initialized = True
            self.run()
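To show the intended behavior, here is a runnable toy version in which run() is replaced by a counter. The class below is a stand-in written for this illustration, not the real julia_project.JuliaProject; `run_count` is a made-up attribute.

```python
class JuliaProject:
    """Toy stand-in: run() only counts how often setup was triggered."""

    initialized = False

    def __init__(self, name):
        self.name = name
        self.run_count = 0

    def run(self):
        # The real method would find Julia, instantiate the project, etc.
        self.run_count += 1

    def ensure_init(self):
        # Assigning through self creates a per-instance flag, so
        # separate projects are initialized independently.
        if not self.initialized:
            self.initialized = True
            self.run()


project = JuliaProject("diffeqpy")
other = JuliaProject("quickpomdps")
project.ensure_init()
project.ensure_init()  # second call is a no-op
```

Importing a package that only defines `project` then does no work until some submodule calls ensure_init().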

I did it this way so the user does not have to know that julia_project exists

I think we just have to document that you can call diffeqpy.project.compile_julia_project etc. For example, we can put something like the following in the docstring of diffeqpy/__init__.py

Project management
------------------

You can call methods of ``diffeqpy.project`` to manage underlying Julia projects.
Notable methods are:

``diffeqpy.project.compile_julia_project()``
  Compile a system image for ``diffeqpy`` in ...

``diffeqpy.project.update()``
  Remove possible stale Manifest.toml files and compiled system image.
  ...

For more details, see: https://github.com/jlapeyre/julia_project

This way, users can get some overview by typing diffeqpy? in the REPL. Furthermore, we don't need to update diffeqpy every time julia_project adds a new feature (e.g., new methods, new optional arguments).

you do lose something in that the docstring explaining what it does

Per-instance docstring is tricky but I think there are various ways to do it. Maybe you can create a subclass in __new__ (untested):

class JuliaProject:
    def __new__(cls, *, name, **_kwargs):
        class NewJuliaProject(cls):
            __doc__ = f"docstring for {name}"
        # Instantiate the per-project subclass so its docstring is used.
        return object.__new__(NewJuliaProject)

or a simpler solution is to provide a factory function julia_project.new_project and then do something like

def new_project(*, name, **kwargs):
    class NewJuliaProject(JuliaProject):
        __doc__ = f"docstring for {name}"
    return NewJuliaProject(name=name, **kwargs)

Maybe the __new__-based solution is OK, but, for maximal flexibility on your side, using a factory function is better.
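A runnable version of the factory approach could look like this. It is a sketch with a toy JuliaProject defined inline; the real class, its constructor arguments, and the docstring wording are assumptions here, and type() is used in place of a nested class statement.

```python
class JuliaProject:
    """Generic docstring of the toy stand-in class."""

    def __init__(self, *, name, **kwargs):
        self.name = name
        self.options = kwargs


def new_project(*, name, **kwargs):
    """Create a project whose class carries a package-specific docstring."""
    doc = f"Manage the Julia project behind the `{name}` Python package."
    # Build a one-off subclass so only this project gets the docstring.
    cls = type(f"JuliaProject_{name}", (JuliaProject,), {"__doc__": doc})
    return cls(name=name, **kwargs)


project = new_project(name="diffeqpy", package_path="/tmp/diffeqpy")
```

Since help() and the IPython `?` operator read the docstring from the object's class, `diffeqpy.project?` would show the package-specific text while the base class stays untouched.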

ChrisRackauckas commented 2 years ago

So do I merge this?

jlapeyre commented 2 years ago

I'd like to make some or most of the changes that @tkf asked for first. I was doing other things, but I am now finishing organizing/opensourcing the other application of julia_project, part of which will be making these tweaks to julia_project. Then I can update this PR to match the tweaks.

ChrisRackauckas commented 9 months ago

Now managed by JuliaCall.