JuliaLang / Pkg.jl

Pkg - Package manager for the Julia programming language
https://pkgdocs.julialang.org

WIP: App support in Pkg #3772

Open KristofferC opened 8 months ago

KristofferC commented 8 months ago

This is quite heavily WIP towards having "app" support in Pkg. An app is a program that you start by typing its name in the terminal, without explicitly having to invoke Julia, load the package, and call a function. Every app has an isolated environment.

More details of the design can be found in this hackmd: https://hackmd.io/r0sgJar5SpGNomVB8wRP_Q

This PR requires https://github.com/JuliaLang/julia/pull/52103

Here is some example usage:

(Pkg) pkg> app st

shell> rot13
zsh:1: command not found: rot13

(Pkg) pkg> app add https://github.com/KristofferC/Rot13.jl
    Updating git-repo `https://github.com/KristofferC/Rot13.jl`
  Activating project at `~/.julia/environments/apps/Rot13`
  No Changes to `~/.julia/environments/apps/Rot13/Project.toml`
  No Changes to `~/.julia/environments/apps/Rot13/Manifest.toml`

(@Rot13) pkg> app st
[43ef800a] Rot13 v0.1.0 `https://github.com/KristofferC/Rot13.jl#master`:  rot13 /home/kc/julia/usr/bin/julia 

shell> rot13 Rotate this please
Ebgngr
guvf
cyrnfr

(@Rot13) pkg> app rm rot13
[ Info: Deleted app rot13

(@Rot13) pkg> app st
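
For illustration, here is a minimal sketch of an entry point that would produce the output shown in the session above. It is not the actual Rot13.jl source; the names `rot13` and `main` are assumptions chosen for this example.

```julia
# Hypothetical sketch; not the actual Rot13.jl source.

# Rotate ASCII letters by 13 places, leaving every other character untouched.
function rot13(c::Char)
    if 'a' <= c <= 'z'
        return 'a' + (c - 'a' + 13) % 26
    elseif 'A' <= c <= 'Z'
        return 'A' + (c - 'A' + 13) % 26
    else
        return c
    end
end

# Apply the character rotation to every character of a string.
rot13(s::AbstractString) = map(rot13, s)

# Entry point: print each argument, rotated, on its own line.
function main(args)
    for arg in args
        println(rot13(arg))
    end
    return 0
end
```

Calling `main(["Rotate", "this", "please"])` prints `Ebgngr`, `guvf` and `cyrnfr` on separate lines, matching the transcript.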

cc @MasonProtter, @roger-luo

tecosaur commented 8 months ago

Oh this looks very cool! Thanks for all the time/effort that's gone into this :star_struck:


One thing I'm slightly concerned about here is the approach taken to making sure that the executables are on the user's PATH on Linux/BSD systems.

I see in the design document there is some mention of putting such files in a more standard location already on the path such as ~/.local/bin, and that Cargo is mentioned in response to this. I think it's worth noting that there is a well-documented series of efforts (like this issue) to make Cargo more XDG-compliant (https://poignardazur.github.io/2023/05/23/platform-compliance-in-cargo/ does a good job outlining this, and describing a path forwards for Cargo). The Cargo discussion can essentially be summed up as "would have been good, but a bit late now".

Other lang's package managers already install things in the XDG-appropriate locations, such as Python with pip install --user (new/alternative Python package managers like poetry copy this behaviour).

I'd advocate for a ~/.local/bin approach on Linux/BSD for these reasons. To programmatically determine which executables in ~/.local/bin are managed by Julia, the executable files could be put inside a Julia-managed directory, and then symlinked to ~/.local/bin. I think this approach keeps much of the benefits of the custom-bindir added to PATH approach while avoiding the major pitfalls.

(NB: when I say ~/.local/bin I really mean ${JULIA_BIN_DIR:-${XDG_BIN_DIR:-$HOME/.local/bin}}, but that's a bit of a mouthful)
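
The fallback chain in that shell expression could be sketched in Julia roughly as follows. The variable names `JULIA_BIN_DIR` and `XDG_BIN_DIR` are taken from the comment above; whether either is honoured by any tool is not established here, and the function name `user_bin_dir` is made up for this example.

```julia
# Sketch of the fallback chain ${JULIA_BIN_DIR:-${XDG_BIN_DIR:-$HOME/.local/bin}}.
# Like the shell `:-` operator, an empty value is treated the same as unset.
# `env` is any dictionary-like object; it defaults to the process environment.
function user_bin_dir(env = ENV)
    for var in ("JULIA_BIN_DIR", "XDG_BIN_DIR")
        value = get(env, var, "")
        isempty(value) || return value
    end
    # Neither variable is set (or both are empty): fall back to ~/.local/bin.
    return joinpath(homedir(), ".local", "bin")
end
```

Passing a `Dict` instead of `ENV` makes the lookup easy to try out without touching the real environment.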

Roger-luo commented 8 months ago

Nice work! I'm wondering how apps are shared across Julia versions? E.g. are they isolated by Julia version like how the global environments are set up?

PallHaraldsson commented 8 months ago

intended to be run by the user as appname args to app. [..] It’s assumed that Julia is installed and serves as the “driver” to start up the app.

This seems useful, maybe already despite that limitation. Could it be lifted by autoinstalling Julia (runtime, right version) for you if not available? Needs not be in first version.

This is in some ways similar to Python's zipapps (which I believe is not too popular, because runtime can't be assumed, even for Linux where it's most often preinstalled), that needs separate .pyz[w] file ending, and Python installed (and are in one archive file, optionally compressed):

https://docs.python.org/3/library/zipapp.html

There is no way to say “python X.Y or later”, so be careful of using an exact version like “/usr/bin/env python3.4” as you will need to change your shebang line for users of Python 3.5, for example.

[We already have AppBundler.jl if you want to bundle the runtime; it's best if you can have one way to make an app and it can be compiled with PackageCompiler, or use AppBundler, or a combination of those..., or this system.]

KristofferC commented 8 months ago

One thing I'm slightly concerned about here is the approach taken to making sure that the executables are on the user's PATH on Linux/BSD systems.

With regards to XDG there is an argument that Pkg should follow what Julia itself does. (As you are aware) there is https://github.com/JuliaLang/julia/issues/4630. juliaup also uses this method of installing Julia and since juliaup is more or less the official way to install Julia it feels like if you have managed to install Julia itself, this should be fine. So there is a tension here between doing XDG (which some people would argue is the correct way) and to fit in how things are done everywhere else in Julia and its ecosystem.

A related question, according to XDG where should the .julia/environments/apps/Package folder go?

For Windows the Cargo issue comment says:

For Windows, everything should go in ~/appdata/locallow or ~/appdata/local, since ~/.cargo is just a cache, AFAICT. This is FOLDERID_LocalAppData for SHGetKnownFolderPath, CSIDL_LOCAL_APPDATA for SHGetFolderPath, and %LOCALAPPDATA% in the environment.

How is that translated to all the files used here (shims, AppManifest.toml, app environments)?


Other lang's package managers already install things in the XDG-appropriate locations, such as Python with pip install --user

I get

❯ pip install --user httpie                       
Requirement already satisfied: httpie in /Users/kristoffercarlsson/Library/Python/3.9/lib/python/site-packages (3.2.2)

~/Library/Python/3.9/bin
❯ ls
git-filter-repo  http  httpie  https  markdown-it  pygmentize

KristofferC commented 8 months ago

Nice work! I'm wondering how apps are shared across Julia versions? E.g. are they isolated by Julia version like how the global environments are set up?

As it is right now each app entry in AppManifest.toml has an absolute path to a Julia installation. If you want to update that Julia version you would also resolve the environment. This ties into this later comment:

Could it be lifted by autoinstalling Julia (runtime, right version) for you if not available? Needs not be in first version.

one plan forward is to use Juliaup to install the Julia installation that the app is currently configured for if it does not exist. That way you would not store the absolute path to the julia installation like that.

tecosaur commented 8 months ago

With regards to XDG there is an argument that Pkg should follow what Julia itself does.

Right. I basically see Julia as currently being in a similar situation to Cargo — in that by the end of https://github.com/JuliaLang/julia/issues/4630 I think I can fairly summarise the consensus as "yes this would be nice to have, but it's going to be a hassle to start using it".

Much of the value of the XDG Desktop spec comes via a network effect. Thus when the Desktop spec was new and that issue was created in 2013, the benefit was somewhat speculative. Now though, as more tools use and assume XDG compliance, it creates a growing tension between the "Julia way" and the XDG way.

In this sort of light, I see decisions like this as opportunities to choose between digging down and digging out :stuck_out_tongue: somewhat. I still have loose plans to go back to https://github.com/JuliaLang/julia/issues/4630 to see if I can help move the state of affairs closer to XDG compliance (Stefan asked me if I'd be interested in putting a PR together a few months ago, and I am once I have fewer PRs currently open).

Considering the current "Julia way" and the XDG spec, would it not be possible to put things in ~/.julia/bin as the "Julia-managed directory" that executables are written into, and make symlinks into ~/.local/bin? I might well be missing something, but it seems to me that this way the current assumptions around ~/.julia/bin hold but we also get the benefits of using the XDG-appropriate dir as outlined in my first comment.
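
A rough sketch of that idea on Linux/BSD: copy the shim into a Julia-managed directory, symlink it into the user's bin directory, and later recognise Julia-managed entries by where their links point. The function names and the directory arguments are hypothetical and not part of this PR.

```julia
# Hypothetical sketch: install an app shim into a Julia-managed directory and
# expose it through a symlink in the user's bin directory.
function link_app(shim::AbstractString, managed_dir::AbstractString, bin_dir::AbstractString)
    mkpath(managed_dir)
    mkpath(bin_dir)
    target = joinpath(managed_dir, basename(shim))
    cp(shim, target; force = true)
    link = joinpath(bin_dir, basename(shim))
    # Replace a stale link. A regular file with the same name is left alone,
    # in which case `symlink` below throws instead of overwriting it.
    islink(link) && rm(link)
    symlink(target, link)
    return link
end

# List the entries of `bin_dir` that are symlinks resolving to a file located
# directly inside `managed_dir`. Broken links are skipped.
function managed_apps(managed_dir::AbstractString, bin_dir::AbstractString)
    root = realpath(managed_dir)
    return filter(readdir(bin_dir)) do name
        path = joinpath(bin_dir, name)
        islink(path) && ispath(path) && dirname(realpath(path)) == root
    end
end
```

With this layout, removing an app only needs to touch links that `managed_apps` reports, so unrelated files in the bin directory are never modified.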

A related question, according to XDG where should the .julia/environments/apps/Package folder go?

I made a flowchart for answering this sort of question in the BaseDirs.jl docs which might be helpful (it's not 100% accurate, but I didn't want to make it more complicated, and I think it gets 98% of the way).

If we classify .julia/environments/apps/Package as:

then Data Home would be the relevant XDG Desktop component (let me know if any of those assumptions don't hold).

More generally, I find .julia/environments/ a bit interesting in that it's a mix of automatically-changed and user-modified environments. The v1.x environments are changed when the user explicitly asks for a package to be installed/removed, and so line up best as "user configuration". However, you also have environments like __pluto_boot_v2_1.8.5 which are very much not, and probably best classed as user data.

For Windows the Cargo issue comment says:

For Windows, everything should go in ~/appdata/locallow or ~/appdata/local, since ~/.cargo is just a cache, AFAICT. This is FOLDERID_LocalAppData for SHGetKnownFolderPath, CSIDL_LOCAL_APPDATA for SHGetFolderPath, and %LOCALAPPDATA% in the environment.

How is that translated to all the files used here (shims, AppManifest.toml, app environments)?

A while ago I spent an inordinate amount of time looking at the relevant behaviour/specs/comments around directories on Windows/Mac. I think I'd probably be best off pointing you to the comparison table on https://tecosaur.github.io/BaseDirs.jl/stable/defaults/ (and if you want the reasoning/links to some of the most relevant resources: https://tecosaur.github.io/BaseDirs.jl/stable/others/).

Regarding just this part of the comment:

This is FOLDERID_LocalAppData for SHGetKnownFolderPath, CSIDL_LOCAL_APPDATA for SHGetFolderPath, and %LOCALAPPDATA% in the environment.

Yea, getting the right system dirs on windows is actually a bit of a pain. See https://github.com/tecosaur/BaseDirs.jl/blob/main/src/nt.jl for a glimpse of me not having a fun time.

davidanthoff commented 7 months ago

one plan forward is to use Juliaup to install the Julia installation that the app is currently configured for if it does not exist. That way you would not store the absolute path to the julia installation like that.

My plan generally is that the Julia version in a manifest becomes the version selector for Juliaup. Presumably that would work well for apps here too?

ufechner7 commented 5 months ago

What is still needed before this can be merged?

kescobo commented 2 months ago

What is still needed before this can be merged?

In case folks haven't seen it - @KristofferC's talk from JuliaCon has a nice summary of the current status and what the open questions still are (or what they were as of a couple of weeks ago). Start at about 6:49:00 here: https://www.youtube.com/live/OQnHyHgs0Qo?si=IVg01oXigQw1JBDH&t=24545

tecosaur commented 2 months ago

It's great to see work on this continuing. I'd be interested to hear why the support for creating symlinks in the user's local bin-dir on Linux has been removed (requiring $PATH shenanigans) though? :confused:

KristofferC commented 2 months ago

why the support for creating symlinks in the user's local bin-dir on Linux has been removed (requiring $PATH shenanigans) though

I tried to scale off as much as possible that isn't strictly needed to focus my efforts. It can always be added back at a later stage.

tecosaur commented 2 months ago

I tried to scale off as much as possible that isn't strictly needed to focus my efforts.

Righteo. I do see making modification of $PATH a last-resort as rather important, given all the potential complications (incidentally there's a conversation that's just gone on in #hpc on Slack about problems with path modification with the module HPC application management system).

It can always be added back at a later stage.

If you would like any help doing so, I'd be happy to lend a hand.

KristofferC commented 2 months ago

I do see making modification of $PATH a last-resort as rather important, given all the potential complications (incidentally there's a conversation that's just gone on in #hpc on Slack about problems with path modification with the module HPC application management system).

That wast

I just realised switching between Python virtual environments (with venv) messes up with environment modules: when you deactivate a venv it restores the PATH at the time when environment was activated, but if in the meantime you loaded a module, then its PATH is gone

?. That doesn't seem to apply here, no? For example, I haven't heard people having had much issues with juliaup even though it doesn't install in .local.

tecosaur commented 2 months ago

That [wasn't]

To me the main takeaway from this conversation is fragility associated with path modification, which is the core of the module + python headache, and comes up a bunch in subsequent messages (e.g. "yeah, messing with PATH is so problematic" - Mose).

That doesn't seem to apply here, no? For example, I haven't heard people having had much issues with juliaup even though it doesn't install in .local.

Issues with the Juliaup approach do come up, a few from a quick search:

I believe Juliaup has started printing out instructions for users to modify the PATH themselves in some cases. That said, even if the automatic shell startup file modification occurs without an error, depending on the particular content of the shell file, blindly appended content may not be run. Then we've also got the increasing popularity of more exotic shells...

All in all, this is a can of worms I'd want to keep closed as much as possible.

Edit: just to mention for fun, the Juliaup approach doesn't work on my system either, but for different reasons again to those I've listed above :upside_down_face:

JBlaschke commented 2 months ago

Hi Folks, I wanted to weigh in from the perspective of HPC.

If I understand this PR correctly, then the strategy chosen is to control the user environment in such a way that Julia code, Pkg environment, and default entrypoints emulate a user experience similar to a compiled executable.

This is like the approaches taken by Python zipfiles, anaconda, etc. Our experience in running HPC systems (serving up to 10k users) so far has shown that this approach is:

Basically: we are developing HPC-native container runtimes precisely because the approach chosen in this PR performs poorly for Python. The irony here is that this considerable engineering effort is only necessary because Python can't generate compiled code.

Therefore, I think that the motivation behind this PR -- while well intentioned -- might run a real risk of being harmful to Julia as a High-Productivity HPC language. Especially because efforts to build executable applications appear to be within reach for Julia: https://github.com/JuliaLang/julia/pull/55047. Furthermore, since JIT compilation adds complexity to the container build process, this should prove to be a much more seamless and scalable solution to building Julia applications than Pkg Apps.

Also, I think an approach to Julia applications that is based on compiled executables (which could be placed in any reasonable location on the filesystem) would result in a better user experience (including for non-HPC users). When developing tools to be used by others, I have opted for compiled executables as they don't rely on the user's runtime environment. This PR implicitly promises to support every edge case which a user could configure into their favorite shell, so from a mere user support perspective I think a combination of https://github.com/JuliaLang/julia/pull/55047 + a distribution mechanism would be much easier to maintain.

Let me know what you think. I am happy to contribute some of my time to this.

Citing @Seelengrab, @giordano, and @tecosaur: we had a conversation on Slack that brought this to my attention (this does not imply that they share or endorse my opinion)

Seelengrab commented 2 months ago

Citing @vchuravy, @giordano, and @timholy: we had a conversation on Slack that brought this to my attention (this does not imply that they share or endorse my opinion)

I should point out that you were probably talking to me, not @vchuravy - different Valentin!

JBlaschke commented 2 months ago

Citing @vchuravy, @giordano, and @timholy: we had a conversation on Slack that brought this to my attention (this does not imply that they share or endorse my opinion)

I should point out that you were probably talking to me, not @vchuravy - different Valentin!

Ha! Let me fix the citation! Sorry for the mixup.

Seelengrab commented 2 months ago

And the Tim(othy) who chimed in later is @tecosaur, not @timholy :sweat_smile:

JBlaschke commented 2 months ago

And the Tim(othy) who chimed in later is @tecosaur, not @timholy 😅

Goddammit! I should stop with the late night posts (I hate having incomplete todos when going to bed)

JBlaschke commented 2 months ago

Let's hope there is no @giordano doppelganger also....

kescobo commented 2 months ago

If I understand this PR correctly, then the strategy chosen is to control the user environment in such a way that Julia code, Pkg environment, and default entrypoints emulate a user experience similar to a compiled executable.

This is like the approaches taken by Python zipfiles, anaconda, etc. Our experiences in running HPC systems...

First, let me say that I agree with many of the limitations that you mention, and as someone that also works on HPCs (though not at the same level), I'm glad to have people thinking about this stuff.

At the same time, I don't think we want to let the perfect be the enemy of the good here. Julia already has a lot of advantages over Python when it comes to platform independence (e.g. BinaryBuilder), and this PR as it stands has functionality that will be extremely useful in many contexts, even if it's not perfect for HPCs at the moment, requires a bit of extra work for users to modify their own paths, etc.

I agree that we should not rely on Julia managing path stuff, but IIUC, this PR explicitly doesn't do that - it puts stuff in a julia-managed directory and relies on users to deal with it from there. This is how cargo does it too, and I've been able to use lots of those programs on my HPC.
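
If users are left to manage PATH themselves, a simple diagnostic could at least tell them whether the app directory is visible. The following is only a sketch; the function name `on_path` is made up, and it deliberately does no `~` expansion or symlink resolution.

```julia
# Sketch: check whether `dir` is listed in a PATH-style string.
# Entries are compared after being made absolute and stripped of any
# trailing separator; symlinks and `~` are not resolved.
function on_path(dir::AbstractString, path::AbstractString = get(ENV, "PATH", ""))
    sep = Sys.iswindows() ? ';' : ':'
    # Splitting and re-joining the components drops a trailing separator.
    canon(p) = joinpath(splitpath(abspath(p))...)
    wanted = canon(dir)
    return any(split(path, sep; keepempty = false)) do entry
        canon(entry) == wanted
    end
end
```

An installer could call `on_path(bin_dir)` after installing an app and print a hint, rather than editing shell startup files on the user's behalf.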

I agree we want to move towards a place where Pkg can build binaries and seamlessly integrate them into the system environment, but I think that can be built on top of this, and I for one do not want to wait for that ideal state to get access to this functionality.

jpsamaroo commented 2 months ago

Another counter-point to https://github.com/JuliaLang/julia/pull/55047 as an alternative viable solution: not all Julia programs can be made free from dynamic dispatch, yet https://github.com/JuliaLang/julia/pull/55047 requires that for the programs it generates. In particular, if that were our only application deployment solution, then any program which uses Dagger.jl (which contains a dynamic-dispatch based core) would not be able to be deployed as an application, and thus users would be driven away from using Dagger in their applications if it meant that they lost application support by using Dagger. That would be a pretty harmful force to have exist within our ecosystem, as dynamic dispatch serves a very useful purpose, especially when used with care to ensure it doesn't result in unnecessary slowdowns in fast-paths.

To make things more concrete, could concerned HPC developers try out this PR on their system of interest and see where any issues arise? In particular, can we identify any currently-existing pain points in the current implementation that would make it hard for application/package authors to adopt this feature in their application? That would ground the discussion around things that we can clearly identify as issues to be resolved in some way, rather than trying to throw out this idea in its entirety on the basis of known unknowns.

JBlaschke commented 2 months ago

I think this is a really good discussion to have, so thank you @kescobo and @jpsamaroo for your comments.

The discussion involving creating symlinks (which do a number on shared file system metadata servers at scale) and $PATH shenanigans (which often run afoul of HPC env management systems like modules) worries me. Cf. @tecosaur's comments above. Anaconda does both, requiring substantial support efforts from the admins. This isn't alarmist: one of the systems I run on requires your environment to be untarred to /tmp on the computes!

My main pain points at the moment are:

  1. Loading the software environment at scale
  2. Relocating applications and all their dependencies

Cargo doesn't have these problems because Rust builds statically linked (to a point) executables. So you can go and run cargo install, then sbcast the executable. You can't do the same for Julia. The generally accepted approach to launching many-file applications is containers. Julia has problems with HPC containers for two reasons:

  1. If you don't manage JULIA_DEPOT_PATH and JULIA_LOAD_PATH carefully, you might end up calling out to the parallel file systems anyway
  2. HPC containers are read-only, which includes the precompile cache (and often you can't build containers on the same hardware as the computes)

Going forward I therefore recommend:

  1. Make Pkg Apps as self-contained as possible, and avoid making them rely on the runtime environment to load dependencies. If we avoid PATH shenanigans then that's a step in the right direction, but I also recommend that apps and their dependencies be relocatable (i.e. they get their own JULIA_DEPOT, or a way to easily traverse the manifest to "extract" any dependencies).
  2. Make precompile artefacts relocatable. I think https://github.com/JuliaLang/julia/pull/49866 probably solves that, and would just require more testing here: https://github.com/JuliaLang/julia/issues/53810
  3. Have the ability to generate multi-platform precompilation caches. I know this is vague, but I see a future where HPC systems have a range of GPU architectures. This is probably beyond the scope here. And we can work around this atm.

This would allow users to deploy Pkg Apps and relocate them if necessary (e.g. relocating them into a container) to deploy at scale. Testing this could be combined with https://github.com/JuliaLang/julia/issues/53810
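
As a sketch of what "gets its own depot" could look like at launch time, the command below pins `JULIA_DEPOT_PATH` and `JULIA_LOAD_PATH` for the child process only. The function name, its arguments and the chosen load path are assumptions for illustration; this is not how the PR launches apps, and restricting the depot like this may force precompilation into the app depot.

```julia
# Hypothetical sketch: launch an app against a private depot so that package
# loading does not depend on the depot or load path of the calling shell.
function isolated_app_cmd(julia::AbstractString, project::AbstractString,
                          depot::AbstractString, entry::AbstractString,
                          args::Vector{String} = String[])
    sep = Sys.iswindows() ? ";" : ":"
    cmd = `$julia --startup-file=no --project=$project -e $entry $args`
    # `addenv` keeps the inherited environment and overrides only these keys.
    return addenv(cmd,
        "JULIA_DEPOT_PATH" => depot,                       # only the app's own depot
        "JULIA_LOAD_PATH" => join(["@", "@stdlib"], sep))  # active project and stdlibs only
end
```

For example, `isolated_app_cmd("julia", "/apps/Foo", "/apps/depot", "using Foo; Foo.main(ARGS)", ["my", "args"])` returns a `Cmd` that can be passed to `run`; the paths here are placeholders.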

The solution described in https://github.com/JuliaLang/julia/pull/55047 is a stronger form.

@jpsamaroo can you explain more? Surely a program that depends on Dagger.jl doesn't constantly update the precompile cache?

KristofferC commented 2 months ago

I think this discussion is really in the weeds.. When using this to install "Julia package apps", this feature is basically just a convenient way for a user to run foo my args in the terminal instead of having to install the Foo package in a separate environment and running julia --project=@Foo -e 'using Foo; Foo.main(ARGS)' my args.
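
To make the "convenient way to run foo my args" point concrete, here is a sketch that generates a POSIX shell shim wrapping the command quoted above. The shim format is hypothetical (the PR's own shims may look different), and the sketch assumes the Julia path contains no double quotes.

```julia
# Hypothetical sketch: text of a POSIX shell shim that forwards its arguments
# to `julia --project=@Env -e 'using Pkg; Pkg.main(ARGS)'`.
function shim_script(julia::AbstractString, env::AbstractString, pkg::AbstractString)
    return """
    #!/bin/sh
    # Hypothetical shim; the format used by the PR itself may differ.
    exec "$(julia)" --project=@$(env) -e 'using $(pkg); $(pkg).main(ARGS)' "\$@"
    """
end
```

Writing `shim_script("/usr/bin/julia", "Foo", "Foo")` to an executable file named `foo` on the PATH would make `foo my args` equivalent to the longer command line.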

All of the arguments made above apply the same to normal Julia packages (which is obvious since they are basically the same thing).

If you want to talk about static compilation of Julia and these types of things then this is not the right place. This is just a small installer that makes something available to you on the command line. How that thing is implemented is not determined or implemented here. If there are julia packages that can be statically compilable (juliac style) then those can be integrated into this installer.

JBlaschke commented 2 months ago

Stepping out of the weeds for a bit: @KristofferC if that is all, then that's probably fine. I am assuming Pkg Apps won't need:

  1. Modifications to the user's runtime environment (PATH and LD_LIBRARY_PATH, etc). Optional modifications are fine, I just need to be able to turn them off.
  2. Assumptions about where the JULIA_DEPOT lives (e.g. you won't start loading something like ~/.juliarc that just has to live in $HOME for some reason).

The thing I'm looking for is some sort of assurance that this will not start taking over the shell environment, or limit site customizations (the way anaconda does).

Really important is that Preferences still work the same way. E.g.: If I have a LocalPreferences.toml in a project higher on the JULIA_LOAD_PATH, do Apps still pick that up?

Seelengrab commented 2 months ago

Modifying PATH was one of the things Terence Tao recently pointed out as bothering him about the way Python does things :grimacing:

This is just a small installer that makes something available to you on the command line

I think people above are aware of that - is this the right place to discuss the design of that installer (e.g. that the installer should avoid requiring a modification of PATH for the installed apps to work)?

KristofferC commented 2 months ago

Optional modifications are fine, I just need to be able to turn them off.

The shims that start up the application (and do have to be in PATH) have to go somewhere. That exact location could be e.g. overridable.

Assumptions about where the JULIA_DEPOT lives (e.g. you won't start loading something like ~/.juliarc that just has to live in $HOME for some reason).

It will install packages and modify environments etc to JULIA_DEPOT[1] just like when installing julia libraries.

Really important is that Preferences still work the same way. E.g.: If I have a LocalPreferences.toml in a project higher on the JULIA_LOAD_PATH, do Apps still pick that up.

I haven't looked into how stacked preferences work. They surely cannot be dependent on having e.g. the v#.# environment in your LOAD_PATH? That would be awful.


I think people above are aware of that

Comments like

Also, I think an approach to Julia applications that is based on compiled executables (which could be placed in any reasonable location on the filesystem) would result in a better user experience (including for non-HPC users). When developing tools to be used by others, I have opted for compiled executables as they don't rely on the user's runtime environment.

made me doubt that.

JBlaschke commented 2 months ago

The shims that start up the application (and do have to be in PATH) have to go somewhere. That exact location could be e.g. overridable.

As long as it's overridable / controllable by sysadmins that's fine then. We can worry about the mechanisms later (CUDA.jl and MPI.jl have been great at listening to what the HPC community needs, and tweaking their control surfaces accordingly).

It will install packages and modify environments etc to JULIA_DEPOT[1] just like when installing julia libraries.

We already control these, so that's great. I am hearing that we don't need to start rethinking Julia support in this regard.

They surely cannot be dependent on having e.g. the v#.# environment in your LOAD_PATH? That would be awful.

They don't. For example, this is what we do on Perlmutter:

[MPICH_jll]
libmpi_path = "/opt/cray/pe/mpich/8.1.28/ofi/gnu/12.3/lib/libmpi.so"

[CUDA_Runtime_jll]
local = "true"
version = "12.2"


So the test here is to see if a Pkg App picks up on any stacked preferences.
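
One way to start such a test is to list which LocalPreferences.toml files are reachable through the load path of the running process, for example from inside an app. This is only a diagnostic sketch with a made-up function name; it does not reproduce how Preferences.jl merges stacked preferences.

```julia
# Sketch: list the LocalPreferences.toml files visible through a load path.
# `paths` defaults to the expanded load path of the running process; each
# entry may be a project file or a directory.
function visible_local_preferences(paths = Base.load_path())
    found = String[]
    for entry in paths
        dir = isdir(entry) ? entry : dirname(entry)
        file = joinpath(dir, "LocalPreferences.toml")
        isfile(file) && push!(found, file)
    end
    return found
end
```

If the site-wide file does not show up in this list when called from an app, the app's load path does not include the project that carries it.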

made me doubt that.

You were right to doubt -- my initial sense was that there is going to be a whole new kind of Julia workflow that we would have to support.