JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

Lazy-load CondaPkg (and Pkg): LazyModules for that, or is there a better option? #549

Open PallHaraldsson opened 2 months ago

PallHaraldsson commented 2 months ago

When I set ENV["JULIA_CONDAPKG_BACKEND"] = "Null",

I get: ERROR: InitError: CondaPkg is using the Null backend but Python is not installed

even though I have Python installed, so I didn't investigate further.

With this setting it loads, but not any faster (though it should):

ENV["JULIA_PYTHONCALL_EXE"] = "@PyCall"  # optional

julia> ENV["JULIA_PYTHONCALL_EXE"] ="/usr/local/bin/python3";

julia> @time using PythonCall
┌ Warning: Python library "/usr/local/lib/python3.12/config-3.12-x86_64-linux-gnu/libpython3.12.a" could not be opened.
└ @ PythonCall.C ~/.julia/dev/PythonCall/src/C/context.jl:122
┌ Warning: Python library "/usr/local/lib/libpython3.12.a" could not be opened.
└ @ PythonCall.C ~/.julia/dev/PythonCall/src/C/context.jl:122
ERROR: InitError: Could not find Python library for Python executable "/usr/local/bin/python3".

If you know where the library is, set environment variable 'JULIA_PYTHONCALL_LIB' to its path.

PallHaraldsson commented 2 months ago

This is probably the best option: CondaPkg loads very slowly, even more slowly than I thought when not in the REPL (in the REPL it CAN be very fast). The problem is Pkg, and it's probably very hard for you to load Pkg lazily; it's easier to load CondaPkg lazily instead, so you don't have to deal with the Pkg complexity.

See here: https://github.com/JuliaLang/julia/pull/55721#issuecomment-2337805245

PallHaraldsson commented 2 months ago

@cjdoris Instead of lazy-loading CondaPkg, I'm considering copying it over and including it in PythonCall.

It's fairly trivial to do. Yes, it's code duplication, which is usually unwanted, but I was thinking I could then more easily trim it down to the bare minimum.

Why? What's the benefit? CondaPkg alone would then carry the Pkg dependency and its slowness, which is tolerable for something only needed in the REPL (a "UI package").

In effect it would not be a breaking change for PythonCall users, at least for scripts; the only difference is that PythonCall would no longer add to Pkg's REPL mode. I've given up on making it fast again (changing that is probably possible, but it seems like a lot of possibly unwanted work for Julia). So if you want that mode, you would need to do: using [PythonCall,] CondaPkg. Is that what you had in mind for a slightly breaking change? Do you want to do this yourself anyway (this way), or should I look into it?

What do you think of that idea? CondaPkg would then depend on PythonCall for the shared code (or that code could live in a third package).

[While JSON3 parsing is, I believe, very fast at runtime, JSON3 is a bit slow to load, so CondaPkg could use Python's JSON parsing instead. I wanted that optimization for load speed, and it would need such an arrangement; but if the two are decoupled, I wouldn't bother optimizing CondaPkg's load speed. Surely it isn't speed-critical and is basically only a UI package?

I think CondaPkg isn't used only for Python but for Conda/mamba in general, e.g. for R, so it would be a bit strange to also rely on Python just for JSON parsing.]

PallHaraldsson commented 2 months ago

FYI: I'm down to a 2.6x regression relative to 1.10 (still), down from 116x if I recall correctly:

$ hyperfine "julia +1.11 --trace-compile=stderr -e 'using CondaPkg'"
Benchmark 1: julia +1.11 --trace-compile=stderr -e 'using CondaPkg'
  Time (mean ± σ):     973.2 ms ±  26.6 ms    [User: 1707.3 ms, System: 161.5 ms]
  Range (min … max):   928.4 ms … 1020.2 ms    10 runs

0.9 s (or even 0.4 s) isn't good, but it's tolerable, and the overhead is still coming down. The missing precompiles should help further (though almost the same ones are also missing in 1.10, so it's unclear why 1.11 still has a regression):

precompile(Tuple{typeof(Base.setindex!), Base.EnvDict, Bool, String})
precompile(Tuple{typeof(micromamba_jll.find_artifact_dir)})
precompile(Tuple{typeof(Base.invokelatest), Any})
precompile(Tuple{typeof(JLLWrappers.get_julia_libpaths)})
precompile(Tuple{typeof(Base.getproperty), Markdown.MD, Symbol})
precompile(Tuple{typeof(Base.copy!), Array{Any, 1}, Array{Any, 1}})

This 23x regression in 1.11 seems to be the main culprit (caused by the 338x regression for Pkg, which I thought I had fixed), so it might be enough to fix that, or to have CondaPkg lazy-load it, and then possibly even keep the Pkg REPL mode:

julia> @time using micromamba_jll
  0.923141 seconds (571.17 k allocations: 35.764 MiB, 9.88% gc time, 1.09% compilation time)

julia> @time_imports using micromamba_jll
      0.7 ms  Printf
     32.9 ms  Dates
      0.6 ms  TOML
               ┌ 0.0 ms NetworkOptions.__init__() 
    272.9 ms  NetworkOptions 98.54% compilation time
               ┌ 0.4 ms MbedTLS_jll.__init__() 
      4.3 ms  MbedTLS_jll
               ┌ 0.2 ms LibSSH2_jll.__init__() 
      3.6 ms  LibSSH2_jll
               ┌ 0.3 ms LibGit2_jll.__init__() 
      3.7 ms  LibGit2_jll
     12.0 ms  LibGit2
     16.1 ms  ArgTools
               ┌ 0.1 ms nghttp2_jll.__init__() 
      3.8 ms  nghttp2_jll
               ┌ 0.2 ms LibCURL_jll.__init__() 
      3.4 ms  LibCURL_jll
               ┌ 0.0 ms MozillaCACerts_jll.__init__() 
      3.7 ms  MozillaCACerts_jll
               ┌ 0.0 ms LibCURL.__init__() 
      1.6 ms  LibCURL
               ┌ 0.2 ms Downloads.Curl.__init__() 
     29.3 ms  Downloads
      1.2 ms  Tar
               ┌ 0.1 ms p7zip_jll.__init__() 
      5.4 ms  p7zip_jll
      0.2 ms  UUIDs
      0.1 ms  Logging
               ┌ 0.0 ms Pkg.__init__() 
    428.4 ms  Pkg
      0.2 ms  LazyArtifacts
     10.9 ms  Preferences
      0.5 ms  JLLWrappers
               ┌ 7.4 ms micromamba_jll.__init__() 67.03% compilation time
      7.8 ms  micromamba_jll 63.65% compilation time
               ┌ 1.3 ms REPLExt.__init__() 
    182.3 ms  Pkg → REPLExt

julia> @time using Pkg
  0.922609 seconds (553.10 k allocations: 34.669 MiB, 9.96% gc time, 0.64% compilation time)

cjdoris commented 2 months ago

I'm not in favour of code duplication - it is rarely the answer and makes maintenance a nightmare. I'm confident we can fix the slow-downs in CondaPkg directly.

cjdoris commented 2 months ago

In CondaPkg, Pkg is only used in two places: (a) the Pkg REPL mode and (b) in the slow path of resolve() (i.e. we cannot skip resolving).

(a) Can probably be solved with some combination of more precompilation, moving the code into an extension, and only doing some bits in interactive mode.

(b) Can probably be solved by only loading Pkg in the slow path (which is already slow so we don't care about any more slowdown).
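To make suggestion (b) concrete, here is a minimal sketch of loading Pkg only on the slow path. It is not CondaPkg's actual code: `LazyPkgDemo`, `lazy_pkg`, `resolve` and the `needs_full_resolve` flag are hypothetical names, with the flag standing in for whatever check decides that resolving cannot be skipped. It assumes `Base.require(::Base.PkgId)` (an internal Base function, not public API) to load the Pkg stdlib by its UUID at call time. Because Pkg's methods are defined after the calling function started running, they are reached through `Base.invokelatest` to avoid a world-age `MethodError`.

```julia
module LazyPkgDemo

# Identity of the Pkg stdlib; Base.require(::PkgId) loads it on first use,
# or returns the already-loaded module on later calls.
const PKG_ID = Base.PkgId(Base.UUID("44cfe95a-1eb2-52ea-b672-e2afdf69b78f"), "Pkg")

# Hypothetical helper: fetch Pkg only when it is actually needed.
lazy_pkg() = Base.require(PKG_ID)

"""
    resolve(; needs_full_resolve=false)

Hypothetical stand-in for a `resolve()` with a fast and a slow path.
`needs_full_resolve` replaces the real "can we skip resolving?" check.
"""
function resolve(; needs_full_resolve::Bool = false)
    # Fast path: nothing to resolve, so Pkg is never loaded.
    needs_full_resolve || return "fast path"

    # Slow path: load Pkg now. This path is already slow, so the extra
    # load time is acceptable.
    Pkg = lazy_pkg()

    # Pkg's methods are newer than the world this call started in, so call
    # them through invokelatest; a direct call could raise a world-age error.
    spec = Base.invokelatest(Pkg.PackageSpec; name = "CondaPkg")
    return "slow path: built spec for $(spec.name)"
end

end # module
```

With this shape, only callers that hit the slow path pay for loading Pkg; the cost is the `invokelatest` indirection, which is negligible next to a full resolve.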

PallHaraldsson commented 2 months ago

I meant moving, by duplicating/copying (with later removal from CondaPkg). For now I was only offering to explore moving/duplicating it into PythonCall, to see if that's viable; if you don't like the code ending up there, I won't bother trying.

Moving implies copying and then deleting at the source (at least eventually), and yes, somebody needs to do the deleting; it's just not a priority for me. :) It shouldn't be deferred for long.

I basically brought this up to see whether you want PythonCall to load faster (somehow), and whether you want me to do at least part of the work. Lazy-loading CondaPkg alone doesn't seem to work: loading is fully deferred until first use, but PythonCall seems to need some of the code right away, which is why I think it needs copying. Importing is all or nothing.

cjdoris commented 2 months ago

I don't think there's any code in CondaPkg that can be removed though - it's all needed for CondaPkg to run correctly. And remember that CondaPkg does not purely exist for PythonCall, it is useful in its own right, otherwise CondaPkg would just be a part of PythonCall (which it was a very long time ago).