Open PatrickHaecker opened 3 months ago
Would be also amazing if those precompiled modules could be saved as shared libraries locally to the files/scripts/dev packages as Manifest.toml currently does. So whenever someone starts a script or developing a package those precompiled modules would be picked up automatically.
This is kind of already here, I think. It's possible to track all compilation, saving precompile calls to a file with --trace-compile=file_name. After that just put the precompile statements into a package and use the package from your startup.jl. Perhaps some of this should be further automated?
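A minimal sketch of the manual workflow described above, assuming the statements were recorded with something like `julia --trace-compile=precompiles.jl script.jl`. The file name and the helper `replay_precompiles` are hypothetical and not part of Julia or any package. Each recorded line is a `precompile(Tuple{...})` call, so replaying the file amounts to evaluating it line by line. Run in a plain session this only warms up that session; called from the top level of a local package it would run while that package is precompiled.

```julia
# Hypothetical helper (not part of Julia or any package): replay the
# `precompile(...)` statements that `--trace-compile` wrote to `path`.
function replay_precompiles(path::AbstractString; mod::Module = Main)
    succeeded = 0
    failed = 0
    for line in eachline(path)
        stmt = strip(line)
        # Skip blank lines and comment lines.
        (isempty(stmt) || startswith(stmt, "#")) && continue
        try
            # `precompile` returns `true` when the signature could be compiled.
            if Core.eval(mod, Meta.parse(stmt)) === true
                succeeded += 1
            else
                failed += 1
            end
        catch
            # Statements that refer to names not available in `mod` end up here.
            failed += 1
        end
    end
    return (succeeded = succeeded, failed = failed)
end
```

The returned counts make it easy to notice when a recorded file has gone stale, for example after a dependency was removed.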
@nsajko
That sounds like a lot of work for a regular user, especially for those who are not computer scientists or have a poor understanding of how compilers work. Additionally, this setup might not work well with Revise.
Perhaps some of this should be further automated?
That's how I read the initial proposal, yes. It would be nice to automate all of this, making the precompile statements run behind the scenes and enabling the ability to dump and reload the precompiled modules between Julia sessions.
This is kind of already here, I think. It's possible to track all compilation, saving precompile calls to a file with --trace-compile=file_name. After that just put the precompile statements into a package and use the package from your startup.jl. Perhaps some of this should be further automated?
Thanks, @nsajko, indeed I am currently following a similar, but even more complicated manual workflow (due to supporting both PackageCompiler and a "development build" and needing to investigate which precompiles were triggered by which module). I was thinking towards a better solution and this proposal is what I came up with from a user perspective. I would probably even set an alias on my system to have this parameter activated by default, because that's the behavior I want nearly all the time.
I also assume that we have most of the major building blocks already. As far as I know only the "which module was responsible for this runtime precompile" is missing a user interface and this sounds like a low hanging fruit. I guess it's only a matter of putting the blocks together, but I do not know about the implementation details of these blocks. But to me that sounds like the best of both worlds (classical AoT compiled languages and interpreted/JIT/JAoT compiled languages).
Would be also amazing if those precompiled modules could be saved as shared libraries locally to the files/scripts/dev packages as Manifest.toml currently does. So whenever someone starts a script or developing a package those precompiled modules would be picked up automatically.
I agree that this would be cool when it works, but would it work so often? At least the native code would fail whenever someone has a different computer architecture / instruction set. Optimizations are also sometimes very CPU-specific (e.g. AVX-512). So this, although interesting, opens up a whole lot of additional questions. Therefore, I propose to have a separate issue for that question, if you want to follow up on this proposal.
but would it work so often?
I think so. In my opinion, it would cover the vast majority of cases since most people use the same laptop or computer for months/years. For example, if I worked on experiments yesterday, I could restart the Julia session today with all the functions precompiled from yesterday. That's the use case for 99% of users. Currently, most of the cache is lost between sessions (although packages do precompile some stuff, the majority of session-specific compilations are wiped). What I was trying to propose is basically equivalent to not closing Julia terminal session over night and keeping it alive for days. Would be nice to just dump all the available precompiled cache in a binary file and restart julia session with this binary file later on such that it feels you never actually closed your terminal.
You are correct that this wouldn't work on a different computer architecture. However, I believe your original proposal would face the same issue, as precompiled modules wouldn't work between different computer architectures anyway. I may have misunderstood your proposal, though.
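To illustrate the portability concern, here is a small sketch that ties a saved native-code cache to the machine it was built on. The names `machine_key` and `is_reusable` are hypothetical, and the check is an assumption made for illustration; Julia's real cache validation is more elaborate than comparing three values.

```julia
# Hypothetical description of the machine a native-code cache was built on.
# `Sys.ARCH`, `Sys.CPU_NAME` and `VERSION` are provided by Julia itself.
machine_key() = (arch = Sys.ARCH, cpu = Sys.CPU_NAME, julia = VERSION)

# Reuse a saved cache only when it was produced on an identical setup.
is_reusable(saved_key, current_key = machine_key()) = saved_key == current_key
```

On the same laptop from one day to the next the key stays the same, which is the use case both commenters have in mind; a different CPU or Julia version would lead to a rebuild.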
JIT compilation and generating compilation output that can be reused in another session are fundamentally different. JIT compilation can avoid some indirection since it can assume the locations of procedures will not change. Whereas compilation that can be reloaded requires some indirection.
Julia's compilation modes can be affected by the options below, but they are not meant for the end user. The closest end-user implementation to what you want is pkgimages, which use the options below internally.
I'm unclear if you have tried pkgimages and PrecompileTools.jl and how they may or may not be applicable to your problem.
$ julia --help-hidden
julia [switches] -- [programfile] [args...]
Switches (a '*' marks the default value, if applicable):
 --compile={yes*|no|all|min}
     Enable or disable JIT compiler, or request exhaustive or minimal compilation
 --output-o <name>
     Generate an object file (including system image data)
 --output-ji <name>
     Generate a system image data file (.ji)
 --strip-metadata
     Remove docstrings and source location info from system image
 --strip-ir
     Remove IR (intermediate representation) of compiled functions
 --output-unopt-bc <name>
     Generate unoptimized LLVM bitcode (.bc)
 --output-bc <name>
     Generate LLVM bitcode (.bc)
 --output-asm <name>
     Generate an assembly file (.s)
 --output-incremental={yes|no*}
     Generate an incremental output file (rather than complete)
 --trace-compile={stderr,name}
     Print precompile statements for methods compiled during execution or save to a path
 --image-codegen
     Force generate code in imaging mode
 --permalloc-pkgimg={yes|no*}
     Copy the data section of package images into memory
Would be nice to just dump all the available precompiled cache in a binary file and restart julia session with this binary file later on such that it feels you never actually closed your terminal.
The use case of the same computer should already be covered in my proposal. I thought you wanted to extend it, but I think we have the same use case in our mind.
Relaying the thought from the related Discourse thread, have you considered using PrecompileTools.jl with a Startup package?
https://julialang.github.io/PrecompileTools.jl/stable/#Tutorial:-local-%22Startup%22-packages
JIT compilation can avoid some indirection since it can assume the locations of procedures will not change. Whereas compilation that can be reloaded requires some indirection.
Thanks for the explanations. So if I got it, then the feature request really is "if the flag is provided, generate relocatable JIT code which is then saved to the image file if it is not in there already".
The idea is to make use of pkgimages to basically achieve a similar, but faster and more comfortable effect than using PrecompileTools.jl.
I am not sure which of these command line arguments might support the described use case, so I tested them separately and commented them so that we can see whether my understanding is correct. Probably I need at least a combination of them, but I am unclear which one.
--compile={yes*|no|all|min}: Seems to be orthogonal.
--output-o <name>: Results in "ERROR: File "boot.jl" not found".
--output-ji <name>: Results in "ERROR: File "boot.jl" not found".
--strip-metadata: Seems to be orthogonal.
--strip-ir: Seems to be orthogonal.
--output-unopt-bc <name>: Seems to be orthogonal.
--output-bc <name>: Seems to be orthogonal.
--output-asm <name>: Seems to be orthogonal.
--output-incremental={yes|no*}: This works, but I am not sure what it does without any other options. It does not seem to save runtime precompiles per se.
--trace-compile={stderr,name}: This might be a building block, but as long as it does not output the calling module, I am not sure how much it will help.
--image-codegen: I am not sure what it does without any other options, but it does not seem to save runtime precompiles per se.
--permalloc-pkgimg={yes|no*}: This sounds unrelated.
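As a partial workaround for the missing module information in the `--trace-compile` output, the following sketch extracts the module that defines the function named in a recorded statement. Note the limitation: this is the defining module, not the calling module asked for above, which the trace does not contain. The helper name `defining_module` is hypothetical, and the sketch assumes statements of the form `precompile(Tuple{typeof(f), ...})`.

```julia
# Hypothetical helper: return the module that defines the function named in
# one recorded `precompile(Tuple{typeof(f), ...})` statement.
function defining_module(stmt::AbstractString)
    ex = Meta.parse(stmt)
    # Expect a call of the form precompile(<signature tuple type>).
    (ex isa Expr && ex.head === :call && ex.args[1] === :precompile) ||
        error("not a precompile statement: $stmt")
    # Evaluate the signature, e.g. Tuple{typeof(Base.sum), Array{Int64, 1}}.
    sig = Core.eval(Main, ex.args[2])
    # The first parameter is the function's type; ask where it was defined.
    return parentmodule(sig.parameters[1])
end
```

Grouping a trace file by this value gives a rough idea of which packages dominate runtime compilation, even though it cannot attribute a compile to its caller.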
I have the feeling that I did not state very well what I want to have as a solution. I tried to improve the wording, but please give me hints on what would help you to help me. :-)
Relaying the thought from the related Discourse thread, have you considered using PrecompileTools.jl with a Startup package?
https://julialang.github.io/PrecompileTools.jl/stable/#Tutorial:-local-%22Startup%22-packages
Thanks for the hint, I guess you are referring to this thread. Ideally I do not want to have to set up or maintain anything, as this should really support a development workflow where things change. If a function gets precompiled outside of a module precompilation run, it should just be saved by Julia for next time without any function-specific configuration and without an additional run (as with a workload with PrecompileTools.jl).
The idea is to make use of pkgimages to basically achieve a similar, but faster and more comfortable effect than using PrecompileTools.jl.
What is uncomfortable about PrecompileTools.jl? You technically do not need it, but it makes compiling modular relocatable code a lot more pleasant.
You could simply run code at the package module top-level and it will be saved into the pkgimage. However, it will run every single time you try to load the code. We can strip the functionality of PrecompileTools.jl down to this single if statement.
if ccall(:jl_generating_output, Cint, ()) == 1
    # code to compile to disk
end
You also need to consider the situations of when the serialized code is valid or not. This is the issue that pkgimages solves.
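For context, the guard above could sit inside a package module roughly as follows. This is a sketch under assumptions: the module name `StartupSketch` and the `workload` function are hypothetical placeholders, and only the `ccall` itself comes from the comment above.

```julia
module StartupSketch  # hypothetical module name

# Hypothetical workload: any calls whose compiled code should be cached.
workload() = sum(abs2, [1.0, 2.0, 3.0])

# True only while Julia is generating output, e.g. during package precompilation.
is_generating_output() = ccall(:jl_generating_output, Cint, ()) == 1

# Runs while the package is precompiled, not on every ordinary load.
if is_generating_output()
    workload()
end

end # module
```

In a normal session the guard is false, so loading the module costs nothing beyond its definitions.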
Perhaps the issue is that you have a script and not a package. If so look into https://github.com/jolin-io/JuliaScript.jl which automates the creation of a package for a script.
If you want to understand the output options, you may want to study how PackageCompiler.jl works:
https://github.com/JuliaLang/PackageCompiler.jl/blob/master/src%2FPackageCompiler.jl#L462-L465
cmd = `$(get_julia_cmd()) --cpu-target=$cpu_target $sysimage_build_args
--sysimage=$base_sysimage --project=$project --output-o=$(object_file)
$outputo_file`
@debug "running $cmd"
Thanks for all the hints, @mkitti .
What is uncomfortable about PrecompileTools.jl? You technically do not need it, but it makes compiling modular relocatable code a lot more pleasant.
The workflow with PrecompileTools.jl looks like this:
PrecompileTools.jl
In comparison, the proposed workflow looks like this:
julia --append-precompiles
and it just works.
You could simply run code at the package module top-level and it will be saved into the pkgimage
Yes, for most of the cases this works. However, there are cases where this does not work (I hit some of these cases, too), see e.g. here or there.
Perhaps the issue is that you have a script and not a package. If so look into https://github.com/jolin-io/JuliaScript.jl which automates the creation of a package for a script.
I am using packages. Thanks a lot for pointing me towards JuliaScript.jl. I watched the JuliaCon talk about it and it looks interesting. However, as the first compilation run takes more time, this is at least currently not an improvement for code which changes frequently.
There should be a command line parameter called --append-precompiles or --save-compile or something like this which enables saving all precompiles which happened during runtime (either in parallel at runtime or when the program finishes). The precompiles should probably be saved in the module image where the precompile was triggered from (the caller of the function). This way, the precompilation image file contains both the precompiles from the precompile pass and the precompiles which came up during runtime (possibly the result of multiple runs, combined as a set union, if new calls come up in different runs). If the module is changed, all precompiles would get invalidated as usual. Afterwards, the number of precompiles grows first with the first precompile run and then possibly with each run (although most modules will not grow anymore at all after precompilation or only for the first run).
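The intended semantics can be illustrated with a toy model. This is only a sketch of the proposal, not Julia's actual cache format or any existing API; the type `ModuleCache`, the function `append_precompiles`, and the use of plain strings for signatures are all hypothetical.

```julia
# Toy model (not Julia's actual cache format) of the proposed behavior.
struct ModuleCache
    source_hash::UInt          # hash of the module's source when it was cached
    signatures::Set{String}    # precompile signatures saved so far
end

# Merge the signatures compiled during one run into the saved cache.
function append_precompiles(cache::ModuleCache, source_hash::UInt, new_signatures)
    if cache.source_hash != source_hash
        # The module changed: all previous precompiles are invalidated.
        return ModuleCache(source_hash, Set{String}(new_signatures))
    end
    # Unchanged module: the cache becomes the union over all runs so far.
    return ModuleCache(source_hash, union(cache.signatures, new_signatures))
end
```

With this model, a second run that triggers only already-saved signatures leaves the cache unchanged, which matches the expectation that most modules stop growing after the first run.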
I am not sure how the REPL comes into play for this feature. Until there are good ideas it might be best to not have this feature (saving new precompiles) in an interactive session.
With this mechanism in place I could even imagine that the regular precompile run might not even be intended for a lot of use cases. Just run the code and whenever some method is called the first time, precompile it, save the precompilation and go on. That means after a code change the functions are precompiled iff they are actually used, i.e. no currently unused function is precompiled and no function is precompiled more than once (even when accounting for multiple program runs; as long as the involved code is not modified).