MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://ai.damtp.cam.ac.uk/pysr
Apache License 2.0

[Feature] Precompilation #84

Closed MilesCranmer closed 2 years ago

MilesCranmer commented 3 years ago

It would be great to precompile some parts of SymbolicRegression.jl. I think this could reduce PySR's startup time quite significantly.

Tutorial: https://julialang.org/blog/2021/01/precompile_tutorial/

MilesCranmer commented 2 years ago

Another thing to try would be PyJulia. Much of the Julia backend would then stay cached for as long as Python stays open between commands.

The main reason I haven't used PyJulia so far is the installation issues I've personally experienced (which many users who have never used Julia would likely run into as well). Another reason is that I'm not sure how it would handle distributed computing, where it seems better to launch Julia from the command line normally (which is how PySR currently works).

MilesCranmer commented 2 years ago

Update: this seems to be working. The current draft version is in the pyjulia branch. You should get a faster startup time on the second call, since the Julia backend no longer needs to be recompiled on every launch; SymbolicRegression.jl simply stays cached from the previous pysr call.

Edit: confirmed that there is a much faster startup time.

MilesCranmer commented 2 years ago

PyJulia is working extremely well, even with distributed computing(!). While PyJulia doesn't even officially support this, it seems to work because the backend handles all distributed processes internally.

I will likely switch the entirety of PySR to PyJulia in a future version. Besides the reduced startup time for repeat searches, another major advantage is that I can finally save state and store the equations directly in a Python object rather than in a CSV file.
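
To illustrate what holding the equations in a Python object could look like, here is a rough sketch under stated assumptions: `Equation` and `SearchState` are hypothetical classes, not PySR's actual API, and the column names (`complexity`, `loss`, `expression`) are chosen for illustration only.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Equation:
    complexity: int
    loss: float
    expression: str


@dataclass
class SearchState:
    """Hypothetical in-memory container; not PySR's actual class."""
    equations: list = field(default_factory=list)

    def add(self, complexity, loss, expression):
        self.equations.append(Equation(complexity, loss, expression))

    def best(self):
        """Return the lowest-loss equation stored so far."""
        return min(self.equations, key=lambda e: e.loss)

    def to_csv(self):
        """Export as CSV text, for compatibility with a file-based workflow."""
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["complexity", "loss", "expression"])
        for e in self.equations:
            writer.writerow([e.complexity, e.loss, e.expression])
        return buf.getvalue()


state = SearchState()
state.add(1, 0.9, "x0")
state.add(3, 0.1, "x0 * x1")
print(state.best().expression)
```

Because the state lives in an object, a later search in the same session can read or extend it directly, and a CSV export remains available when a file is still wanted.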

MilesCranmer commented 2 years ago

This is fixed in v0.7+.