Distributed PySR not working on PBS cluster

praksovar commented 1 year ago

Hello Miles, thank you for your excellent work. I am interested in searching for SR models in many areas of my research, and in the past I used Eureqa (https://link.springer.com/article/10.1007/s10710-010-9124-z). Recently I have been using PySR (e.g. https://www.mdpi.com/2075-1680/11/9/463). I read your tips for running PySR on a cluster and tried them on our cluster Barbora (https://www.it4i.cz/en), BUT there are many errors:

Permission denied for .julia/environments/v1.8/Project.toml. Solved by exporting JULIA_DEPOT_PATH, JULIA_PROJECT, and JULIA_LOAD_PATH to point to a local directory in scratch.
I couldn't run pysr.install() from Python. Solved by installing manually, directly in Julia, with import Pkg; Pkg.add("SymbolicRegression").
Now I get this error, which I couldn't resolve (I don't have any experience with Julia):

Error launching workers ErrorException("")
Activating environment on workers.
Importing installed module on workers...Finished!
Testing module on workers...Finished!
Testing entire pipeline on workers...Finished!
Traceback (most recent call last):
  File "/home/myname/myscript.py", line 49, in <module>
    model.fit(X, y)
  File "/apps/all/Python/3.10.8-GCCcore-12.2.0/lib/python3.10/site-packages/pysr/sr.py", line 1834, in fit
    self._run(X, y, mutated_params, weights=weights, seed=seed)
  File "/apps/all/Python/3.10.8-GCCcore-12.2.0/lib/python3.10/site-packages/pysr/sr.py", line 1694, in _run
    self.raw_juliastate = SymbolicRegression.EquationSearch(
RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: reducing over an empty collection is not allowed; consider supplying init to the reducer

Please, do you have any tips to solve this error? What am I doing wrong? Thank you in advance! Best regards, Renata

Version:
OS: Red Hat Enterprise Linux 8.4 (Ootpa)
Julia 1.8.5
Python 3.10.8
PySR 0.12.1

I used these settings, inspired by your advice:

model = PySRRegressor(
    niterations=500000,
    population_size=108,
    binary_operators=["+", "*", "/", "^", "-"],
    unary_operators=["abs", "cos", "log", "exp", "sin"],
    loss='L1DistLoss()',
    procs=36,
    cluster_manager='pbs',
    ncyclesperiteration=5000,
    turbo=True,
    maxdepth=7,
    parsimony=0.0001,
    weight_optimize=0.001,
    adaptive_parsimony_scaling=1000,
    nested_constraints={"sin": {"sin": 0, "cos": 0}, "cos": {"sin": 0, "cos": 0}},
)
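The first workaround in the list above (pointing Julia at a writable location in scratch) could in principle also be applied from inside the Python script, before PySR starts Julia. The following is a minimal sketch under assumptions: the helper name `julia_env_for`, the scratch path, and the directory layout are hypothetical and not taken from this thread; only the three environment variable names come from the report.

```python
import os
from pathlib import Path


def julia_env_for(scratch_dir):
    """Return Julia environment variables rooted in a writable directory.

    The subdirectory names are assumptions chosen for illustration.
    """
    root = Path(scratch_dir)
    return {
        # Where Julia stores packages, registries, and compiled caches.
        "JULIA_DEPOT_PATH": str(root / "julia_depot"),
        # The project (environment) that Julia activates on startup.
        "JULIA_PROJECT": str(root / "julia_project"),
        # "@" is the active project, "@stdlib" the standard library.
        "JULIA_LOAD_PATH": "@:@stdlib",
    }


# Hypothetical scratch location; replace with your own allocation's path.
env = julia_env_for("/scratch/project/myproject")

# These must be set before PySR starts Julia for them to take effect.
os.environ.update(env)
```

Exporting the same variables in the PBS job script, as described in the report, achieves the same result and also covers processes not started from this Python script.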

MilesCranmer commented 1 year ago

Hi @praksovar,

Everything looks good to me in your options.

Can you share the full error message? If it is long perhaps you could put it in a gist.github.com?
Is procs=36 the number of cores over your entire allocation? Or is it the number of cores per node? (It should be # of cores over entire allocation. i.e., num_nodes * num_cores_per_node).
How are you launching this script - from the head node, or once per node? (It should just be launched from the head node; Julia will be able to create workers across the allocation)

Cheers, Miles
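As a small illustration of the procs rule in the reply above (total cores over the entire allocation, not cores per node), here is a sketch with a hypothetical helper `total_procs`. The 4-node case is an invented example, not a configuration from this thread.

```python
def total_procs(num_nodes, cores_per_node):
    """Number of worker processes for the whole allocation."""
    if num_nodes < 1 or cores_per_node < 1:
        raise ValueError("num_nodes and cores_per_node must be positive")
    return num_nodes * cores_per_node


# One node with 36 cores (the setup described in this thread).
procs_single_node = total_procs(1, 36)

# A hypothetical 4-node allocation of the same node type.
procs_four_nodes = total_procs(4, 36)

print(procs_single_node, procs_four_nodes)  # prints "36 144"
```

The resulting value is what would be passed as procs= to PySRRegressor in a script launched once from the head node.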

MilesCranmer commented 1 year ago

Hi @praksovar,

Just wanted to ping you on this. Please provide more details if possible so I can help fix it.

Cheers, Miles

praksovar commented 1 year ago

Hi Miles, thank you for your reply. The error I asked about previously was solved by our support. The code now runs correctly, BUT it only uses 15-16 of the 36 cores (50% load). I am using one node with 36 cores (ncpus=36).

The settings are as follows:

model = PySRRegressor(
    niterations=50000,
    population_size=216,
    binary_operators=["+", "*", "/", "^", "-"],
    unary_operators=["exp", "log", "abs"],
    loss='L1DistLoss()',
    multithreading=True,
    procs=36,
    cluster_manager="pbs",
    ncyclesperiteration=5000,
    turbo=True,
    maxdepth=7,
    parsimony=0.0001,
    weight_optimize=0.001,
    adaptive_parsimony_scaling=1000,
)

So I ran your example on our cluster both in Python with PySR and in Julia with SymbolicRegression.jl. I found that Julia uses all 36 cores, whereas Python uses only 15-16 cores.

Python:

import numpy as np
from pysr import PySRRegressor

X = np.random.random((5, 100))
y = 2 * np.cos(X[4, :]) + X[1, :] ** 2 - 2
model = PySRRegressor(
    binary_operators=["+", "*", "/", "^", "-"],
    unary_operators=["cos", "exp"],
    population_size=540,
    niterations=400,
    ncyclesperiteration=5000,
    turbo=True,
    multithreading=True,
)
model.fit(X.T, y)

Julia:

using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2
options = SymbolicRegression.Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    npopulations=540,
    ncyclesperiteration=5000,
    turbo=true,
)
hall_of_fame = EquationSearch(
    X, y, niterations=40, options=options, parallelism=:multithreading
)

Do you have any idea why? Thank you. Cheers, Renata

MilesCranmer commented 1 year ago

Hi @praksovar,

Sorry for the late reply. The issue is that you are using multithreading=True. You need to have multithreading=False for multiprocessing mode to be enabled.

Likewise in the pure Julia mode, you need to use parallelism=:multiprocessing, addprocs_function=addprocs_pbs, rather than parallelism=:multithreading.

Cheers, Miles
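The change suggested in the reply above can be sketched on the Python side as follows. The parameter names procs, multithreading, and cluster_manager are the ones used earlier in this thread; the helper names `pbs_multiprocessing_kwargs` and `build_model` are hypothetical, and the PySR import is deferred so the settings can be inspected without Julia installed.

```python
def pbs_multiprocessing_kwargs(procs):
    """Parallelism settings for multiprocessing on a PBS allocation."""
    return {
        "procs": procs,            # total cores over the entire allocation
        "multithreading": False,   # must be False to enable multiprocessing
        "cluster_manager": "pbs",
    }


# One node with 36 cores, as described in this thread.
settings = pbs_multiprocessing_kwargs(36)


def build_model(**search_kwargs):
    """Create a PySRRegressor with the parallelism settings applied."""
    # Imported here so that building `settings` does not require PySR.
    from pysr import PySRRegressor

    return PySRRegressor(**settings, **search_kwargs)
```

A call such as build_model(niterations=400, binary_operators=["+", "*", "/", "-"]) would then be made once, in a script launched from the head node.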

MilesCranmer / PySR

Distributed PySR not working on PBS cluster #289