zzccchen opened this issue 4 days ago
Can you try with `turbo=False, bumper=False`? Those options are experimental and get PySR to use libraries which are bleeding edge. When they work, they are really fast, but they can also cause crashes (especially on Windows).
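Something like this, for example (a minimal sketch; keep the rest of your settings as they are):

```python
from pysr import PySRRegressor

# Same idea as your setup, but with the experimental evaluation backends
# disabled; these are the most likely culprits for low-level crashes:
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["exp", "log"],
    niterations=100,
    turbo=False,   # disables the LoopVectorization.jl fast-evaluation path
    bumper=False,  # disables the Bumper.jl allocation optimization
)
```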
Unfortunately, I tried the `turbo=False, bumper=False` parameters and the crash still occurred.
Could automatically setting `--heap-size-hint=2730M` cause this problem?
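If it matters, I could also try overriding the hint explicitly; I believe something like this works, assuming `heap_size_hint_in_bytes` is the right parameter name:

```python
from pysr import PySRRegressor

# Hypothetical: explicitly set the Julia heap size hint instead of relying on
# the automatic value (assuming the heap_size_hint_in_bytes parameter exists):
sr_model = PySRRegressor(
    heap_size_hint_in_bytes=4_000_000_000,  # ~4 GB, an arbitrary example value
    # ... other settings as above ...
)
```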
Hm, can you show the rest of your code?
```python
from pysr import PySRRegressor

# data load code
X_123e = data_X_123e.to_numpy()
y_123e = data_y_123e.to_numpy()

sr_model = PySRRegressor(
    binary_operators=[
        "*",
        "+",
        "-",
        "/",
    ],
    unary_operators=["square", "cube", "exp", "log", "sqrt"],
    maxsize=80,
    maxdepth=10,
    niterations=100,
    populations=32,
    population_size=100,
    ncycles_per_iteration=550,
    constraints={
        "/": (-1, 9),
        "^": (-1, 5),
        "exp": 6,
        "square": 6,
        "cube": 6,
        "log": 6,
        "sqrt": 6,
        "abs": 9,
    },
    nested_constraints={
        "square": {"square": 0, "cube": 0, "exp": 1},
        "cube": {"square": 0, "cube": 0, "exp": 1},
        "exp": {"square": 0, "cube": 0, "exp": 0},
        "sqrt": {"sqrt": 0, "log": 0},
        "log": {"log": 0},
    },
    complexity_of_operators={
        "square": 2,
        "cube": 3,
        "exp": 3,
        "log": 3,
        "sqrt": 2,
    },
    complexity_of_constants=4,
    adaptive_parsimony_scaling=150.0,
    weight_add_node=0.79,
    weight_insert_node=5.1,
    weight_delete_node=1.7,
    weight_do_nothing=0.21,
    weight_mutate_constant=0.048,
    weight_mutate_operator=0.47,
    weight_swap_operands=0.1,
    weight_randomize=0.23,
    weight_simplify=0.5,
    weight_optimize=0.5,
    crossover_probability=0.066,
    perturbation_factor=0.076,
    cluster_manager=None,
    precision=32,
    turbo=True,
    bumper=True,
    progress=True,
    elementwise_loss="""
    function loss_fnc(prediction, target)
        percentage_error = abs((prediction - target) / target) * 100
        return percentage_error
    end
    """,
    multithreading=False,
    equation_file=symbol_regression_csv_path,
)

complexity_of_variables = []  # list of complexity
sr_model.fit(
    X_123e, y_123e, complexity_of_variables=complexity_of_variables
)
```
Here is the main code of the workflow.
I also wrap this code in a nested loop to test different feature datasets and the stability of the symbolic regression results. A single iteration takes about 2.2 minutes. The program crashes after running for 3-4 hours, i.e., after about 80-110 rounds.
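Roughly, the outer loop looks like this (simplified; the dataset bookkeeping and helper names here are placeholders):

```python
import itertools

# Simplified sketch of my outer loop: every (feature set, repeat) combination
# gets a fresh PySRRegressor and its own output CSV. Names are placeholders.
for feature_set, repeat in itertools.product(feature_sets, range(n_repeats)):
    X = data[list(feature_set)].to_numpy()
    y = data_y.to_numpy()
    symbol_regression_csv_path = f"sr_{'_'.join(feature_set)}_{repeat}.csv"
    sr_model = build_sr_model(equation_file=symbol_regression_csv_path)
    sr_model.fit(X, y)
```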
That looks good. Great to see all those options being used! 🙂
(Random comment: your elementwise loss divides by the target, so make sure the target is > 0, otherwise one target will dominate. But I'm assuming you're aware of that!)
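(If it ever does become an issue, one option would be to floor the denominator; something like this, with an arbitrary epsilon:)

```python
# Hypothetical variant of the loss that avoids dividing by near-zero targets
# (the 1f-8 floor is an arbitrary choice):
elementwise_loss = """
function loss_fnc(prediction, target)
    percentage_error = abs((prediction - target) / max(abs(target), 1f-8)) * 100
    return percentage_error
end
"""
```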
Other comment: can you try with `multithreading=True`? With it set to `False`, and with `procs>0` (the default), it will use multiple Julia processes. But if you just use multi-threading instead, it will start up much faster and hopefully be more stable. With multi-processing it is launching new Julia processes every single time it searches. (This is a weakness in the current codebase; I would like to eventually store the processes within PySRRegressor so multiprocessing has fast startup too.)
You can also set `multithreading=False, procs=0` to use serial mode.
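To summarize the three execution modes (a sketch; all other arguments omitted):

```python
from pysr import PySRRegressor

# One Julia process, many threads: fastest startup, shared memory.
model_threads = PySRRegressor(multithreading=True)

# Multiple Julia worker processes (procs > 0 is the default when
# multithreading=False): workers are relaunched for every search.
model_procs = PySRRegressor(multithreading=False, procs=4)

# Serial mode: a single process and a single thread.
model_serial = PySRRegressor(multithreading=False, procs=0)
```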
But it’s curious that it crashes. Since it runs for a few hours, did you notice anything else happening, like the memory usage gradually increasing over that time and not going down?
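(If it's easy to add, something like this inside your loop would show whether the process is slowly growing; `psutil` is an extra dependency here:)

```python
import os
import psutil

_process = psutil.Process(os.getpid())

def log_memory(tag: str) -> None:
    """Print the resident memory of this process in MB (a rough leak indicator)."""
    rss_mb = _process.memory_info().rss / 1e6
    print(f"[{tag}] RSS = {rss_mb:.0f} MB")
```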
If I use multithreading instead of multiprocessing, the calculation speed drops from 30 it/s to 7 it/s on my device, which is a bit unacceptable to me. In addition, I have made sure that my y_true values are all greater than 0. Also, the memory usage does not fluctuate when the program crashes; it occupies only about 30% of total memory.
Maybe try `multithreading=True` again, but this time, before loading PySR, set a larger thread count:

```python
import os

num_cores = os.cpu_count()  # the number of CPU cores on your machine
os.environ["PYTHON_JULIACALL_THREADS"] = str(num_cores * 2)
```

Here `num_cores` is the number of CPU cores. The factor of 2 is so there's some redundancy, but you could try more or less depending on performance.
The default behavior of PySR is to start Julia with `--threads='auto'`, which is actually fewer than the number of available cores (so it doesn't take up the whole CPU). But for high performance you can increase the usage.
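(You can double-check how many threads Julia actually got; something like this should work, since PySR shares the juliacall runtime:)

```python
import os

# Must be set before PySR / juliacall is first imported:
os.environ["PYTHON_JULIACALL_THREADS"] = "16"

from juliacall import Main as jl

print(jl.seval("Threads.nthreads()"))  # should print the requested thread count
```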
The full list of available juliacall environment variables is here: https://juliapy.github.io/PythonCall.jl/stable/juliacall/#julia-config
What happened?
The program crashed while using PySR, with an error message indicating a memory access violation (EXCEPTION_ACCESS_VIOLATION). This error occurred during the garbage collection process.
Version
v0.19.0
Operating System
Windows
Package Manager
pip
Interface
Script (i.e., `python my_script.py`)
Relevant log output
Extra Info
turbo=True, bumper=True