MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://astroautomata.com/PySR
Apache License 2.0

[BUG]: population increase kills kernel #330

Closed RubenBies closed 1 year ago

RubenBies commented 1 year ago

What happened?

As mentioned in the discussion https://github.com/MilesCranmer/PySR/discussions/327, I have a problem with my model. I assume it's a CPU or RAM problem, since I'm running on an i5 laptop with 15 GB of RAM, but in case you want to investigate: I was running a model on 10k samples with 2 features and tried to increase the populations to 50 when my kernel died. This also does not happen every time, which is confusing; sometimes it runs just fine. There's no error, my terminal just prints:

Started! Killed

my model is:

model_A = PySRRegressor(
    populations=50,
    population_size=33,
    niterations=60,
    ncyclesperiteration=550,
    maxsize=40,
    maxdepth=8,
    loss="L2DistLoss()",
    weight_optimize=0.002,
    model_selection="best",
    parsimony=0.004,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square"],
    nested_constraints={"square": {"square": 2}},
    complexity_of_variables=2,
    warm_start=warm_start,
    batching=True,
    turbo=True,
)

Edit: A .pkl file was created when the search started

Version

0.12.0

Operating System

Linux

Package Manager

Conda

Interface

Other (specify below)

Relevant log output

Started!
Killed

Extra Info

Interface: VSCode

MilesCranmer commented 1 year ago

Can you try:

  1. Running with turbo=False
  2. Running with batching=False
  3. Both 1 and 2

and let me know if that makes the issue disappear? The turbo=True setting is experimental; depending on your OS, kernels can sometimes crash with it.
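
For reference, a minimal sketch of those three variants, reusing a few of the settings from the model above (names and values taken from the issue; the remaining options are omitted here only for brevity):

from pysr import PySRRegressor

# Shared settings copied from the model in the issue (other options omitted).
common = dict(
    populations=50,
    population_size=33,
    niterations=60,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square"],
)

# 1. Disable only the experimental SIMD kernels.
model_1 = PySRRegressor(turbo=False, batching=True, **common)

# 2. Disable only mini-batching.
model_2 = PySRRegressor(turbo=True, batching=False, **common)

# 3. Disable both.
model_3 = PySRRegressor(turbo=False, batching=False, **common)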

MilesCranmer commented 1 year ago

One other tricky thing (thinking out loud) is that the constant optimization is full batch, i.e., it will tune constants over the entire 10k samples. I want to fix this but haven't gotten around to it yet. I wonder if this is where the issue comes from.

MilesCranmer commented 1 year ago

Relevant code here: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/2749e67a5d9c3e715b43811e0b6f9c857198289d/src/ConstantOptimization.jl#L17

There should be a check for options.batching there, using the batched loss when it is true. It would need a fixed seed throughout training (it might be worth creating a dedicated batched dataset).
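
As a rough illustration of the idea only (this is not PySR's or SymbolicRegression.jl's actual internals; the expression, loss, data, and batch size here are made up), optimizing constants against a fixed-seed mini-batch instead of the full 10k samples would look something like this:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)          # fixed seed, as suggested above
X = rng.normal(size=(10_000, 2))        # stand-in for the 10k-sample dataset
y = 3.1 * X[:, 0] ** 2 - 0.5 * X[:, 1]  # stand-in target

def l2_loss(consts, X, y):
    # Toy expression with two tunable constants: c0 * x0^2 + c1 * x1
    pred = consts[0] * X[:, 0] ** 2 + consts[1] * X[:, 1]
    return np.mean((pred - y) ** 2)

# Full-batch constant optimization (the current behaviour described above):
full = minimize(l2_loss, x0=[1.0, 1.0], args=(X, y), method="BFGS")

# Batched variant: draw one fixed mini-batch and tune the constants on it.
batch_idx = rng.choice(len(X), size=500, replace=False)
batched = minimize(l2_loss, x0=[1.0, 1.0],
                   args=(X[batch_idx], y[batch_idx]), method="BFGS")

print(full.x, batched.x)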

RubenBies commented 1 year ago

You were right, turning off turbo seemed to do the trick for me! I will experiment a bit with even more populations and get back to you if it happens again. It's still weird that it only happened every other time; the only reason I didn't think of this before is that I assumed turbo would just result in an error if it didn't work. (My OS is Ubuntu 20.04, in case that clears it up.) Batching did not seem to have an impact.

Thanks for your help!

MilesCranmer commented 1 year ago

Thanks for following up, that is very useful info. It's too bad that turbo=True sometimes fails; I would otherwise love to have it as the default option (but it was a good decision to leave it off by default).

It would probably be useful to the LoopVectorization.jl developers if you could post this issue there (https://github.com/JuliaSIMD/LoopVectorization.jl), and that you found using @turbo loops in PySR crashes intermittently. It may help them patch some part of the library. Feel free to tag me too. It’s particularly surprising/worrying that this happens on Ubuntu; normally you only would get these on Windows.