MilesCranmer opened 11 months ago
@mkitti FYI, on Python 3.12 it almost looks like the Python and Julia garbage collectors are competing with each other to free the same memory...? Check out this weird error that references both the Python and Julia GC:
```
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffb0691a026 -- PyObject_Free at C:\hostedtoolcache\windows\Python\3.12.1\x64\python312.dll (unknown line)
in expression starting at none:0
PyObject_Free at C:\hostedtoolcache\windows\Python\3.12.1\x64\python312.dll (unknown line)
pydecref_ at C:\Users\runneradmin\.julia\packages\PyCall\1gn3u\src\PyCall.jl:118
pydecref at C:\Users\runneradmin\.julia\packages\PyCall\1gn3u\src\PyCall.jl:123
jfptr_pydecref_1039 at C:\Users\runneradmin\.julia\compiled\v1.9\PyCall\GkzkC_As42O.dll (unknown line)
run_finalizer at C:/workdir/src\gc.c:417
jl_gc_run_finalizers_in_list at C:/workdir/src\gc.c:507
run_finalizers at C:/workdir/src\gc.c:553
run_finalizers at C:/workdir/src\gc.c:534 [inlined]
ijl_gc_collect at C:/workdir/src\gc.c:3732
maybe_collect at C:/workdir/src\gc.c:1083 [inlined]
jl_gc_pool_alloc_inner at C:/workdir/src\gc.c:1450 [inlined]
jl_gc_pool_alloc_noinline at C:/workdir/src\gc.c:1511
jl_gc_alloc_ at C:/workdir/src\julia_internal.h:460 [inlined]
_new_array_ at C:/workdir/src\array.c:144
_new_array at C:/workdir/src\array.c:198 [inlined]
ijl_alloc_array_1d at C:/workdir/src\array.c:436
Array at .\boot.jl:477 [inlined]
Array at .\boot.jl:486 [inlined]
similar at .\array.jl:374 [inlined]
similar at .\abstractarray.jl:839 [inlined]
deg2_l0_r0_eval at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:257
jfptr_deg2_l0_r0_eval_1577 at C:\Users\runneradmin\.julia\compiled\v1.9\DynamicExpressions\BQC8W_As42O.dll (unknown line)
_eval_tree_array at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:117
_eval_tree_array at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:131
_eval_tree_array at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:125
#eval_tree_array#1 at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:65 [inlined]
eval_tree_array at C:\Users\runneradmin\.julia\packages\DynamicExpressions\KRT17\src\EvaluateEquation.jl:59 [inlined]
#eval_tree_array#1 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\InterfaceDynamicExpressions.jl:57 [inlined]
eval_tree_array at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\InterfaceDynamicExpressions.jl:56 [inlined]
_eval_loss at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\LossFunctions.jl:48
#eval_loss#3 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\LossFunctions.jl:101
eval_loss at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\LossFunctions.jl:93 [inlined]
#score_func#5 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\LossFunctions.jl:160 [inlined]
score_func at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\LossFunctions.jl:157 [inlined]
#next_generation#1 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\Mutate.jl:235
next_generation at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\Mutate.jl:60 [inlined]
reg_evol_cycle at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\RegularizedEvolution.jl:37
#s_r_cycle#1 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\SingleIteration.jl:42
s_r_cycle at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\SingleIteration.jl:17 [inlined]
#_dispatch_s_r_cycle#81 at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\SymbolicRegression.jl:1053
_dispatch_s_r_cycle at C:\Users\runneradmin\.julia\packages\SymbolicRegression\OYvt5\src\SymbolicRegression.jl:1036
```
It looks like the Julia garbage collector is actually asking Python to free something that should not be freed.
What I do not understand is how this is getting triggered from here: https://github.com/SymbolicML/DynamicExpressions.jl/blob/8109f9c93c877d89274a9b1b5a6a6b19bf4e4e02/src/EvaluateEquation.jl#L257C1-L258C1
It seems to come from a different place in the code each time. So I think the crash happens at essentially random points (this part of the code is simply one of the most frequently hit during the search): the GC kicks in to free some memory and ends up trying to free something that has already been freed...
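For context on the mechanism: the trace above shows PyCall's `pydecref` being run as a Julia finalizer, i.e. PyCall attaches a finalizer to each wrapped `PyObject`, and when Julia's GC collects the wrapper it decrements the CPython reference count (which can end in `PyObject_Free`). A rough CPython-side analogy of that release-on-collect pattern (the `Wrapper` class and names here are hypothetical, purely to illustrate the lifecycle) would be:

```python
import weakref

class Wrapper:
    """Stands in for PyCall's PyObject wrapper (hypothetical)."""

    def __init__(self, obj, release_log):
        self._obj = obj
        # Analogous to Julia's finalizer(pydecref, po): when this wrapper
        # is collected, release the underlying reference exactly once.
        self._finalizer = weakref.finalize(self, release_log.append, id(obj))

log = []
w = Wrapper(object(), log)
del w            # wrapper collected -> finalizer fires
print(len(log))  # 1
```

The crash mode in the report is what happens when this "release exactly once" invariant breaks: something ends up releasing a reference that was already gone.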
I got some advice in https://github.com/python/cpython/issues/113591 on this. Currently trying to run it under Valgrind, but it's quite slow, so it could be a while (it has already run for 4 hours and hasn't even started the search).
Weird. If I turn off multithreading in the search, the segfault goes away...
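That points at a thread-safety issue: CPython's C API (including the decref path) must only be called while holding the GIL, so finalizers firing from multiple Julia threads can race. A generic mitigation sketch (hypothetical, not PyCall's actual fix) is to funnel every operation that touches the Python runtime through a single dedicated thread:

```python
import queue
import threading

# Hypothetical sketch: any thread may *enqueue* work that touches the
# Python runtime, but only one worker thread ever *executes* it, so
# concurrently firing finalizers can never race each other.
tasks = queue.Queue()

def worker(results):
    while True:
        fn = tasks.get()
        if fn is None:           # sentinel: shut down the worker
            break
        results.append(fn())     # only this thread runs the work

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()

# Simulate several threads' worth of "release" jobs being enqueued.
for i in range(3):
    tasks.put(lambda i=i: i)
tasks.put(None)
t.join()
print(results)  # [0, 1, 2]
```

Turning off multithreading in the search serializes everything onto one thread by construction, which is consistent with the segfault disappearing.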
Seems like Python 3.12 is stable enough to add to the testing now.
Once https://github.com/conda-forge/pysr-feedstock/pull/106 is finished we should be able to merge this.