Closed. TadeuNP closed this issue 6 months ago.
Related to the third error I am seeing in the Windows tests: https://github.com/MilesCranmer/PySR/issues/238 (also posted here: https://github.com/JuliaLang/julia/issues/47957). Unfortunately I don't have a Windows machine where I can replicate this, so it's a bit difficult for me to debug, but I can offer some questions that will help me track it down:
1. Does the error occur with multithreading=False, procs=procs (where procs is the number of processors you have)?
2. Does it occur regardless of the procs you set? It could be a data race.
3. Does it occur with no parallelism at all (multithreading=False, procs=0)?
4. Does it occur if you pass julia_kwargs={"optimize": 0} to PySRRegressor?
The more information the better; these questions will help me figure out where the problem could be lurking.
Thanks!
Miles
julia_kwargs={"optimize": 0} did not fix it. As far as I can tell, it showed a similar error frequency as before.
multithreading=False:
After running a bunch of successful tests with multithreading disabled, I decided to turn it on. To my surprise, it worked perfectly. Then I noticed a new warning message:
C:\Users\Tadeu\anaconda3\envs\thesis2\lib\site-packages\pysr\julia_helpers.py:217: UserWarning: Julia has already started. The new Julia options {'threads': 12} will be ignored.
I restarted the Jupyter kernel and attempted to fit a model twice, with multithreading enabled both times, and it failed.
Thanks! Let me know if there are more tests I can run.
Awesome, thanks for answering those. So using multiprocessing instead of multithreading (via multithreading=False) does seem like a good workaround for now. It's actually just as fast as multithreading, if not faster; it just takes a bit longer to start each search.
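As a rough sketch of that workaround, the settings can be collected in one place and passed to the regressor. Only the multithreading and procs parameter names come from this thread; the helper multiprocessing_kwargs is hypothetical, and newer PySR releases may name these options differently, so treat it as an illustration rather than the library's recommended setup.

```python
import os


def multiprocessing_kwargs(procs=None):
    """Build keyword arguments that select multiprocessing over multithreading.

    Hypothetical helper: only the `multithreading` and `procs` names are
    taken from the discussion above.
    """
    if procs is None:
        # os.cpu_count() can return None, so fall back to a single process.
        procs = os.cpu_count() or 1
    return {"multithreading": False, "procs": procs}


kwargs = multiprocessing_kwargs()
print(kwargs)

# Assumed usage, requires PySR and Julia to be installed:
#   from pysr import PySRRegressor
#   model = PySRRegressor(**kwargs)
```

Passing procs=0 to the helper would give the fully serial configuration mentioned in the questions above.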
This is also very helpful for finding the bug, thanks. Because it occurs only with multithreading and not with multiprocessing, I think it is a data race. (Multiprocessing copies data between processes, whereas threads can access the same resources.) It's interesting that it only seems to occur in a PyJulia context, though…
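The difference between shared and copied state can be illustrated in plain Python. This is only an analogy and says nothing about Julia's threading or PySR's internals: the bump function is made up for the example, and a pickle round trip stands in for the serialization that happens when data is sent to another process.

```python
import pickle
import threading


def bump(state, lock=None):
    """Increment a counter in `state`, optionally guarded by a lock."""
    if lock is None:
        state["count"] += 1
    else:
        with lock:
            state["count"] += 1


# Threads share memory: every thread mutates the same dict, which is where
# unsynchronized access can turn into a data race. The lock serializes updates.
shared = {"count": 0}
lock = threading.Lock()
threads = [threading.Thread(target=bump, args=(shared, lock)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Processes work on copies: the "worker" only ever sees a serialized copy,
# so its changes never reach the parent's object.
original = {"count": 0}
worker_copy = pickle.loads(pickle.dumps(original))
bump(worker_copy)

print(shared["count"])    # updated by all four threads
print(original["count"])  # untouched by the simulated worker process
```

Because each process has its own copy, there is nothing to race on, which fits the observation that the crash disappears when multithreading is turned off.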
This should be fixed in the most recent version. Ping me if not!
PySR throws an "OSError: exception: access violation reading" error. It seems to occur often when fitting a model many times (tried with the exact same settings and data). Occurs both in Jupyter and when running from a Python file.
Visual Studio Code outputs the following:
Traceback (most recent call last):
  File "c:\Users\Tadeu\Desktop\pysr-access-violation.py", line 44, in <module>
    model.fit(X_train , dx , variable_names=["x", "y", "z"])
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\sr.py", line 1792, in fit
    self._run(X, y, mutated_params, weights=weights, seed=seed)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\sr.py", line 1493, in _run
    Main = init_julia(self.julia_project, julia_kwargs=julia_kwargs)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\pysr\julia_helpers.py", line 180, in init_julia
    Julia(**julia_kwargs)
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\julia\core.py", line 519, in __init__
    self._call("const PyCall = Base.require({0})".format(PYCALL_PKGID))
  File "C:\Users\Tadeu\anaconda3\envs\thesis\lib\site-packages\julia\core.py", line 554, in _call
    ans = self.api.jl_eval_string(src.encode('utf-8'))
OSError: exception: access violation reading 0x0000025A5A9D1000
Exception ignored in atexit callback: <_FuncPtr object at 0x0000025A5A83AF60>
OSError: exception: access violation reading 0x0000025A5A9D1000
PySR settings and a minimal example:
As far as I can tell, this is enough to reproduce the error. It occurs often enough that I usually have to restart Jupyter after fitting twice (this was not the case a month ago, for some reason).
Note: I thought this could be caused by the different datasets that were being fed to the model, but locking it after the first run still leads to the bug. I also removed the division and multiplication operators, and somehow it managed to fit ~7 times before crashing, far more than the maximum of 2 that I was seeing with a larger pool of binary_operators.
Let me know if there is something else I need to provide. I will also try to run this same example on a different computer to see if I get similar behaviour.