Closed alexhroom closed 2 months ago
Thanks Alex, I can confirm that I can run the test ok on windows, but I get a seg fault on linux (specifically through using the Windows Subsystem for Linux).
some debugging: running in gdb python3
:
for custom XY:
Thread 80 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4de00640 (LWP 11047)]
0x00007ffff38780d3 in RAT::SLDFunction (x=2.8571428571428572, SLD=..., sldVal=...) at cpp/RAT/coder_array.h:164
164 T& operator[](SZ _index) {
for standard layers:
Thread 80 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff4de00640 (LWP 11879)]
0x00007ffff387fc56 in RAT::allocateLayersForContrast (contrastLayers=..., outParameterisedLayers=..., useImaginary=useImaginary@entry=0 '\000', thisContrastLayers_data=thisContrastLayers_data@entry=0x2af80, thisContrastLayers_size=thisContrastLayers_size@entry=0x7fff4ddbe568) at cpp/RAT/allocateLayersForContrast.cpp:56
56 thisContrastLayers_data[i1 + n * i] = 0.0;
it seems that there are various issues around the GIL when it comes to calling Python from inside C threads? https://github.com/python/cpython/issues/111034 https://docs.python.org/3/c-api/init.html#non-python-created-threads unsure if the problem may be that
Irrespective of this issue, I will recommend using a Process instead of a python thread as the latter will affect the responsiveness of the GUI. The main problem with the python process is that it uses pickling to copy inputs i.e. Project and Controls but because of some issue the Project class cannot be pickled (Apologies, for some reason we did not open an issue on GitHub).
We could either try to fix the pickling issue so we can use the run
function
RAT.run(project, control)
or we could just pickle the inputs to RATMain
RATapi.rat_core.RATMain(problem_definition, cells, limits, cpp_controls, priors )
@StephenNneji in the GUI code i was using a QtCore.QThread
, which seems to be the recommended way to keep long-running things 'out of the way' without affecting the GUI. the issue above uses a regular Python thread in order to be a smaller reproduceable example (and to confirm that the segfault is on our end and not Qt's!)
As far as I know, QThread are just a wrapper on the python thread https://stackoverflow.com/questions/1595649/threading-in-a-pyqt-application-use-qt-threads-or-python-threads except if something has been changed in PyQt6
yeah, with the main benefits of the wrapper being that it handles signals/slots and so on to avoid the thread choking up the GUI! as in the post:
The main difference is that QThreads are better integrated with Qt (asynchrnous signals/slots, event loop, etc.).
will try it with a ProcessPoolExecutor when we can get the thing pickled correctly anyway and see if it's more performant
Memory checking shows an Invalid write of size 8
in the following code from allocateLayersForContrast.cpp
:
n = coder::internal::intlength(contrastLayers.size(0), contrastLayers.size
(1));
thisContrastLayers_size[0] = n;
thisContrastLayers_size[1] = 5;
for (i = 0; i < 5; i++) {
for (i1 = 0; i1 < n; i1++) {
thisContrastLayers_data[i1 + n * i] = 0.0;
}
}
Specifically, the line thisContrastLayers_data[i1 + n * i] = 0.0;
. This line is merely the initialisation of the array, so does not read in any RAT data. As far as I can tell, the array indices are correct, and the array is of type double, so I'm not sure what the cause of the invalid memory access is.
We also find a memory issue in SLDFunction.cpp
. In this code, we construct arrays for all values less than and greater than a particular function. It seems to me that there is a possibility that in the way this is done in the C++ code there is the possibility of an empty array, but this doesn't occur in the matlab code.
fascinatingly, it works completely fine if RAT is imported inside the thread:
from concurrent.futures import ThreadPoolExecutor
def thread_func():
import RATapi as RAT
project, results = RAT.examples.DSPC_standard_layers()
return project, results
executor = ThreadPoolExecutor()
future = executor.submit(thread_func)
project, results = future.result()
doesn't segfault. even stranger is that if you import it outside the thread and then again inside it does segfault, but i may just be missing some detail of python import implementation here:
from concurrent.futures import ThreadPoolExecutor
import RATapi as RAT
def thread_func():
import RATapi as THREADRAT
project, results = THREADRAT.examples.DSPC_standard_layers()
return project, results
executor = ThreadPoolExecutor()
future = executor.submit(thread_func)
project, results = future.result()
I've had a fresh look at allocateLayersForContrast.m
in the MATLAB source:
@arwelHughes Can you please confirm whether the code on line 35 should instead read: thisContrastLayers(i, :) = [];
instead of thisContrastLayers(1, :) = [];
. I can't guarantee this bug is causing the memory issues, but we should look to fix it urgently regardless.
Hi Paul, Well, firstly I have a different version to you – did you put the ‘disp’ in for debugging?
@.***
It probably should be ‘thisContrastLayers(I,:)…. as you suggest, but in practice it makes no difference. That line is never actually reached in normal usage (because there will never be an empty layer), but I think it was necessary because otherwise Coder throws a ‘thisContrastLayers is not fully defined on all execution paths’ error (I think!). Cheers, Arwel
From: Paul Sharp @.> Date: Monday, 2 September 2024 at 17:31 To: RascalSoftware/python-RAT @.> Cc: Hughes, Arwel (STFC,RAL,ISIS) @.>, Mention @.> Subject: Re: [RascalSoftware/python-RAT] run crashes on Linux if run outside of main thread (Issue #70)
I've had a fresh look at allocateLayersForContrast.m in the MATLAB source:
image.png (view on web)https://github.com/user-attachments/assets/31d9b618-85a5-4d22-8caf-c6688f8b3dee
@arwelHugheshttps://github.com/arwelHughes Can you please confirm whether the code on line 35 should instead read: thisContrastLayers(i, :) = [];. I can't guarantee this bug is causing the memory issues, but we should look to fix it urgently regardless.
— Reply to this email directly, view it on GitHubhttps://github.com/RascalSoftware/python-RAT/issues/70#issuecomment-2325074999, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AGMNVXUMIHUG5SAJXSNJDM3ZUSHEZAVCNFSM6AAAAABM4FBUK2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRVGA3TIOJZHE. You are receiving this because you were mentioned.Message ID: @.***>
Thanks Arwel. Yes, the disp lines are for debugging.
since we've merged #74 i've found that RATMain does not produce a segfault in a separate process (only a separate thread). so this issue is a lot less urgent since we can use processes for multiprocessing in the GUI!
closed by #78
On Linux (including IDAaaS), attempting to run the following script will produce a segfault:
This is an issue because RasCAL-2 needs long-running code (e.g.
RAT.run()
) to run outside the main loop, as the main loop is used for handling and updating the GUI itself. It seems to run successfully on Windows (@DrPaulSharp managed to run it okay). If anyone else would like to run the script on whatever devices they have to hand that'd be useful!Various observations:
standard_layers
orcustom_XY
examples but NOT thecustom_layers
examples which run successfully.DSPC_standard_layers
it happens no matter what procedure is chosen.calculate
procedure, theRunning [procedure]
message is produced before it segfaults.