ISISComputingGroup / IBEX

Top level repository for IBEX stories

system tests: Fix python crash during tests #7643

Open FreddieAkeroyd opened 1 year ago

FreddieAkeroyd commented 1 year ago

As a developer I would like the system tests to not occasionally crash.

This seems to be a Python crash that takes a while to reproduce. After much testing we first fixed a thread leak, but this was not the source of the problem. The real problem was a call to an exception handler function that was a Python object which had already been garbage collected. We were using an earlier incarnation of an exception handler replacement function that we had created ourselves; switching to the upstream one fixed the issue.
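To illustrate the failure mode described above, here is a minimal sketch in pure Python. It is not the actual handler code from the system tests: `Registry`, `Handler`, `install_leaky` and `install_safe` are hypothetical names, and a weak reference stands in for native code that holds a raw pointer to a Python callback without keeping it alive.

```python
import gc
import weakref


class Registry:
    """Hypothetical stand-in for a native library that stores a handler
    without owning a reference to it."""

    def __init__(self):
        self._handler_ref = None

    def set_handler(self, handler):
        # Only a weak reference is kept, mimicking C code that stores a raw
        # pointer and does nothing to keep the Python object alive.
        self._handler_ref = weakref.ref(handler)

    def handler_alive(self):
        return self._handler_ref is not None and self._handler_ref() is not None


class Handler:
    """Hypothetical exception handler object."""

    def __call__(self, message):
        return f"handled: {message}"


def install_leaky(registry):
    # The handler is a temporary: nothing references it once this returns,
    # so it can be collected while the registry still "points" at it.
    registry.set_handler(Handler())


# Strong references live here for as long as the module does.
_KEEP_ALIVE = []


def install_safe(registry):
    handler = Handler()
    _KEEP_ALIVE.append(handler)  # keep the handler alive explicitly
    registry.set_handler(handler)


leaky = Registry()
install_leaky(leaky)
safe = Registry()
install_safe(safe)
gc.collect()

leaky_alive = leaky.handler_alive()
safe_alive = safe.handler_alive()
```

In this sketch the leaky registry ends up with a dead handler, which is harmless here but would be a dangling pointer (and a likely crash) if the holder were native code. Whether the upstream handler avoids the problem by holding a reference in this way is an assumption, not something confirmed in this ticket.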

Tom-Willemsen commented 1 year ago

Will leave a longer-term test running under the debugger to see if we can catch something useful - but need FIT to apply a policy to the machine first.

Tom-Willemsen commented 1 year ago

FIT have adjusted some settings for us, and Dr. M now seems to be able to launch Python from a non-EPICS terminal but not from an EPICS terminal (??). I guess extra DLLs, found on PATH after config_env has run, may be getting loaded and these somehow cause an issue?

Tom-Willemsen commented 1 year ago

Have put a log of the errors that occur from a non-EPICS terminal in \\isis\shares\ISIS_Experiment_Controls\data for tickets\Ticket7643\log_2023_04_11.txt. But this didn't actually get as far as running the tests; I think it is "crashing" out as soon as Dr. M detects the first problem. I'm not sure whether this is the real issue, or whether we need to enable some flags to get it to continue past this point and actually run the tests. The error seems to be related to a destructor in openblas invoked on Python interpreter shutdown, which I'm not sure is directly related to the problem we were seeing.

FreddieAkeroyd commented 1 year ago

Just parking this for the moment: we have the tests running through most of the time without a crash (need to check logs more) and will revisit once we have sorted out the other transient test failures. Splitting the tests to run in batches seems to have resolved a lot of the issues, though it could be considered more a workaround than a fix.
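For reference, the batching workaround could look roughly like the sketch below. This is not the actual system test runner: `make_batches` and `run_in_batches` are hypothetical helpers, and launching each batch with `python -m unittest` in a fresh interpreter is an assumption about how the tests might be started.

```python
import subprocess
import sys


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(test_modules, batch_size):
    """Run each batch in its own Python process and collect the exit codes.

    A fresh interpreter per batch means that state leaked by one batch
    (threads, handlers, loaded DLLs) cannot affect the next one.
    """
    results = []
    for batch in make_batches(test_modules, batch_size):
        cmd = [sys.executable, "-m", "unittest", *batch]
        completed = subprocess.run(cmd)
        results.append((batch, completed.returncode))
    return results
```

Calling `run_in_batches(["tests.a", "tests.b", "tests.c"], 2)` (module names made up) would start two interpreters, one for the first two modules and one for the third. It isolates the batches from each other but does not address the underlying crash.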