Closed GDYendell closed 9 months ago
I think the following happens (@Araneidae correct me if I'm wrong):
_process callback: https://github.com/dls-controls/pythonSoftIOC/blob/503453184184de701af101ab4271e15e3557258e/softioc/device_core.py#L117-L124
VAL and calls _process (on what thread?)
record.set() which calls db_put_field: https://github.com/dls-controls/pythonSoftIOC/blob/503453184184de701af101ab4271e15e3557258e/softioc/device.py#L214-L215
@mdavidsaver this used to work in 3.14.12.7, but now fails in 7.0.7.0. Can you think what might be going wrong here?
I've investigated this a bit, and had a few more observations to add:
caput.
.set() on the record, anytime after iocInit(). This effectively begins a recursive .set() chain (as the on_update calls set()), and ends up locking up at the same place.
I don't think I can follow all of the details, but I see a call to dbPutField(). This function must not be called from a device support callback, or from any other context in which any record lock is held. For "virtual" access to fields within a device support callback, dbPut() is the way to go.
Also, it looks like your db_put_field() wrapper has the GIL locked when calling dbPutField(). Unless you are zealous to avoid locking the GIL in a device support callback, this constitutes a lock order violation.
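To make the lock order point concrete, here is a rough pure-Python sketch. It does not use EPICS or pythonSoftIOC at all: two ordinary `threading.Lock` objects stand in for the GIL and a record lock, and the names `run_inversion`, `python_thread` and `record_thread` are made up for illustration. One thread takes the "GIL" and then wants the "record lock" (like a `.set()` call), the other takes the "record lock" and then wants the "GIL" (like a device support callback running Python code). Timeouts are used so the sketch reports the stall instead of hanging.

```python
import threading


def run_inversion(timeout=0.2):
    """Simulate two threads taking the same pair of locks in opposite order."""
    gil = threading.Lock()          # stand-in for the Python GIL (assumption)
    record_lock = threading.Lock()  # stand-in for an EPICS record lock (assumption)
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            barrier.wait()
            # Try to take the lock the other thread is holding.
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            # Keep the first lock held until both attempts have finished.
            barrier.wait()

    threads = [
        # Like .set(): holds the GIL, then needs the record lock.
        threading.Thread(target=worker, args=("python_thread", gil, record_lock)),
        # Like record processing: holds the record lock, then needs the GIL.
        threading.Thread(target=worker, args=("record_thread", record_lock, gil)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_inversion()
print(results)
```

Neither thread gets its second lock here. With blocking acquires and no timeout, that is a hang of the kind described above, and it only shows up when the two threads happen to overlap.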
The dbPutField() calls are in fact correct, because the on_update callback is called after record processing has completed.
However, thank you very much for the GIL spot! Yep, we need to drop the GIL around field access calls, and that makes a lot of sense. Am pretty sure that's fixed things now.
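As a rough illustration of why dropping the GIL around the field access helps, here is a pure-Python sketch. Again this is not the pythonSoftIOC code: plain `threading.Lock` objects stand in for the GIL and a record lock, and `put_field_dropping_gil` and `run_fixed` are hypothetical names. The Python-side thread releases the stand-in GIL before it waits for the record lock and takes it back afterwards, so the thread holding the record lock can still run its callback.

```python
import threading


def run_fixed():
    """Same two threads as a lock inversion, but the GIL is dropped around the write."""
    gil = threading.Lock()          # stand-in for the Python GIL (assumption)
    record_lock = threading.Lock()  # stand-in for an EPICS record lock (assumption)
    barrier = threading.Barrier(2)
    log = []

    def put_field_dropping_gil():
        # Called with the stand-in GIL held: release it before blocking.
        gil.release()
        try:
            with record_lock:
                log.append("field written")
        finally:
            gil.acquire()  # take the GIL back before returning to Python code

    def python_thread():
        with gil:
            barrier.wait()  # both threads now hold their first lock
            put_field_dropping_gil()

    def record_thread():
        with record_lock:
            barrier.wait()
            with gil:  # the callback needs the GIL to run Python code
                log.append("callback ran")

    threads = [
        threading.Thread(target=python_thread, daemon=True),
        threading.Thread(target=record_thread, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    finished = not any(t.is_alive() for t in threads)
    return finished, log


finished, log = run_fixed()
print(finished, log)
```

Both threads finish in this version. In a C extension the equivalent step would be releasing the GIL around the blocking call (for example with `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS`), though how the actual fix was written is not shown in this thread.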
This will have been a long-standing bug affecting pythonSoftIOC since the earliest days, and is probably capable of being triggered any time record processing is triggered at the precise instant that .set() is being called on an Out record. This is a perfect example of a timing bug that can be next to impossible to reproduce.
I am very happy that this has been pinned down and fixed!
Thanks everyone! I will delete the example branch now that there are test cases for this.
We have had an issue where a Python IOC would hang sometimes and have to be restarted. We narrowed it down to calling my_record.set() within the on_update callback for my_record, and removing that has fixed the problem.
I have added a branch with a minimal example to reproduce the problem. The instructions in the README should be sufficient to run it. The traceback provoked by faulthandler consistently shows the IOC stuck in db_put_field.
This was tested with base 7.0.7.0 and it does not seem to happen with 3.14.12.7.