bluesky / ophyd-async

Hardware abstraction for bluesky written using asyncio
https://blueskyproject.io/ophyd-async
BSD 3-Clause "New" or "Revised" License

Pyepics/Aioca Interplay on RHEL8 Broken #291

Closed callumforrester closed 2 weeks ago

callumforrester commented 1 month ago

Description

It seems that pyepics can break aioca when used with ophyd and ophyd-async, respectively, under a very specific set of circumstances. If pyepics is installed in the Python environment, but not used, ophyd-async connections always time out. So far we have only observed this behaviour on Red Hat 8.

This is currently causing a problem in dodal beamline modules that have been fully ported to ophyd-async.

How to Reproduce

Below are a few scripts that reliably reproduce the problem (on RHEL8):

ophyd-async-pyepics.zip

Epics-base is needed on your system to run these.

Running should-work.sh should create an IOC and a device, and run a count plan. Running should-break.sh should create an IOC and cause a timeout error when trying to connect to the PVs.

Workarounds

The following workarounds appear to work:

The second workaround is probably the reason the MX group hasn't noticed this bug yet.

callumforrester commented 1 month ago

From @dperl-dls:

did reproduce, should-work.sh successfully runs scan, should-break.sh raises NotConnected

Red Hat Enterprise Linux release 8.9 (Ootpa)

Currently Loaded Modulefiles:
1) vscode/latest
2) global/directories
3) dasctools/1(default)
4) gdalogpanel/stable(default)
5) use.own
6) controls-tools
7) controls
8) epics/3.14.12.7
9) controls_dev

Relm-Arrowny commented 1 month ago

It is not exclusive to RHEL8; I can reproduce it with a RHEL7 image.

1) git/2
2) kdiff3/0.9.98(default)
3) global/directories
4) dasctools/1(default)
5) gdalogpanel/stable(default)
6) use.own
7) controls-tools
8) controls
9) epics/3.14.12.7

coretl commented 1 month ago

Here's the magic: https://github.com/bluesky/ophyd-async/blob/09b48675619151c35654b0620a7d1e14590f6c13/src/ophyd_async/epics/_backend/_aioca.py#L157-L163

Before the first CA channel connect, we check whether pyepics has been imported. If it has, we use the CA context from pyepics instead of making one in aioca: https://github.com/dls-controls/aioca/blob/216bcc3e91d7215cd7ccbfa6bbe942abe196fd5b/aioca/_catools.py#L1041-L1043

This would be the code path taken if you use pyepics (ophyd) before aioca (ophyd-async).

If you don't have pyepics, then aioca will also work fine, as that code will not trigger.
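
The check described above can be sketched roughly as follows. This is not the ophyd-async or aioca source: the function names are hypothetical, and the only fact relied on is that pyepics is imported under the module name epics. It shows only the decision, not the actual attachment of a CA context.

```python
import sys
from types import ModuleType
from typing import Mapping, Optional


def imported_pyepics(
    modules: Optional[Mapping[str, ModuleType]] = None,
) -> Optional[ModuleType]:
    """Return the pyepics package if something has already imported it.

    Hypothetical helper: pyepics is imported as ``epics``, so we look it up
    in the module table without importing it ourselves.
    """
    modules = sys.modules if modules is None else modules
    return modules.get("epics")


def choose_ca_context_owner(
    modules: Optional[Mapping[str, ModuleType]] = None,
) -> str:
    """Say which library would own the CA context before the first connect.

    Hypothetical stand-in for the real logic: reuse the pyepics context if
    pyepics is already imported, otherwise let aioca create its own.
    """
    return "pyepics" if imported_pyepics(modules) is not None else "aioca"
```

Calling choose_ca_context_owner() with no arguments inspects the running interpreter, so it should only report "pyepics" if something in the process has actually imported epics.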

What I can't work out is why having pyepics in the environment but no ophyd causes a problem. What would be importing pyepics in the first place?
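
One way to narrow this down might be to distinguish "installed but never imported" from "imported by something" inside the failing process. The snippet below is a minimal diagnostic sketch using only the standard library; module_status is a hypothetical helper name, and the module names listed are the ones assumed to be relevant here.

```python
import importlib.util
import sys


def module_status(name: str) -> str:
    """Report whether a top-level module is imported, merely installed, or absent."""
    if name in sys.modules:
        # Something in this process has already imported it.
        return "imported"
    if importlib.util.find_spec(name) is not None:
        # Importable from this environment, but nothing has imported it yet.
        return "installed, not imported"
    return "not installed"


if __name__ == "__main__":
    # Run this after the device modules have been imported, just before connecting.
    for name in ("epics", "aioca", "ophyd", "ophyd_async"):
        print(f"{name}: {module_status(name)}")
```

If epics shows up as "imported" when ophyd is not in use, the import is coming from somewhere else in the environment.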

I could do with debugging this with someone, who is looking at this at the moment?

callumforrester commented 1 month ago

@coretl

> What I can't work out is why having pyepics in the environment but no ophyd causes a problem. What would be importing pyepics in the first place?

I don't believe this permutation has been tried. We've had ophyd without pyepics, but not pyepics without ophyd.

callumforrester commented 1 month ago

Also reproduced by @iain-hall and @stan-dot

coretl commented 1 month ago

This fixes it for me: https://github.com/bluesky/ophyd-async/issues/89#issuecomment-1841232321

joeshannon commented 1 month ago

> This fixes it for me: #89 (comment)

Yes, running unset PYEPICS_LIBCA in a Python venv created from a DLS Python module (based on conda) fixes this.
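
For cases where the shell environment can't easily be changed, the same workaround could be applied from Python at the very top of the entry point. This is a sketch under the assumption that the variable only matters if it is still set when the CA library is first loaded; clear_pyepics_libca is a hypothetical helper, not part of any of the libraries discussed.

```python
import os
from typing import MutableMapping, Optional


def clear_pyepics_libca(
    env: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Remove PYEPICS_LIBCA from the environment and return its old value.

    Equivalent in spirit to `unset PYEPICS_LIBCA` in the shell. Assumed to be
    effective only if called before pyepics or aioca are imported.
    """
    env = os.environ if env is None else env
    return env.pop("PYEPICS_LIBCA", None)


if __name__ == "__main__":
    previous = clear_pyepics_libca()
    if previous is not None:
        print(f"Removed PYEPICS_LIBCA (was {previous})")
```

Returning the previous value makes it easy to log what the environment had set, which may help when comparing working and broken setups.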

callumforrester commented 1 month ago

@coretl any action needed to fix this or do we just tell people to be careful with conda?

callumforrester commented 2 weeks ago

Closing as stale; can be reopened if it rears its head again.

coretl commented 2 weeks ago

This can be closed as duplicate of #89