BCDA-APS / apstools

various tools for use with Bluesky at the APS
https://bcda-aps.github.io/apstools/latest/
Other
16 stars 9 forks source link

ophyd v2 and PVAccess #836

Open prjemian opened 1 year ago

prjemian commented 1 year ago

Current plans in ophyd v2 call for initial integration with p4p:

class EpicsTransport(Enum):
    """The sorts of transport EPICS support"""

    #: Use Channel Access (using aioca library)
    ca = ChannelCa
    #: Use PVAccess (using p4p library)
    pva = ChannelCa  # TODO change to ChannelPva when Alan's written it

@coretl: This would be the point at which to choose pvapy instead.

Update: follow this PR

prjemian commented 1 year ago

Until ophyd advances, we have no official ophyd PVA interface.

prjemian commented 1 year ago

FYI @sveseli, the work would be done in the ophyd repo but once that settles down, there will be some work needed in apstools. This issue documents the need.

coretl commented 1 year ago

I've done some testing about whether different epics clients can coexist in the same interpreter in this repo: https://github.com/coretl/can_epics_clients_coexist

The epicscorelibs module provides a consistent set of libCom, libCa, etc, that is pip installable. aioca and p4p use this, and pyepics can be made to use it via an environment variable.

If we don't use a consistent set then we might get seg faults, but I was unclear on when this is a problem, so I made some tests. Changing the import order gives unexpected results:

I can't make any sense of this, and I'm not sure how to proceed.

@mdavidsaver, @sveseli, @anjohnson do you have any ideas?

mdavidsaver commented 1 year ago

... I was unclear on when this is a problem ...

I am also not so clear. The circumstances are likely platform specific. eg. linux/ELF vs. windows/PE handle library loading and symbol lookup differently. (I assume you are testing on linux?)

  • pyepics (epicscorelibs), aioca, p4p surprisingly seg faults in epicsMutexLock

You might have a look the process memory map to make sure only one copy of each library is in fact being loaded. On Linux this (very verbose) information is found in /proc/<pid>/maps. Or through the GDB shell with info proc mappings.

mdavidsaver commented 1 year ago
  • pyepics (bundled libs), pvapy, aioca, p4p (in that order), works. Don't know why as they link to different libs

I suspect this would not be true on Windows, where there is no analog for the RTLD_GLOBAL vs. RTLD_LOCAL distinction of dlopen().

coretl commented 1 year ago

I am also not so clear. The circumstances are likely platform specific. eg. linux/ELF vs. windows/PE handle library loading and symbol lookup differently. (I assume you are testing on linux?)

Yes, I am testing on Linux, although I have since added CI to test on Windows too. I couldn't get gdb working on Mac in CI after an hour of trying so I gave up on that.

  • pyepics (epicscorelibs), aioca, p4p surprisingly seg faults in epicsMutexLock

You might have a look the process memory map to make sure only one copy of each library is in fact being loaded. On Linux this (very verbose) information is found in /proc/<pid>/maps. Or through the GDB shell with info proc mappings.

The Linux seg faults:

The last 2 seg faults are understandable, there are 2 versions of libCom loaded, but I don't know what causes the first. Any other ideas?

  • pyepics (bundled libs), pvapy, aioca, p4p (in that order), works. Don't know why as they link to different libs

I suspect this would not be true on Windows, where there is no analog for the RTLD_GLOBAL vs. RTLD_LOCAL distinction of dlopen().

The windows results are more like what I expected to see:

Looks like loading 2 python modules with conflicting DLLs causes the non-epicscorelibs one to fail to load the DLL. It doesn't seg fault though

mdavidsaver commented 1 year ago

... I couldn't get gdb working on Mac in CI ...

Neither have I. However, OSX (at least in the github actions images) does some of this automatically as ~/Library/Logs/DiagnosticReports/*.crash or ~/Library/Logs/CrashReporter/*.crash. I've automated this as https://github.com/mdavidsaver/ci-core-dumper

mdavidsaver commented 1 year ago

This is a crash on exit via Py_FinalizeEx() and atexit_callfuncs() ending in an libffi (ctypes) call to ca_clear_channel(). So either pyepics or aioca. Unfortunately, I can't say which one.

coretl commented 1 year ago

Neither have I. However, OSX (at least in the github actions images) does some of this automatically as ~/Library/Logs/DiagnosticReports/*.crash or ~/Library/Logs/CrashReporter/*.crash. I've automated this as https://github.com/mdavidsaver/ci-core-dumper

I added it, and it works well for Linux, but doesn't appear to do anything for Windows or Macos, am I doing something wrong? https://github.com/coretl/can_epics_clients_coexist/actions/runs/4619741504

coretl commented 1 year ago

This is a crash on exit via Py_FinalizeEx() and atexit_callfuncs() ending in an libffi (ctypes) call to ca_clear_channel(). So either pyepics or aioca. Unfortunately, I can't say which one.

I'll disable the atexit hook on both in turn and see which one fails...

coretl commented 1 year ago

Ok, so either works on its own, so it looks like both are trying to destroy the same ca_context. In my actual application they both run in different threads, but the atexit function runs in the main thread so calls ca_context_destroy on the same context. Do you know how to destroy a different thread's ca_context?

mdavidsaver commented 1 year ago

... Do you know how to destroy a different thread's ca_context?

This can't be done as such. Instead, imo. a library/module using libca contexts should take steps to create its own context, and prevent other code from observing it. Also to prevent accidental usage of a context being managed by other code. The use of thread locals makes this awkward. Here is one example where I use a c++ class to manage a private libca context, and Attach it temporarily when calling ca_* functions.

https://github.com/mdavidsaver/bsas/blob/9d823160c24131221ac26e4462c23647ef2b9591/bsasApp/src/collect_ca.h#L56-L72

https://github.com/mdavidsaver/bsas/blob/9d823160c24131221ac26e4462c23647ef2b9591/bsasApp/src/collect_ca.cpp#L70-L143

coretl commented 1 year ago

Ok, as my usecase is getting aioca and pyepics to coexist, and I can make aioca lazily create the context, where pyepics eagerly makes a context, and they both want a pre-emptive context, is there any reason not to make aioca use pyepics context if it already exists? As long as the lifetime of the context is the lifetime of the program (it is destroyed with an atexit hook) are there any downsides to sharing the context?

coretl commented 1 year ago

Latest test results where aioca and pyepics don't destroy each other's contexts: https://github.com/coretl/can_epics_clients_coexist/actions/runs/4630318385

@mdavidsaver I implemented the context sharing / destroying on the context we create in https://github.com/dls-controls/aioca/pull/36. I'm aware that switching in and out the contexts around each ca call would be more robust, but it's a lot less invasive to share the context between aioca and pyepics. I'd be interested to hear your thoughts on this...

@prjemian How urgently do you need pvapy support? Will you be switching entirely to PVA as part of ophyd.v2 or adding it gradually? As the tests show, without a common base libCom the results are unpredictable. The Linux test loading pyepics (epicscorelibs) aioca pvapy p4p seems to work, but it fails on Windows and Mac. Changing the import order makes it break, and the seg faults worry me.

@sveseli I think this has convinced me we need https://github.com/epics-base/pvaPy/issues/80 in order to support pvapy in ophyd.v2 in a robust way

mdavidsaver commented 1 year ago

I've automated this as https://github.com/mdavidsaver/ci-core-dumper

I added [ci-core-dumper], and it works well for Linux, but doesn't appear to do anything for Windows or Macos, am I doing something wrong?

I see that you have ulimit -c unlimited. There is another conceptually similar inheritable setting on windows SetErrorMode() which needs to be set (and sometimes isn't). The wrapper python -m ci_core_dumper exec ... sets both. wrt. OSX, I have no control over the crash reporter, and don't know what would be preventing it from running (in time).