epics-extensions / ca-gateway

Channel Access PV Gateway
http://www.aps.anl.gov/epics/extensions/gateway/

Gateway crashes when accessing PCAS(Py) array PV where the actual count is over the configured count #49

Open ericonr opened 1 year ago

ericonr commented 1 year ago

The backtrace for this crash is included below. The crash happens when we try to access a char array PV served by a PCASPy IOC, where 'count' was configured as 5000 but the actual count was 6270. The Gateway prints the following error message:

*** Error in '/opt/epics-7.0.4/modules/ca-gateway-2.1.2/bin/linux-x86_64/gateway': free(): invalid next size (normal): 0x00007fffc8001d00 ***

Interestingly, caget requests only the first 5000 bytes and doesn't crash, while pyepics requests all bytes and stores them properly, so it doesn't corrupt memory either. It seems that each client has to handle this case in its own way, and ca-gateway doesn't.

The crash location and error message indicate to me that there was memory corruption, which libc detected while freeing the block and, for security reasons, responded to by calling abort().

(gdb) bt
#0  0x00007ffff68d0387 in raise () from /lib64/libc.so.6
#1  0x00007ffff68d1a78 in abort () from /lib64/libc.so.6
#2  0x00007ffff6912ed7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff691b299 in _int_free () from /lib64/libc.so.6
#4  0x00007ffff798e011 in gddDestructor::destroy(void*) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libgdd.so.4.13.0
#5  0x00007ffff797c1f9 in gdd::~gdd() () from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libgdd.so.4.13.0
#6  0x00007ffff7bc1823 in casAsyncReadIOI::~casAsyncReadIOI() ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#7  0x00007ffff7bc1869 in casAsyncReadIOI::~casAsyncReadIOI() ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#8  0x00007ffff7bc1089 in casAsyncIOI::cbFunc(casCoreClient&, epicsGuard<casClientMutex>&, epicsGuard<evSysMutex>&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#9  0x00007ffff7bc28d5 in casEventSys::process(epicsGuard<casClientMutex>&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#10 0x00007ffff7bc7a5e in casStreamEvWakeup::expire(epicsTime const&) ()
   from /opt/epics-7.0.4/modules/pcas-4.13.2/lib/linux-x86_64/libcas.so.4.13.0
#11 0x00007ffff74d386e in timerQueue::process(epicsTime const&) ()
   from /opt/epics-7.0.4/base/lib/linux-x86_64/libCom.so.3.18.0
#12 0x00007ffff74b527f in fdManager::process(double) () from /opt/epics-7.0.4/base/lib/linux-x86_64/libCom.so.3.18.0
#13 0x000000000041a9ee in gateServer::mainLoop() ()
#14 0x000000000040d7f1 in main ()

ralphlange commented 1 year ago

Could be an issue in PCAS, the C++ CA server that the Gateway and PCASPy use. On the other hand, if the PCASPy IOC serves the array out (and doesn't encounter issues), it might rather be related to the Gateway's use of PCAS.

It's memory corruption of the underlying gdd data container. Probably a container is created with the reported size (5000), then filled with the data of the real size array.
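To make the suspected failure mode concrete, here is a minimal Python model of it. It is only an illustration under stated assumptions: `bounded_copy` is a hypothetical helper, not part of gdd, PCAS or the Gateway, and the real code path in the Gateway has not been confirmed. The point is that a destination sized from the reported count (5000) can only safely receive that many bytes, so the copy has to be clamped rather than driven by the size of the incoming data (6270).

```python
def bounded_copy(dest: bytearray, src: bytes) -> int:
    """Copy at most len(dest) bytes from src into dest.

    Hypothetical helper for illustration only. Returns the number of
    source bytes that did not fit and were dropped.
    """
    n = min(len(dest), len(src))
    dest[:n] = src[:n]  # same-length slice assignment, dest keeps its size
    return len(src) - n


reported_count = 5000  # the 'count' the PV was configured with
payload = bytes(i % 256 for i in range(6270))  # what the server actually sends

container = bytearray(reported_count)  # sized from the reported count
dropped = bounded_copy(container, payload)
print(f"kept {len(container)} bytes, dropped {dropped}")
```

In C or C++ the unclamped equivalent would write past the end of the allocation, which matches the kind of heap damage that free() reports in the backtrace.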

PCASPy should also get a ticket for allowing this and sending illegal Channel Access traffic.
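On the server side, the kind of check meant here could look like the sketch below. `check_count` is a hypothetical helper and not PCASPy API; where exactly such a check would be hooked into PCASPy is an open question for that project.

```python
def check_count(value, configured_count: int):
    """Reject an array value that is longer than the PV's configured count.

    Hypothetical helper for illustration only. Returns the value unchanged
    when it fits, so it can wrap whatever call publishes the value.
    """
    actual = len(value)
    if actual > configured_count:
        raise ValueError(
            f"array has {actual} elements, "
            f"but the PV was configured with count={configured_count}"
        )
    return value


# A value within the configured count passes through unchanged.
ok = check_count(bytes(4000), 5000)

# The case from this report is rejected before it reaches the wire.
try:
    check_count(bytes(6270), 5000)
except ValueError as err:
    print(f"rejected: {err}")
```

Raising an error instead of silently truncating keeps the misconfiguration visible to whoever maintains the IOC.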

ericonr commented 1 year ago

> if the PCASPy IOC serves the array out (and doesn't encounter issues)

As far as I know, it doesn't encounter issues; it's been running for a few years now.

> Probably a container is created with the reported size (5000), then filled with the data of the real size array.

That's what I imagined!

> PCASPy should also get a ticket for allowing this and sending illegal Channel Access traffic.

Noted, will send this report their way.