Dispose wrong array in postEvent()

brian-stravaro commented 2 years ago

In postEvent(), if the stringValueArray changes sizes from our internal array, we dispose of the wrong string elements when trying to reallocate for the new data. This leads to a crash.

nugatritter commented 2 years ago

could you check the epics7 branch for same issue?

brian-stravaro commented 2 years ago

It appears that the code is commented out in the epics7 branch. When you pull the merged changes from develop into epics7, it should update the comment.

nugatritter commented 2 years ago

Today I investigated the issue. I ran several debug sessions and found out that the "dispose" is only called on unregistered VIs. The crash is from disposing of invalid memory (LabVIEW memory manager vs callab lib) and not from disposing of wrong string elements. When I comment out lines 1424 to 1437 (as in epics7 branch), crashes no longer occur. @brian-stravaro could you verify my results?

brian-stravaro commented 2 years ago

I guess I'm a little skeptical that commenting out the code is the right thing to do. If the memory we are looking at is completely invalid (meaning (itEventResultCluster)->stringValueArray), then we have bigger problems, since we just passed a lengthy validation of itEventResultCluster. I also think it's problematic if (stringValueArray) is invalid. The code is there to prevent a memory leak, and I worry that commenting out the code is just leaking memory. I'm doing some more testing to better characterize when the crash happens.

brian-stravaro commented 2 years ago

In my test (which is in some complex code I inherited from someone else), I am crashing in postEvent() after getting a message that the connection is down. An event was registered (the eventResultCluster was a vector of length 1), and we crash trying to resize data in that eventResultCluster). It smells a little bit like a race condition to me, but I'm not sure why that would be.

brian-stravaro commented 2 years ago

A slightly different crash, and more evidence of a race condition...

I crash on this line in postEvent(): memcpy((*(*(*itEventResultCluster)->StringValueArray)->elt[j])->str, (*(*stringValueArray)->elt[j])->str, (*(*stringValueArray)->elt[j])->cnt);

stringValueArray looks okay, but (*itEventResultCluster)->StringValueArray is null. Here's the value of **itEventResultCluster:

StringValueArray = 0x0, ValueNumberArray = 0x0,
StatusString = 0x0, StatusNumber = 0, 
SeverityString = 0x0, SeverityNumber = 0,
TimeStampString = 0x0, TimeStampNumber = 1022869498,
FieldNameArray = 0x0, FieldValueArray = 0x0, 
ErrorIO = {status = 0 '\000', code = 0, source = 0x0}}

There is code just before this that would allocate (itEventResultCluster)->StringValueArray if it was null or the wrong size. There is also code just before this that would have already crashed if it was still null. I also saved a copy of (itEventResultCluster)->StringValueArray earlier in the function, and it was non-null. (With the caveat that the compiler might not have preserved the value by the time I'm looking at in the debugger.)

nugatritter commented 2 years ago

I use the "SoftIoc Demo.vi" to test the postEvent() issue. I use an external tool to start and stop the VI. After 2.000 starts and stops of the VI there was no crash or other problem. I used environment variable CALAB_NODBG=D:\calLab.log to redirect debugging messages to a file. Here is the modified source file I used: calab.cpp .

brian-stravaro commented 2 years ago

I believe the race condition crash is associated with this issue: https://github.com/epics-extensions/CALab/issues/12

I'd still like to keep this issue open until we confirm that we have fixed or removed the stringValueArray problem of disposing the wrong data structure.

epics-extensions / CALab

Dispose wrong array in postEvent() #4