Closed brian-stravaro closed 2 years ago
could you check the epics7 branch for same issue?
It appears that the code is commented out in the epics7 branch. When you pull the merged changes from develop into epics7, it should update the comment.
Today I investigated the issue. I ran several debug sessions and found out that the "dispose" is only called on unregistered VIs. The crash is from disposing of invalid memory (LabVIEW memory manager vs callab lib) and not from disposing of wrong string elements. When I comment out lines 1424 to 1437 (as in epics7 branch), crashes no longer occur. @brian-stravaro could you verify my results?
I guess I'm a little skeptical that commenting out the code is the right thing to do. If the memory we are looking at is completely invalid (meaning (itEventResultCluster)->stringValueArray), then we have bigger problems, since we just passed a lengthy validation of itEventResultCluster. I also think it's problematic if (stringValueArray) is invalid. The code is there to prevent a memory leak, and I worry that commenting out the code is just leaking memory. I'm doing some more testing to better characterize when the crash happens.
In my test (which is in some complex code I inherited from someone else), I am crashing in postEvent() after getting a message that the connection is down. An event was registered (the eventResultCluster was a vector of length 1), and we crash trying to resize data in that eventResultCluster). It smells a little bit like a race condition to me, but I'm not sure why that would be.
A slightly different crash, and more evidence of a race condition...
I crash on this line in postEvent():
memcpy((*(*(*itEventResultCluster)->StringValueArray)->elt[j])->str, (*(*stringValueArray)->elt[j])->str, (*(*stringValueArray)->elt[j])->cnt);
stringValueArray looks okay, but (*itEventResultCluster)->StringValueArray is null. Here's the value of **itEventResultCluster:
StringValueArray = 0x0, ValueNumberArray = 0x0,
StatusString = 0x0, StatusNumber = 0,
SeverityString = 0x0, SeverityNumber = 0,
TimeStampString = 0x0, TimeStampNumber = 1022869498,
FieldNameArray = 0x0, FieldValueArray = 0x0,
ErrorIO = {status = 0 '\000', code = 0, source = 0x0}}
There is code just before this that would allocate (itEventResultCluster)->StringValueArray if it was null or the wrong size. There is also code just before this that would have already crashed if it was still null. I also saved a copy of (itEventResultCluster)->StringValueArray earlier in the function, and it was non-null. (With the caveat that the compiler might not have preserved the value by the time I'm looking at in the debugger.)
I use the "SoftIoc Demo.vi" to test the postEvent() issue. I use an external tool to start and stop the VI. After 2.000 starts and stops of the VI there was no crash or other problem. I used environment variable CALAB_NODBG=D:\calLab.log to redirect debugging messages to a file. Here is the modified source file I used: calab.cpp .
I believe the race condition crash is associated with this issue: https://github.com/epics-extensions/CALab/issues/12
I'd still like to keep this issue open until we confirm that we have fixed or removed the stringValueArray problem of disposing the wrong data structure.
In postEvent(), if the stringValueArray changes sizes from our internal array, we dispose of the wrong string elements when trying to reallocate for the new data. This leads to a crash.