epics-modules / motor

APS BCDA synApps module: motor
https://epics-modules.github.io/motor/

Usage message_mutex in motorOms (in drvMAXv.cc observed) and then in motordrvCom.cc #172

Closed hjunkes closed 3 years ago

hjunkes commented 3 years ago

Besides small problems with VME access and devLib2 on RTEMS5, I came across the following problem. In drvMAXv.cc, a mutex is created in motor_init:

    pvtdata = (struct MAXvController *) malloc(sizeof(struct MAXvController));
    pvtdata->message_mutex = epicsMutexMustCreate();
    pmotorState->DevicePrivate = pvtdata;

This mutex is used during configuration of the card in the IOC init thread/task, and therefore the mutex "belongs to" (is owned by) the init task. After the initialization of the board, the "motor_task" thread is created:

epicsThreadCreate((char *) "MAXv_motor", epicsThreadPriorityMedium,
                  epicsThreadGetStackSize(epicsThreadStackMedium),
                  (EPICSTHREADFUNC) motor_task, (void *) &targs);

Subsequent commands to the MAXv card are issued from this epicsThread, but the mutex is not owned by this task. This leads to error messages ("not owner" with RECURSIVE pthread mutexes) or to suspending the task with the "old" pthread mutexes. As a small workaround I have changed the following:

@@ -1391,6 +1395,8 @@ static int motor_init()
                 /* Is this needed??? */
                 send_recv_mess(card_index, DONE_QUERY, MAXv_axis[motor_index], axis_pos, 1);
             }
+            Debug(1, "motor_init: destroy mutex\n");
+            epicsMutexDestroy(pvtdata->message_mutex);

             Debug(2, "motor_init: Init Address=%p\n", localaddr);
             Debug(3, "motor_init: Total encoders = %d\n", total_encoders);

and consequently also in motordrvCom.cc

@@ -188,6 +200,7 @@ epicsShareFunc int motor_task(struct thread_args *args)
             for (itera = 0; itera < *tabptr->cardcnt_ptr; itera++)
             {
                 struct controller *brdptr = (*tabptr->card_array)[itera];
+                ((struct MAXvController*)(brdptr->DevicePrivate))->message_mutex = epicsMutexMustCreate();
                 if (brdptr != NULL && brdptr->motor_in_motion)
                     stale_data_delay = query_axis(itera, tabptr, previous_time, stale_data_max_delay);
             }

Before I start a pull request I wanted to discuss this first. Maybe I have missed something completely.

tboegi commented 3 years ago

@hjunkes: Could you elaborate a little? "Usage message_...": what does this mean? It is used, but shouldn't be?

    grep -R message_mutex
    modules/motorOms/omsApp/src/drvMAXv.cc:#define MUTEX(card) ((struct MAXvController *)(motor_state[card]->DevicePrivate))->message_mutex
    modules/motorOms/omsApp/src/drvMAXv.cc:    pvtdata->message_mutex = epicsMutexMustCreate();
    modules/motorOms/omsApp/src/drvMAXv.h:    epicsMutexId message_mutex;
    tb@susi:/nfs/home/tb/NoBackup/projects/epics/motor$ grep -R MUTEX
    modules/motorOms/omsApp/src/drvMAXv.cc:#define MUTEX(card) ((struct MAXvController *)(motor_state[card]->DevicePrivate))->message_mutex
    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
    modules/motorOms/omsApp/src/drvMAXv.cc:        epicsMutexLock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    epicsMutexUnlock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
    modules/motorOms/omsApp/src/drvMAXv.cc:        epicsMutexLock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    epicsMutexUnlock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
    modules/motorOms/omsApp/src/drvMAXv.cc:        epicsMutexLock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    epicsMutexUnlock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    epicsMutexUnlock(MUTEX(card));
    modules/motorOms/omsApp/src/drvMAXv.cc:    epicsMutexUnlock(MUTEX(card));

hjunkes commented 3 years ago

Hello Torsten, you were much faster than I could type everything in ;-)

MarkRivers commented 3 years ago

I guess I don't understand the issue. Nearly all of my drivers create an epicsMutex in a constructor that is called by whatever thread is running iocsh, and then they are used by either asyn port threads or threads created with epicsThreadCreate in the constructor. I have never heard of an issue relating to what task "owns" a mutex. The whole point is that mutexes must be shared among multiple tasks (threads).

I use the drvMAXv.cc driver a lot on vxWorks and have never seen the errors you describe.

Is this on RTEMS?

hjunkes commented 3 years ago

Yes, this is on RTEMS5, and we now use the POSIX mutexes. In modules/libcom/src/osi/os/posix/osdMutex.c one finds:

    void epicsMutexOsdUnlock(struct epicsMutexOSD * pmutex)
    {
        int status;

        status = mutexLock(&pmutex->lock);
        checkStatus(status, "pthread_mutex_lock epicsMutexOsdUnlock");
        if (status)
            return;

        if ((pmutex->count <= 0) || (pmutex->ownerTid != pthread_self())) {
            pthread_mutex_unlock(&pmutex->lock);
            checkStatus(status, "pthread_mutex_unlock epicsMutexOsdUnlock");
            errlogPrintf("epicsMutexOsdUnlock but caller is not owner\n");
            cantProceed("epicsMutexOsdUnlock but caller is not owner");
            return;
        }

and the IBM documentation (https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/apis/users_65.htm) states:

The pthread_mutex_unlock() function unlocks the mutex specified. If the calling thread does not currently hold the mutex (via a previous call to pthread_mutex_lock(), pthread_mutex_trylock(), or pthread_mutex_timedlock_np()) the unlock request fails with the EPERM error.

I have the feeling that the thread ID at mutex creation is crucial. Shouldn't that also be noticeable on Linux systems?

MarkRivers commented 3 years ago

Is the problem that unlock is being called but lock has not been called by that thread?

hjunkes commented 3 years ago

Yes, that's what I suspected at first, but I could not verify it for sure. A problem could be that I have no motors connected to the MAXv at the moment; possibly it is unfavorable timing. Maybe it's enough to make sure that no more messages are pending before starting the motor task? I don't know whether it is generally a good idea to create a lock in one task that will be used by other tasks after the init task has finished.

hjunkes commented 3 years ago

I have now looked at the whole thing again.

In drvMAXv.cc (and only there; the other OMS boards do not use mutexLock/Unlock) the mutex is created in motor_init and then used in recv_mess()/send_mess().

motor_init is called by MAXvConfig() in st.cmd (i.e. in the init task).

At the end of motor_init the epicsThread "MAXv_motor" is created, and it continues to use the mutex previously created by the init task (which has a different task ID).

That's why one gets these messages:

    epicsMutex pthread_mutex_unlock epicsMutexOsdUnlock failed: error Not owner

anjohnson commented 3 years ago

I now understand the problem @hjunkes is bringing up.

The drvMAXv.cc code uses this construct to take its mutexes:

    if (!epicsMutexTryLock(MUTEX(card))) {
        Debug(1, "Waiting for mutex\n");
        epicsMutexLock(MUTEX(card));
    }

That looks like it should be reasonable, but unfortunately it's actually very badly broken, because the return status value from epicsMutexTryLock() is epicsMutexLockOK=0 (false) when it has successfully taken the lock.

The first time the above code gets called there is no contention for the lock so epicsMutexTryLock() succeeds and returns 0. The if (!0) block thus gets run, calling epicsMutexLock() which takes the mutex a second time. Since there is only one call to epicsMutexUnlock() afterwards the mutex is still owned by the thread that first took it. Now another thread wants to access the device and calls the above code again; this time the epicsMutexTryLock() fails and returns epicsMutexLockTimeout=1 but the if (!1) block gets skipped and the thread carries on its merry way and does its I/O anyway. Finally it releases the mutex by calling epicsMutexUnlock(), but since the mutex ownership is wrong that generates the error that Heinz was seeing.

I can understand how this coding error might have happened: The C++ class epicsMutex wrapper has a tryLock() method which returns true if it succeeds, whereas the epicsMutexTryLock() routine returns 0 (false) if it succeeds. That's a bad API design from EPICS Core, for which I apologize but probably can't do much to change.

I recommend replacing all instances of if (!epicsMutexTryLock(...)) with if (epicsMutexTryLock(...) != epicsMutexLockOK). I would also check the other modules in synApps for this usage in case the same construct is used elsewhere.

MarkRivers commented 3 years ago

I just searched for epicsMutexTryLock in all .c* files in my synApps tree, including areaDetector. This is what I find:

corvette:~/devel>find . -type f -name '*.c*' -exec grep -U3 -H epicsMutexTryLock {} \;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-static int send_recv_mess(int card, const char * command, const char *axis, char *buf, int nMessages) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    int retval;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "send_recv_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
--
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    Debug(9, "send_mess: pmotor = %p\n", pmotor);
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "send_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
--
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "recv_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
./asyn-4-41/asyn/asynDriver/asynManager.c-    if(details>=1) {
./asyn-4-41/asyn/asynDriver/asynManager.c-        epicsMutexLockStatus mgrStatus;
./asyn-4-41/asyn/asynDriver/asynManager.c-        epicsMutexLockStatus syncStatus;
./asyn-4-41/asyn/asynDriver/asynManager.c:        syncStatus = epicsMutexTryLock(pport->synchronousLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        if(syncStatus==epicsMutexLockOK)
./asyn-4-41/asyn/asynDriver/asynManager.c-             epicsMutexUnlock(pport->synchronousLock);
./asyn-4-41/asyn/asynDriver/asynManager.c:        mgrStatus = epicsMutexTryLock(pport->asynManagerLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        if(mgrStatus==epicsMutexLockOK)
./asyn-4-41/asyn/asynDriver/asynManager.c-             epicsMutexUnlock(pport->asynManagerLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        fprintf(fp,"    enabled:%s connected:%s numberConnects %lu\n",
corvette:~/devel>

The only place I see a problem is in drvMAXv.cc.

Note that I tried to use "git submodules" with "git grep" in motor, but that returns an error:

corvette:~/devel/motor>git submodule foreach git grep epicsMutexTryLock
Entering 'modules/motorAMCI'
Stopping at 'modules/motorAMCI'; script returned non-zero status.

Why doesn't that work?

tboegi commented 3 years ago

On 11 jan. 2021, at 19:41, Mark Rivers notifications@github.com wrote:

Why doesn't that work?

Good question. The rumor-channel suggests this one:

git grep --recurse-submodules epicsMutexTryLock

    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
    modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {

anjohnson commented 3 years ago

The only place I see a problem is in drvMAXv.cc.

Good, thanks for checking that.

I tried to use "git submodules" with "git grep" in motor, but that returns an error: Why doesn't that work?

git grep has a --recurse-submodules option which is probably faster. Your first inner 'git grep' command returned a non-zero status because it didn't get any matches from the motorAMCI module, which stopped the git submodule foreach from looping (it thought it got an error). You could have used this instead which ignores errors:

$ git submodule foreach 'git grep epicsMutexTryLock || true'

hjunkes commented 3 years ago

I will create a pull request for this patch:

diff --git a/omsApp/src/devOms.dbd b/omsApp/src/devOms.dbd
index 569232b..9c0401e 100644
--- a/omsApp/src/devOms.dbd
+++ b/omsApp/src/devOms.dbd
@@ -11,6 +11,7 @@ registrar(oms58Registrar)
 device(motor,VME_IO,devMAXv,"OMS MAXv")
 driver(drvMAXv)
 registrar(OmsMAXvRegister)
+variable(drvMAXvdebug)

 # Oregon Micro Systems PC68/78 RS-232 serial driver support.
 device(motor,VME_IO,devOmsPC68,"OMS PC68/78")
diff --git a/omsApp/src/drvMAXv.cc b/omsApp/src/drvMAXv.cc
index 0c4a3e8..8eff6b9 100644
--- a/omsApp/src/drvMAXv.cc
+++ b/omsApp/src/drvMAXv.cc
@@ -609,7 +609,7 @@ errorexit:      errMessage(-1, "Invalid device directive");
 static int send_recv_mess(int card, const char * command, const char *axis, char *buf, int nMessages) {
     int retval;

-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "send_recv_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }
@@ -647,7 +647,7 @@ static RTN_STATUS send_mess(int card, const char *com, const char *name)
     pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
     Debug(9, "send_mess: pmotor = %p\n", pmotor);

-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "send_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }
@@ -759,7 +759,7 @@ static int recv_mess(int card, char *com, int amount)

     pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;

-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "recv_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }