smarvonohr opened 2 months ago · Open
Hmm, thanks for your post. Data races are always a serious bug. But as far as I can see, and supported by years of use by many applications, there is no race condition here.
I think the tool is not taking into account the fact that rxbufstat is actually an array of int. A lot of care is taken to either protect it with mutexes or to guarantee single ownership, so that only the owner modifies the variable. Each item in the array is the native CPU word size and is aligned to a word boundary, so there is no risk of partial writes or reads. You can call this an implicit atomic.
We do expect that the socket read function recv() is thread safe and does not deliver the same packet twice.
Do not trust outputs from ThreadSanitizer or other tools without checking the results.
Thanks for your quick reply. I agree that one should not blindly trust the results from code analysis tools. Looking at the code I still think there are some actual data races here. Not all interactions between the mentioned functions are necessarily a data race because, as you mentioned, rxbufstat is an array. Usually a thread acquires an index via ecx_getindex and then only works with its own index, so no race is possible there. However, there are some situations where an index "owned" by a different thread is accessed, most notably in the ecx_inframe function. See the following simplified version from the Linux driver:
int ecx_inframe(ecx_portt *port, uint8 idx, int stacknumber)
{
   /* check if requested index is already in buffer ? */
   if ((idx < EC_MAXBUF) && ((*stack->rxbufstat)[idx] == EC_BUF_RCVD))
   {
      // ...
   }
   else
   {
      pthread_mutex_lock(&(port->rx_mutex));
      /* non blocking call to retrieve frame from socket */
      if (ecx_recvpkt(port, stacknumber))
      {
         // ...
         /* found index equals requested index ? */
         if (idxf == idx)
         {
            /* mark as completed */
            (*stack->rxbufstat)[idx] = EC_BUF_COMPLETE;
         }
         else
         {
            /* check if index exist and someone is waiting for it */
            if (idxf < EC_MAXBUF && (*stack->rxbufstat)[idxf] == EC_BUF_TX)
            {
               /* mark as received */
               (*stack->rxbufstat)[idxf] = EC_BUF_RCVD;
            }
         }
      }
      pthread_mutex_unlock(&(port->rx_mutex));
   }
}
The interesting part is when we receive a packet with an index that we did not request. Here the first data race is the check (*stack->rxbufstat)[idx] == EC_BUF_RCVD, which happens outside the lock. The value EC_BUF_RCVD is set while holding the rx_mutex, but that does not help, because the read side takes no lock at all. This is actually the race reported by ThreadSanitizer above. To me this is a clear case of a data race. It is also not some random write that is irrelevant for the read side: the write side is trying to communicate to the read side that the EC_BUF_RCVD state has been reached.
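To make this concrete, the same pattern boils down to a minimal stand-alone program that ThreadSanitizer flags in exactly the same way (the names here are made up for illustration, this is not SOEM code):

#include <pthread.h>
#include <stdio.h>

static int             flag;   /* stands in for (*stack->rxbufstat)[idx] */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* writer thread: changes the state while holding the mutex, just like
   the EC_BUF_RCVD store inside the rx_mutex section of ecx_inframe */
static void *writer(void *arg)
{
   (void)arg;
   pthread_mutex_lock(&mtx);
   flag = 1;
   pthread_mutex_unlock(&mtx);
   return NULL;
}

int main(void)
{
   pthread_t t;
   pthread_create(&t, NULL, writer, NULL);
   /* reader: checks the state without taking the mutex, like the
      fast-path test at the top of ecx_inframe */
   printf("flag = %d\n", flag);
   pthread_join(t, NULL);
   return 0;
}

Built with -fsanitize=thread -pthread, this is reported as a data race: the locked write and the unlocked read are the two conflicting accesses, and taking the mutex on only one side does not remove the race.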
Another race is the check (*stack->rxbufstat)[idxf] == EC_BUF_TX. The EC_BUF_TX state is set in ecx_outframe/ecx_outframe_red without holding the rx_mutex. This one might not matter in practice, because there is "enough" time between sending and receiving, but it is still a theoretical race.
Now, both races might not actually cause problems. For the first race the read side might miss the EC_BUF_RCVD write on the first iteration, but then fetch it on the second try. However, this is dependent on the CPU architecture. On x86 this might be safe, though I'm not entirely sure about that. Since threading and data race problems are notoriously hard, I at least wanted to bring this to your attention.
> Now, both races might not actually cause problems. For the first race the read side might miss the EC_BUF_RCVD write on the first iteration, but then fetch it on the second try. However, this is dependent on the CPU architecture.
Indeed, the scenarios you described are intended this way.
The write ((*stack->rxbufstat)[idxf] = EC_BUF_RCVD;) in the mutex-protected side indicates the transfer of ownership. After that write nothing in the buffer is modified anymore. So when the read is executed it either sees the old value or the new one; when it sees the old value it has to loop again. That is why it is important that rxbufstat is a word-sized variable that is also word aligned in memory: it needs to behave atomically. The reason we did not use proper atomics is that SOEM also needs to compile for 8- and 16-bit microcontroller targets, and those do not have this support.
The second case is indeed separated by the transmission itself. The variable is deliberately set before the actual transmit, so even if this code stalls between the two statements, a receiver can never observe the reply without the state change already being in place:
(*stack->rxbufstat)[idx] = EC_BUF_TX;
rval = send(*stack->sock, (*stack->txbuf)[idx], lp, 0);
I agree with you that all of this is only true if there is no code reordering by the compiler, and the CPU itself also does not play any tricks. To prevent those side effects we should introduce memory barriers. On single-CPU systems these barriers would be no-ops, and they only need to be per CPU, not system wide. Then again, I checked the code generated by GCC at various optimization settings and the reordering does not happen at the critical points. Multiple people have spent a lot of time analyzing precisely the code you are referring to for race conditions.
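To sketch where such a barrier would sit (illustration only, not the current SOEM code; __atomic_thread_fence is a GCC/Clang builtin):

   (*stack->rxbufstat)[idx] = EC_BUF_TX;
   /* release fence: keeps the state store from being reordered past the
      send by the compiler or the CPU; on x86 it emits no instruction and
      only restricts compiler reordering */
   __atomic_thread_fence(__ATOMIC_RELEASE);
   rval = send(*stack->sock, (*stack->txbuf)[idx], lp, 0);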
The advantage of all this careful variable manipulation with minimal locking is very efficient code execution.
And I want to thank you for diving this deep into the code and trying to understand and improve things. That is why I think you deserve an elaborate answer.
Thanks for the elaborate answer. After looking at the memory ordering guarantees given by the x86 architecture, I think you are correct that the code should work as expected, at least as long as the compiler does not reorder the critical code sections. While I'd like to see some actual atomic operations, I get why you chose to omit them. With atomics the compiler won't perform any breaking reordering, but the generated instructions should be the same on x86. However, on other architectures with less strict ordering (e.g. ARM) the atomic operations are actually needed. So, if Linux on ARM is a supported platform and you can live with requiring a C11 compiler, you should consider using atomics.
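To sketch what that could look like (purely illustrative, with placeholder names and state values rather than a drop-in patch for SOEM):

#include <stdatomic.h>

#define EC_MAXBUF 16                              /* placeholder value */
enum { EC_BUF_TX = 2, EC_BUF_RCVD = 3 };          /* placeholder states */

static _Atomic int rxbufstat[EC_MAXBUF];

/* receiver side (inside the rx_mutex section): the release store
   publishes everything written to the buffer before the state change
   becomes visible */
static void mark_received(int idxf)
{
   atomic_store_explicit(&rxbufstat[idxf], EC_BUF_RCVD, memory_order_release);
}

/* waiting side (the unlocked fast path): the acquire load pairs with
   the release store, so on weakly ordered CPUs such as ARM the frame
   contents are guaranteed to be visible once EC_BUF_RCVD is observed */
static int frame_received(int idx)
{
   return atomic_load_explicit(&rxbufstat[idx], memory_order_acquire)
          == EC_BUF_RCVD;
}

On x86 the generated loads and stores should stay plain mov instructions, so the cost is only the restriction on compiler reordering; on ARM the compiler emits the required barriers or load-acquire/store-release instructions.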
> So, if Linux on ARM is a supported platform and you can live with requiring a C11 compiler, you should consider using atomics.
I have to agree there. To solve the compiler diversity issue we could try to encapsulate the atomics in the abstraction layer. For compilers that do not support atomics it will just be a regular read/write operation.
The disadvantage of more and more abstractions is that users lose their feel for the inner workings, and things could subtly break if assumptions turn out to be wrong.
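A possible shape for such a wrapper, purely as a sketch (the osal_bufstat names are hypothetical, not an existing SOEM API):

/* use C11 atomics where the compiler provides them; otherwise fall back
   to plain word-sized accesses, as on 8/16-bit targets today */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) && \
    !defined(__STDC_NO_ATOMICS__)

#include <stdatomic.h>
typedef _Atomic int osal_bufstat_t;

static inline void osal_bufstat_set(osal_bufstat_t *p, int v)
{
   atomic_store_explicit(p, v, memory_order_release);
}

static inline int osal_bufstat_get(osal_bufstat_t *p)
{
   return atomic_load_explicit(p, memory_order_acquire);
}

#else /* no <stdatomic.h>: keep today's plain, aligned, word-sized access */

typedef int osal_bufstat_t;
#define osal_bufstat_set(p, v)  (*(p) = (v))
#define osal_bufstat_get(p)     (*(p))

#endif

The rxbufstat accesses in the nic drivers would then go through these helpers, and on compilers without atomics the generated code stays exactly what it is today.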
I was testing my application with GCC ThreadSanitizer on Linux and I got several warnings from the SOEM code (see below). Looking at the code there are several places in the nic drivers where rxbufstat is accessed, but no real locking seems to be in place for this variable. In the linux nicdrv.c I've identified the following places where this variable is accessed (ignoring initialization routines):

ecx_getindex: access is protected by getindex_mutex
ecx_setbufstat: no locking
ecx_outframe: no locking
ecx_outframe_red: access is protected by tx_mutex
ecx_inframe: partially protected by rx_mutex

As you can see, there either is no locking at all, or, where mutexes are used, they are not the same across these functions. As I understand it, it should be possible to use the SOEM library to simultaneously exchange process data and access SDOs. In that scenario these functions may be called in parallel, so they should be thread safe. Making this code thread safe with mutexes would require a lot of lock/unlock blocks around the rxbufstat variable, so I'm not sure that is the best way to fix it. Possibly an easier way would be to use atomics for this state.

Example ThreadSanitizer output: