DAP response queue filling up, starving interface

I am using DAPLink on ATSAM3U2C HIC configured with DAPv2 in a CI setup with PyOCD as the host interface and have observed sticky failures of the DAPLink firmware that manifest as USB timeouts, e.g.:

Exception in thread CMSIS-DAP receive (0000000050203220304d5157303031313031303397969903):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/cmsis_dap_probe.py", line 463, in read_ap_repeat_callback
    values = result()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 1011, in reg_read_repeat_cb
    self.run()
  File "/opt/homebrew/Cellar/python@3.9/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 164, in rx_task
    read_data = self.ep_in.read(self.packet_size, 10 * 1000)
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/core.py", line 423, in read
    res = transfer.get_result()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 154, in get_result
    self.daplink.flush()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
    return self.device.read(self, size_or_buffer, timeout)
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/core.py", line 1029, in read
    return func(self, *args, **kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 790, in flush
    self._read_packet()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
    return func(self, *args, **kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 1054, in _read_packet
    raw_data = self._interface.read()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 234, in read
    ret = fn(
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 846, in bulk_read
    raise DAPAccessIntf.DeviceError("Device %s read thread exited unexpectedly" % self.serial_number)
pyocd.probe.pydapaccess.dap_access_api.DAPAccessIntf.DeviceError: Device 0000000050203220304d5157303031313031303397969903 read thread exited unexpectedly

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/nlk/implant/icdi/pyocd.py", line 397, in memory_read
    return self.probe_session.target.read_memory_block8(addr, num_bytes)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/core/soc_target.py", line 212, in read_memory_block8
    return self.__read(self.lib.libusb_bulk_transfer,
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 954, in __read
    return self.selected_core_or_raise.read_memory_block8(addr, size)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/cortex_m.py", line 476, in read_memory_block8
    _check(retval)
  File "/Users/kbalke/Library/Python/3.9/lib/python/site-packages/usb/backend/libusb1.py", line 602, in _check
    raise USBTimeoutError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBTimeoutError: [Errno 60] Operation timed out
    data = self.ap.read_memory_block8(addr, size)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/core/memory_interface.py", line 181, in read_memory_block8
    data32 = self.read_memory_block32(addr, size // 4)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
    return func(self, *args, **kwargs)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/ap.py", line 1171, in _read_memory_block32
    resp += self._read_block32_page(addr, n//4)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/ap.py", line 1129, in _read_block32_page
    resp = self.dp.read_ap_multiple(self.address.address + self._reg_offset + MEM_AP_DRW, size)
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/dap.py", line 934, in read_ap_multiple
    return read_ap_multiple_cb()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/coresight/dap.py", line 924, in read_ap_multiple_cb
    return result_cb()
  File "/opt/homebrew/lib/python3.9/site-packages/pyocd/probe/cmsis_dap_probe.py", line 470, in read_ap_repeat_callback
    raise self._convert_exception(exc) from exc
pyocd.core.exceptions.ProbeError: Device 0000000050203220304d5157303031313031303397969903 read thread exited unexpectedly

A failure with the above trace would be observed during a CI run when a probe is already open. Subsequent CI runs using the same DAP would observe a different trace, e.g.:

> pyocd rtt -t <target> -u <unique id>
0010339 I Target type is <target> [board]
0010339 C Error:  [__main__]
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/__main__.py", line 161, in run
    status = cmd.invoke()
             ^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/subcommands/rtt_cmd.py", line 116, in invoke
    with session:
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/core/session.py", line 391, in __enter__
    self.open()
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/core/session.py", line 525, in open
    self._probe.open()
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/cmsis_dap_probe.py", line 260, in open
    self._link.open()
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/utility/concurrency.py", line 29, in _locking
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/pydapaccess/dap_access_cmsis_dap.py", line 746, in open
    self._interface.open()
  File "/opt/homebrew/lib/python3.11/site-packages/pyocd/probe/pydapaccess/interface/pyusb_v2_backend.py", line 89, in open
    assert self.closed is True
           ^^^^^^^^^^^^^^^^^^^
AssertionError

Doing some digging into the DAPLink firmware, it appears that this is happening because the IN endpoint is stuck, i.e., the firmware is not putting any data into the transmit FIFO and subsequent IN EP polls from the host time out. This is because the DAP_Cmd_queue has filled, and the current bulk implementation gates writing the IN EP data (DAP command response) on the queue being not full:

source/usb/bulk/usbd_bulk.c:

void USBD_BULK_EP_BULKOUT_Event(U32 event)   Brian Esquilona • Mon May  6 15:02:44 2019 • Bulk endpoints for CMSIS v2 support
  {
    U16 bytes_rece;
      uint8_t * rbuf;

    bytes_rece      = USBD_ReadEP(usbd_bulk_ep_bulkout, ptrDataIn, USBD_Bulk_BulkBufSize - DataInReceLen);
    ptrDataIn      += bytes_rece;
   DataInReceLen  += bytes_rece;

    if ((DataInReceLen >= USBD_Bulk_BulkBufSize) ||
            (bytes_rece    <  usbd_bulk_maxpacketsize[USBD_HighSpeed])) {
          if (DAP_queue_execute_buf(&DAP_Cmd_queue, USBD_Bulk_BulkOutBuf, DataInReceLen, &rbuf)) { // This is the offending branch; USBD_BULK_EP_BULKIN_Event(0) is only invoked if `DAP_queue_execute_buf` returns true...
              //Trigger the BULKIn for the reply
              if (USB_ResponseIdle) {
                  USBD_BULK_EP_BULKIN_Event(0);
                  USB_ResponseIdle = 0;
              }
          }
          //revert the input pointers
          DataInReceLen = 0;
          ptrDataIn     = USBD_Bulk_BulkOutBuf;
      }
  }

source/daplink/cmsis-dap/DAP_queue.c:

  BOOL DAP_queue_execute_buf(DAP_queue * queue, const uint8_t *reqbuf, int len, uint8_t ** retbuf)
{
    uint32_t rsize;
    if (queue->free_count > 0) { // <- if the queue is full, we always return `__FALSE`, and no progress is made.
        if (len > DAP_PACKET_SIZE) {
            len = DAP_PACKET_SIZE;
        }
        queue->free_count--;
        memcpy(queue->USB_Request[queue->recv_idx], reqbuf, len);
        rsize = DAP_ExecuteCommand(reqbuf, queue->USB_Request[queue->recv_idx]);
        queue->resp_size[queue->recv_idx] = rsize & 0xFFFF; //get the response size
        *retbuf = queue->USB_Request[queue->recv_idx];
        queue->recv_idx = (queue->recv_idx + 1) % DAP_PACKET_COUNT;
        queue->send_count++;
        return (__TRUE);
    }
    return (__FALSE);
}

The same gating is performed in the HID implementation, as well.

I'm still not sure what the reproduction criteria are for causing this queue to fill; if PyOCD is always processing transactions one at a time, I would expect the queue never to have more than 1 element in it. Perhaps this has something to do with USB_ResponseIdle--I'm not sure what the semantics of this flag are, but it seems to be managed incorrectly in the bulk EP implementation, since it only ever gets written to 1 by the USBD_BULK_EP_BULKIN_Event implementation and gets immediately cleared to 0 in the calling context from USBD_BULK_EP_BULKOUT_Event. It seems like USBD_BULK_EP_BULKIN_Event can also be invoked from the main event loop when an interrupt is observed on the corresponding endpoint, but the only configured interrupt source for that is TX complete. As such, I'm guessing that we're observing two EP OUT events before the EP IN TX complete, resulting in a missed call to USBD_BULK_EP_BULKIN_Event when USB_ResponseIdle is clear in the second pass through USBD_BULK_EP_BULKOUT_Event. This would result in an off-by-one, where a response is queued in DAP_Cmd_queue that doesn't get egressed until the next command is received. After 3 such bubbles, the queue fills completely and we get stuck forever due to the conditional logic I highlighted above.

To address this bug, I propose unconditionally invoking USBD_BULK_EP_BULKIN_Event from USBD_BULK_EP_BULKOUT_Event. We'll still get chaining of packet egress since USBD_BULK_EP_BULKIN_Event gets invoked in the case of IN EP TX complete, so when the last write finishes, we'll enqueue another write to the FIFO. In the case that the queue is empty, we do nothing in either case.

ARMmbed / DAPLink

DAP response queue filling up, starving interface #1026