Open Nels52 opened 1 year ago
Based on the title, I think #472 could also be related. I haven't looked at your issue in detail, but I think it's not exactly the same issue, but they might be good to compare.
Also, I'd suggest you use ```
markers to mark your code blocks (see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code)
Thanks @matthijskooijman for looking at this. However, the scenario I have identified by code inspection only occurs with an empty transmit Ringbuffer. It is a timing issue where interrupts are enabled and the samd Uart::IrqHandler is driven with a TXC interrupt from a previous write of a character to the UDR register AND this interrupts the Uart::write(data) method which is running in the normal (non IRQ) processing. I am assuming that any Uart IRQ interrupt will suspend non IRQ processing. I also assume there is only one processing thread on the MKRNB1500 Cortex processor. These seem like reasonable assumptions for a simple IOT device.
I am using the write(data) code from the \samd\1.8.13\cores\arduino\Uart.cpp:
size_t Uart::write(const uint8_t data)
{
if (sercom->isDataRegisterEmptyUART() && txBuffer.available() == 0) {
sercom->writeDataUART(data);
} else {
// spin lock until a spot opens up in the buffer
while(txBuffer.isFull()) {
uint8_t interruptsEnabled = ((__get_PRIMASK() & 0x1) == 0);
if (interruptsEnabled) {
uint32_t exceptionNumber = (SCB->ICSR & SCB_ICSR_VECTACTIVE_Msk);
if (exceptionNumber == 0 ||
NVIC_GetPriority((IRQn_Type)(exceptionNumber - 16)) > SERCOM_NVIC_PRIORITY) {
// no exception or called from an ISR with lower priority,
// wait for free buffer spot via IRQ
continue;
}
}
// interrupts are disabled or called from ISR with higher or equal priority than the SERCOM IRQ
// manually call the UART IRQ handler when the data register is empty
if (sercom->isDataRegisterEmptyUART()) {
IrqHandler();
}
}
txBuffer.store_char(data);
sercom->enableDataRegisterEmptyInterruptUART();
}
The window I am talking about is very small. It occurs when the Uart IrqHandler interrupts this code right after the else statement logic begins execution. The else logic is executed because the UDRE bit has not yet been set from the previous write on the Uart UDR register. If the UDRE bit is then set and TXC interrupt occurs right at the beginning of the else logic, the Uart IrqHandler would suspend this thread and begin execution. When the thread execution returns after the TXC interrupt is processed by the Uart IrqHandler, the while loop code is not executed because the transmit RingBuffer is empty. The data is then stored on the RingBuffer and UDRE interrupts are enabled. Unfortunately, the Uart IrqHandler will not get driven since the TXC interrupt for the previous write to the UDR has already completed processing.
I am assuming that any Uart IRQ interrupt will suspend non IRQ processing. I also assume there is only one processing thread on the MKRNB1500 Cortex processor. These seem like reasonable assumptions for a simple IOT device.
Yeah, I think those are indeed true.
From a quick reading, your analysis sounds good (and it does seem plausible that the code has race conditions), though I have not looked at the code in detail. I also won't find time for that, so I will leave this for others to comment on for now (I just wanted to point out the other issue, which is also related but, as you say, indeed a different issue).
To summarize I am running the modified Uart::flush() and Uart::write() methods documented at the top of this thread to make sure I don't hit the suspected issue I described above. I also believe these changes eliminate the need to define txBuffer as a SafeRingBuffer. There have been previous discussions about the lack of RingBuffer safety. See [https://github.com/arduino/ArduinoCore-API/issues/195] created by @KurtE.
I also made a change to protect against a double read of the UDR. It looks like that could happen if there is a frame error.
///RTN Modified Uart IrqHandler which guarantees the Handler never does a double Read of Uart UDR buffer.
void Uart::IrqHandler()
{
if (sercom->isFrameErrorUART()) {
// frame error, next byte is invalid so read and discard it
sercom->readDataUART();
sercom->clearFrameErrorUART();
} else if (sercom->availableDataUART()) {
rxBuffer.store_char(sercom->readDataUART());
if (uc_pinRTS != NO_RTS_PIN) {
// RX buffer space is below the threshold, de-assert RTS
if (rxBuffer.availableForStore() < RTS_RX_THRESHOLD) {
*pul_outsetRTS = ul_pinMaskRTS;
}
}
}
if (sercom->isDataRegisterEmptyUART()) {
if (txBuffer.available()) {
uint8_t data = txBuffer.read_char();
sercom->writeDataUART(data);
}
else {
sercom->disableDataRegisterEmptyInterruptUART();
}
}
if (sercom->isUARTError()) {
sercom->acknowledgeUARTError();
// TODO: if (sercom->isBufferOverflowErrorUART()) ....
// TODO: if (sercom->isParityErrorUART()) ....
sercom->clearStatusUART();
}
}
I believe these modified Uart::flush() and Uart::write() methods would allow txBuffer to be defined as a simple RingBuffer for the following reasons:
I have not looked at the changes needed to allow the rxBuffer to be defined as a simple RingBuffer, however, @KurtE has discussed the need to eliminate the _numElems variable and only use the _iHead and_iTail indices in the RingBuffer class. This would require changes in almost all of the RingBuffer methods. I have made no changes in the RingBuffer class methods for txBuffer safety.
I am working with a MKRNB1500 with MKRNB. Sometimes I see what appears to be an incomplete or no response from the SARA modem. The interface to the SARA modem is a UART serial interface.
I started looking at the UART interface code and I think I have identified a possible race condition between the Uart::write(data) method and the Uart::IrqHandler(). Here is the sequence:
I am not sure how the SARA modem would react to an incomplete AT command but it probably isn’t good.
In order to avoid a problem like this it seems like it would be better to first add the write data to the transmit RingBuffer and then use the Uart::flush() method to initiate the data transfer over the UART interface. The proposed code would require modifications to the Uart::write(data) and Uart::flush() methods:
Note that the Uart::flush() method must always be invoked to push the data out the UART interface. This doesn't seem to be much of a problem for the MKRNB1500 using the MKRNB library because the Modem.send() method in the Modem.cpp source file of the MKRNB library always does this.
I am now running a modified Uart.cpp with the above code changes to see if I still get my infrequent hangs. Note that the Uart::write(data) method above does not have the logic to address Issue #262. That issue only appears to happen when the Uart::write(data) is invoked with interrupts disabled or inside a higher priority ISR. Specifically, it appears that the Issue #262 reproduction test used println() commands inside an ISR to reproduce the problem and I am not doing anything like that. The Uart::write(data) issued on the SARA modem UART interface is only invoked in the background (non ISR) task priority.
I am not sure if I am missing something, and I would appreciate it if someone would comment on this hang scenario and possible solution.