OK, so my logic above is flawed: the interrupt handler only clears the UART_FLAG_RXNE flag after it has called UARTSerial::rx_irq(), so the flag will be set when serial_readable() is called. But the effect remains. I'll poke at it more tomorrow and report back on whether I'm seeing phantoms.
@RobMeades Hi Rob - it would be great if you could share some basic sample code, using PC serial for instance, and the same APIs you're using in your program
Understood, though that might take a while as I would have to understand how UARTSerial worked first. I am actively investigating the original data-loss-at-high-data-rates issue (which may be related) with @kjbracey-arm at the moment; let me continue with that and I will return to this either (a) when we've sorted it or (b) when we think that resolving this might resolve that, IYSWIM.
Hi - I had a look at the driver, and it occurred to me that the manual clearing of RXNE looked dubious. Rob confirms that removing the manual clearing from uart_irq() solves the problem.
The act of reading from the data register clears RXNE automatically. If the IRQ handler clears it manually, then it can drop a byte of data that arrived during execution of the handler.
It's also possible that the corresponding manual clearing of TC for Tx could be an issue - losing an interrupt if somehow a transmit triggered by the handler completes during execution of the handler.
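To make the race concrete, here is a minimal sketch of the Rx branch of the handler (the same shape as the uart_irq() code quoted later in this thread; huart, serial_irq_ids and irq_handler are as in the STM32F4 serial_device.c):

if (__HAL_UART_GET_FLAG(huart, UART_FLAG_RXNE) != RESET) {
    if (__HAL_UART_GET_IT_SOURCE(huart, UART_IT_RXNE) != RESET) {
        // The user handler reads the data register, which clears RXNE in hardware
        irq_handler(serial_irq_ids[id], RxIrq);
        // If another byte arrived between that read and this point, RXNE is now
        // set again for the NEW byte; clearing it manually discards that byte
        __HAL_UART_CLEAR_FLAG(huart, UART_FLAG_RXNE); // <-- the dubious clear
    }
}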
On another note, Rob's traces show that output is slow - it doesn't maintain full line rate.
I believe that's because TxIrq is attached to TC "transmit complete" rather than TXE "transmit register empty". So we see a funny pattern where the interrupt handler writes 2 bytes at a time, then a delay before the next 2. Don't know if it's worth adjusting that.
FYI, ref. Kevin's point about the manual clearing of TC for Tx: after commenting out the manual clearing of RXNE in uart_irq() I was able to receive data at the serial port without any loss at 460800.

However, my setup is end-to-end to a remote server (a modem is attached to the serial port) and I found that, when I switched to 460800, the server received no data whatsoever, yet it was fine when I ran the modem interface at 230400. My logic analyzer showed that there was no Tx data on the serial lines at all. Speculatively, I commented out the manual clearing of TC for Tx in uart_irq() and my transmit sprang to life. So I think Kevin is probably right on that count also.
That may in turn be the answer for the odd use of TC rather than TXE. If it was TXE, and the clear was in the same place, it would have jammed at any baud rate.
FYI, speculatively (not claiming any science here) I tried attaching the Tx interrupt to TXE rather than TC. This worked fine but my end-to-end uplink throughput (my data is 100% uplink UDP packets) dropped by around 20% (180 kbits/s versus 220 kbits/s).
I assume that would be the effect of doing 1 byte per interrupt rather than 2, so having increased CPU load. The bursty thing worked out in your favour. That was why I wasn't sure about adjusting it.
Devices with FIFOs usually have water-mark settings to counteract that, e.g. generating an interrupt once 4 bytes of data (or 4 bytes of space) are available in the buffer, else after a timeout.
This lacks FIFOs, so nothing you can do on RX, but using TC effectively lets you do 2 at a time - once TC triggers you know the next write will vacate the TX register immediately, allowing the interrupt handler to write 2.
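In code terms, a sketch of why TC gives you two bytes per interrupt (illustrative only; write_byte() stands in for the HAL putc and next_byte() for popping the Tx buffer):

// TC fires when both the data register (DR) and the shift register are empty,
// so on entry the handler can write twice before DR is genuinely full again
void tx_irq_on_tc(void)
{
    write_byte(next_byte()); // DR -> shift register immediately; DR empties again
    write_byte(next_byte()); // refills DR; it drains once the first byte has shifted out
}
// With TXE the interrupt fires as soon as DR alone is empty, so only one byte
// fits per interrupt - twice the interrupt rate for the same data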
A data rate that high at one-byte-per-interrupt is really pushing it. The original point of the FIQ interrupt and dedicated registers on the ARM was to achieve 250 kbit/s 1-byte-at-a-time to the floppy disk drive, meaning the ARM could do the job itself without needing a DMA controller.
Managing the same with normal IRQs is a stretch, even with the faster CPU core speeds we have 30 years later (memory and peripherals haven't got proportionately faster).
So what I'm saying is we would need to start worrying about the DMA/"multibuffer" mode here to get near line rate. Whereas a UART with FIFOs and water marks would work fine and transparently.
Not sure if it's possible to hide that DMA/multibuffer mode inside the HAL's normal serial_getc/putc API.
UARTSerial doesn't support the separate asynch API at the moment, but even if it did, the STM driver doesn't seem to use DMA for asynch anyway.
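For reference, continuous reception under DMA with the stock Cube HAL could look something like this sketch (hypothetical, not current mbed code; it assumes an hdma_usart_rx channel configured in DMA_CIRCULAR mode and linked to huart during MSP init):

#include "stm32f4xx_hal.h"

#define RX_RING_SIZE 256
static uint8_t rx_ring[RX_RING_SIZE];

// Start one "forever" reception into a ring buffer; the consumer then polls
// the DMA counter instead of taking an interrupt per byte
void start_rx_dma(UART_HandleTypeDef *huart)
{
    HAL_UART_Receive_DMA(huart, rx_ring, RX_RING_SIZE);
}

// NDTR counts down from RX_RING_SIZE as bytes are stored, so the current
// write index is the complement of the remaining count
uint32_t rx_write_index(UART_HandleTypeDef *huart)
{
    return RX_RING_SIZE - __HAL_DMA_GET_COUNTER(huart->hdmarx);
}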
@LMESTM I'd be happy to support such work as we have modems connected to this serial port that support far higher data rates than 460800. How can I help?
@RobMeades I'll have a look asap, hopefully next Monday - what would help is a simple code sample using Serial to a PC to reproduce the issue
Thought you might say that, have been working on it, will post something here when I have it.
@LMESTM: I've created a repo containing a Greentea test of serial port performance. It checks for zero character loss and an expected throughput, repeating the test at increasing baud rates up to a configurable maximum (I've proposed a throughput of no more than 20% below the line rate at up to 460800 bits/s). It requires an mbed board with a spare serial port on which the Tx line can be looped back to the Rx line and the RTS line to the CTS line. It is fully configurable and can be run under a debugger if required.

You can find the test here: https://github.com/u-blox/baud-rate-test. I hope this helps you make us a nice fast DMA'ed serial implementation for the STM32F4 :-).
@RobMeades - thanks I'll have a look. Short term objective is to make the setup stable and fix the interrupts management.
@kjbracey-arm said:
UARTSerial doesn't support the separate asynch API at the moment, but even if it did the STM driver doesn't seem to DMA for asynch anyway
so is there a point in supporting DMA as of now? It is not in our plans for now.
From my testing with a single looped-back serial port the maximum achievable communication rates (calculated using 10 bits per byte to allow for stop/start bits) were as below. For such an enormously capable chip, the lack of buffering in the UART HW is quite odd, and it has quite an impact on the achievable throughput; Kevin's/my assumption was that the DMA mode was how ST intended this UART to be used in order to compensate for the lack of a HW buffer.
+--------+---------------------+
| Baud | Throughput (bits/s) |
+--------+---------------------+
| 9600 | 8984 |
| 57600 | 52636 |
| 115200 | 96417 |
| 230400 | 177780 |
| 460800 | 298963 |
| 921600 | 446156 |
+--------+---------------------+
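Since the test counts 10 line bits per byte, each figure divides directly by the baud rate to give line-rate efficiency. A quick sanity check of the table (my own snippet, using the numbers above):

#include <cstdio>

// Efficiency = throughput / baud rate: near 94% at 9600 baud,
// falling to ~65% at 460800 and ~48% at 921600
int main()
{
    const long baud[] = {9600, 57600, 115200, 230400, 460800, 921600};
    const long tput[] = {8984, 52636, 96417, 177780, 298963, 446156};
    for (int i = 0; i < 6; i++) {
        std::printf("%7ld baud: %5.1f%% of line rate\n",
                    baud[i], 100.0 * tput[i] / baud[i]);
    }
    return 0;
}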
@RobMeades - would be mildly interested to see the same table using the TXE interrupt. I know you found TXE was slower with real lwIP at 460800, but this synthetic test would be interesting.
@LMESTM - UARTSerial supporting the async API isn't in our current plans, so there's no need to start work on the DMA now.
It appears it would be necessary to get maximum performance out of this chip, but as it would largely be a chip-specific performance optimisation, I don't see it as an immediate priority for the core code. (Also partly due to the maintenance effort of supporting both modes - not all platforms will support the asynch API).
But we do need the non-asynch use to work without loss, so please do concentrate on that.
@RobMeades trying out your test code. I'm hitting a compilation issue (other than the _rxbuf one).
Build failures:
* NUCLEO_F446RE::GCC_ARM::TESTS-UNIT_TESTS-DEFAULT
Building project default (NUCLEO_F446RE, GCC_ARM)
Scan: GCC_ARM
Scan: default
Compile [100.0%]: main.cpp
[Error] main.cpp@218,26: 'class mbed::UARTSerial' has no member named 'set_baud'; did you mean '_baud'?
[Warning] main.cpp@275,0: comparison between signed and unsigned integer expressions [-Wsign-compare]
Are there missing commits in the branch?
Sorry, that's a PR that hasn't got through yet, see here:
https://github.com/ARMmbed/mbed-os/pull/4615
Without it you can only set the baud rate when UARTSerial() is instantiated. Can you hack it or shall I bring the two threads onto one branch somewhere?
@RobMeades that's ok. It compiles fine now.
@kjbracey-arm: shifting from the Tx Complete interrupt to the Tx Empty interrupt, the numbers do improve (see below). Why would that be, when they degraded for my 'real' case?
+--------+---------------------+
| Baud | Throughput (bits/s) |
+--------+---------------------+
| 9600 | 8986 |
| 57600 | 52637 |
| 115200 | 105159 |
| 230400 | 207300 |
| 460800 | 375936 |
| 921600 | 511058 |
+--------+---------------------+
With TXE you're getting your transmit data in earlier, reducing dead line time, but you're doing 1 TX byte per interrupt rather than 2, increasing total CPU time spent on data pumping.
That doesn't have an adverse impact on your simple test, which has nothing much else to do, but when using lwIP, you ended up starving the network stack. Presumably not enough time left for all the packet copying and checksumming and whatever.
Would need to do detailed profiling to figure it out for certain, but that's my hand-waving explanation.
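To put rough numbers on the CPU-load side of that trade-off (my arithmetic, using 10 line bits per byte):

at 460800 baud the byte rate is 460800 / 10 = 46080 bytes/s, so:
TXE (1 byte per interrupt): ~46080 interrupts/s, i.e. ~21.7 us between interrupts
TC (2 bytes per interrupt): ~23040 interrupts/s, i.e. ~43.4 us between interrupts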
Complicated. On our C030 board we have an HSPA modem connected to this serial port that can run at several megabits per second so, for us, throughput is everything. Maybe we need to look at writing our own DMA serial driver?
@RobMeades ok so the test is running now, but I can't see any failure due to missed characters. Have you been able to reproduce on your side? Extra note: I agree that there is no good reason for clearing the TC and RXNE flags in uart_irq(), so I plan to fix that anyway.
I did when I was testing on Saturday, let me try again. It is probably quite timing sensitive, since a character has to arrive in the right window.
I think you have to persist, it took me two tries. Maybe hack the test to run only at 460800, with a 100% threshold (so the throughput doesn't matter) and for longer than 10 seconds? Here was my error case:
[1499681134.23][CONN][RXD] >>> Running case #1: 'Serial speed test'...
[1499681134.23][CONN][INF] found KV pair in stream: {{__testcase_name;Serial speed test}}, queued...
[1499681134.23][CONN][INF] found KV pair in stream: {{__testcase_start;Serial speed test}}, queued...
[1499681145.24][CONN][RXD]
[1499681145.25][CONN][RXD] === Test run 1, at 9600 bits/s completed after 10.001 seconds, sent 8986 byte(s), received 8986 byte(s) (throughput 8986 bits/s with a threshold of 0 bits/s) ===
[1499681145.25][CONN][RXD]
[1499681156.26][CONN][RXD]
[1499681156.27][CONN][RXD] === Test run 2, at 57600 bits/s completed after 10.001 seconds, sent 52634 byte(s), received 52634 byte(s) (throughput 52634 bits/s with a threshold of 0 bits/s) ===
[1499681156.27][CONN][RXD]
[1499681167.27][CONN][RXD]
[1499681167.28][CONN][RXD] === Test run 3, at 115200 bits/s completed after 10.001 seconds, sent 105065 byte(s), received 105065 byte(s) (throughput 105065 bits/s with a threshold of 0 bits/s) ===
[1499681167.28][CONN][RXD]
[1499681178.29][CONN][RXD]
[1499681178.30][CONN][RXD] === Test run 4, at 230400 bits/s completed after 10.001 seconds, sent 191927 byte(s), received 191927 byte(s) (throughput 191927 bits/s with a threshold of 0 bits/s) ===
[1499681178.30][CONN][RXD]
[1499681189.31][CONN][RXD]
[1499681189.32][CONN][RXD] === Test run 5, at 460800 bits/s completed after 10.004 seconds, sent 280610 byte(s), received 280610 byte(s) (throughput 280610 bits/s with a threshold of 0 bits/s) ===
[1499681189.32][CONN][RXD]
[1499681189.34][CONN][RXD]
[1499681189.35][CONN][RXD] !!! Received 273 character(s) (transmitted 753 character(s)), received '3', expected '2', last 17 character(s) received were !!!
[1499681189.35][CONN][RXD] 67890123453789013
@RobMeades - extending UARTSerial to support async transfers so megabit rates can work shouldn't be too hard, but I don't believe we're particularly focused on high throughput, so it's not in our current plans to do it. Maybe something to discuss later.
@RobMeades ok, got it now, by increasing the max speed to 921600 ...
[1499685936.35][CONN][RXD] !!! Received 337 character(s) (transmitted 771 character(s)), received '7', expected '6', last 17 character(s) received were !!!
[1499685936.35][CONN][RXD] 01234567890127457
One generic question: I don't have the background on why you need UARTSerial vs RawSerial - can you explain?
UARTSerial does background transfers via buffers - it's a "full" driver providing a useable FileHandle so applications can do "write 200" and continue, with the transfer continuing in the background. Or if they do do a really big write, their thread blocks properly without consuming CPU time while the background transfer proceeds.
RawSerial's write(200) would actually block and use 100% CPU while the whole transfer was happening. Would probably achieve full line rate as a result, but not a lot of use if we want to let the network stack do anything while we're pumping - it's being used for PPP here.
The problem being seen isn't specific to UARTSerial - anyone using RawSerial to pump data in the background using interrupts would see the same symptoms.
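As a sketch of the difference (illustrative only; the PA_2/PA_3 pin names and the 460800 baud rate are placeholders):

#include "mbed.h"

// UARTSerial: write() copies into an internal ring buffer and returns; the
// interrupt-driven pump drains it in the background, and the calling thread
// only blocks (properly, without spinning) if the buffer fills
UARTSerial buffered(PA_2, PA_3, 460800);

void send_buffered(const char *data, size_t len)
{
    buffered.write(data, len);
}

// RawSerial: putc() busy-waits on every byte, so the calling thread burns
// 100% CPU for the duration of the transfer
RawSerial raw(PA_2, PA_3, 460800);

void send_raw(const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        raw.putc(data[i]);
    }
}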
(@RobMeades - it occurs to me if you're messing with really high data rates, you may need to increase the UARTSerial buffer size to help performance.)
UARTSerial() is the class that underpins serial comms in the new Cellular API, which was introduced in mbed 5.5. It provides an interrupt-driven, buffered serial entity that can be passed between an AT parser and PPP as required. I'm sure @kjbracey-arm can explain it better though.
FYI, with a UARTSerial() buffer size of 512 bytes the peak throughput rolls off at around 290 kbits/s.
@RobMeades @kjbracey-arm Thanks for your explanations. I was mentioning RawSerial because the asynch support is there, which also avoids blocking the calling thread.
@RobMeades Would you mind sharing the changes you made to test using TXE instead of TC?
Edit: @RobMeades what toolchain and profile were used for the figures you shared?
I just looked inside uart_irq() and serial_irq_set() for UART_IT_TC and replaced it with UART_IT_TXE, see below (this diff also has the commenting-out of the lines that reset the flags):
diff --git a/targets/TARGET_STM/TARGET_STM32F4/serial_device.c b/targets/TARGET_STM/TARGET_STM32F4/serial_device.c
index 18bc953..2530441 100644
--- a/targets/TARGET_STM/TARGET_STM32F4/serial_device.c
+++ b/targets/TARGET_STM/TARGET_STM32F4/serial_device.c
@@ -273,16 +273,16 @@ static void uart_irq(int id)
UART_HandleTypeDef * huart = &uart_handlers[id];
if (serial_irq_ids[id] != 0) {
- if (__HAL_UART_GET_FLAG(huart, UART_FLAG_TC) != RESET) {
- if (__HAL_UART_GET_IT_SOURCE(huart, UART_IT_TC) != RESET) {
+ if (__HAL_UART_GET_FLAG(huart, UART_FLAG_TXE) != RESET) {
+ if (__HAL_UART_GET_IT_SOURCE(huart, UART_IT_TXE) != RESET) {
irq_handler(serial_irq_ids[id], TxIrq);
- __HAL_UART_CLEAR_FLAG(huart, UART_FLAG_TXE);
+// __HAL_UART_CLEAR_FLAG(huart, UART_FLAG_TXE);
}
}
if (__HAL_UART_GET_FLAG(huart, UART_FLAG_RXNE) != RESET) {
if (__HAL_UART_GET_IT_SOURCE(huart, UART_IT_RXNE) != RESET) {
irq_handler(serial_irq_ids[id], RxIrq);
- __HAL_UART_CLEAR_FLAG(huart, UART_FLAG_RXNE);
+// __HAL_UART_CLEAR_FLAG(huart, UART_FLAG_RXNE);
}
}
if (__HAL_UART_GET_FLAG(huart, UART_FLAG_ORE) != RESET) {
@@ -438,7 +438,7 @@ void serial_irq_set(serial_t *obj, SerialIrq irq, uint32_t enable)
if (irq == RxIrq) {
__HAL_UART_ENABLE_IT(huart, UART_IT_RXNE);
} else { // TxIrq
- __HAL_UART_ENABLE_IT(huart, UART_IT_TC);
+ __HAL_UART_ENABLE_IT(huart, UART_IT_TXE);
}
NVIC_SetVector(irq_n, vector);
NVIC_EnableIRQ(irq_n);
@@ -452,7 +452,7 @@ void serial_irq_set(serial_t *obj, SerialIrq irq, uint32_t enable)
all_disabled = 1;
}
} else { // TxIrq
- __HAL_UART_DISABLE_IT(huart, UART_IT_TC);
+ __HAL_UART_DISABLE_IT(huart, UART_IT_TXE);
// Check if RxIrq is disabled too
if ((huart->Instance->CR1 & USART_CR1_RXNEIE) == 0) {
all_disabled = 1;
@LMESTM I believe the RawSerial asynch API would likely show a variant of the same loss issue, because it goes through the same uart_irq() to do the asynchronous transfer.
@kjbracey-arm sure - I'm thinking about performance here. I will send a PR that stops clearing the flags, to fix the data loss.
I would also expect the current HAL asynch performance to be much the same as the UARTSerial's portable version - the data pump is basically the same.
(Although maybe UARTSerial does more hw reads? It does do some "readable" checking to try to get more than 1 byte per interrupt, which is futile on this hardware).
@RobMeades about the peak throughput that you've observed - is it with the test setup or your real application with lwIP and the modem?
Just to share a few results I've observed here. I've made the proposed modifications in a branch here: https://github.com/LMESTM/mbed/tree/test_uartserial_flow

It shows a higher limit than yours. I'm using release profile for this measurement.

+---------+---------------------+
| Baud    | Throughput (bits/s) |
+---------+---------------------+
| 230400  | 210071              |
| 460800  | 382946              |
| 921600  | 659400              |
| 1843200 | 876112              |
+---------+---------------------+
[1499693872.78][CONN][RXD] === Test run 1, at 230400 bits/s completed after 10.001 seconds, sent 210071 byte(s), received 210071 byte(s) (throughput 210071 bits/s with a threshold of 115200 bits/s) ===
[1499693872.78][CONN][RXD]
[1499693883.78][CONN][RXD]
[1499693883.80][CONN][RXD] === Test run 2, at 460800 bits/s completed after 10.001 seconds, sent 382946 byte(s), received 382946 byte(s) (throughput 382946 bits/s with a threshold of 230400 bits/s) ===
[1499693883.80][CONN][RXD]
[1499693894.80][CONN][RXD]
[1499693894.81][CONN][RXD] === Test run 3, at 921600 bits/s completed after 10.002 seconds, sent 659400 byte(s), received 659400 byte(s) (throughput 659400 bits/s with a threshold of 460800 bits/s) ===
[1499693894.81][CONN][RXD]
[1499693905.82][CONN][RXD]
[1499693905.83][CONN][RXD] === Test run 4, at 1843200 bits/s completed after 10.001 seconds, sent 876112 byte(s), received 876112 byte(s) (throughput 876112 bits/s with a threshold of 921600 bits/s) ===
My results were all with the test setup.
I've just grabbed your branch and run the same tests here [still in debug profile] and they do indeed show a good improvement, though not as large as yours. It seems to me that there is a startup condition (maybe timing, temperature, who knows?) somewhere that can give different results, as I can no longer repeat the apparent improvement I got from switching to the Tx Empty interrupt above: it showed ~500 kbits/s on the attempt I posted but, since then, I've not managed to reach even 300 kbits/s. Anyway, I like your changes; I am up to 375 kbits/s with debug profile now (repeated across three test runs).
=== Test run 1, at 9600 bits/s completed after 10.001 seconds, sent 8986 byte(s), received 8986 byte(s) (throughput 8986 bits/s with a threshold of 0 bits/s) ===
=== Test run 2, at 57600 bits/s completed after 10.001 seconds, sent 52634 byte(s), received 52634 byte(s) (throughput 52634 bits/s with a threshold of 0 bits/s) ===
=== Test run 3, at 115200 bits/s completed after 10.001 seconds, sent 105141 byte(s), received 105141 byte(s) (throughput 105141 bits/s with a threshold of 0 bits/s) ===
=== Test run 4, at 230400 bits/s completed after 10.001 seconds, sent 192513 byte(s), received 192513 byte(s) (throughput 192513 bits/s with a threshold of 0 bits/s) ===
=== Test run 5, at 460800 bits/s completed after 10.001 seconds, sent 346709 byte(s), received 346709 byte(s) (throughput 346709 bits/s with a threshold of 0 bits/s) ===
=== Test run 6, at 921600 bits/s completed after 10.001 seconds, sent 376034 byte(s), received 376034 byte(s) (throughput 376034 bits/s with a threshold of 0 bits/s) ===
=== Test run 7, at 1843200 bits/s completed after 10.003 seconds, sent 376450 byte(s), received 376450 byte(s) (throughput 376450 bits/s with a threshold of 0 bits/s) ===
And to match your numbers, testing your code with release profile gives me:
=== Test run 1, at 9600 bits/s completed after 10.002 seconds, sent 8986 byte(s), received 8986 byte(s) (throughput 8986 bits/s with a threshold of 0 bits/s) ===
=== Test run 2, at 57600 bits/s completed after 10.001 seconds, sent 52637 byte(s), received 52637 byte(s) (throughput 52637 bits/s with a threshold of 0 bits/s) ===
=== Test run 3, at 115200 bits/s completed after 10.001 seconds, sent 105166 byte(s), received 105166 byte(s) (throughput 105166 bits/s with a threshold of 0 bits/s) ===
=== Test run 4, at 230400 bits/s completed after 10.001 seconds, sent 210039 byte(s), received 210039 byte(s) (throughput 210039 bits/s with a threshold of 0 bits/s) ===
=== Test run 5, at 460800 bits/s completed after 10.001 seconds, sent 384878 byte(s), received 384878 byte(s) (throughput 384878 bits/s with a threshold of 0 bits/s) ===
=== Test run 6, at 921600 bits/s completed after 10.001 seconds, sent 662255 byte(s), received 662255 byte(s) (throughput 662255 bits/s with a threshold of 0 bits/s) ===
=== Test run 7, at 1843200 bits/s completed after 10.001 seconds, sent 816232 byte(s), received 816232 byte(s) (throughput 816232 bits/s with a threshold of 0 bits/s) ===
One more update in my branch - I checked the peripheral clock frequency of the UART IP (on APB1 in my case) and it was not running at max speed. So with the frequency increase that gives:
Test run 1, at 230400 bits/s completed after 10.001 seconds, sent 209522 byte(s), received 209522 byte(s) (throughput 209522 bits/s with a threshold of 115200 bits/s) ===
Test run 2, at 460800 bits/s completed after 10.001 seconds, sent 384900 byte(s), received 384900 byte(s) (throughput 384900 bits/s with a threshold of 230400 bits/s) ===
Test run 3, at 921600 bits/s completed after 10.003 seconds, sent 661061 byte(s), received 661061 byte(s) (throughput 661061 bits/s with a threshold of 460800 bits/s) ===
Test run 4, at 1843200 bits/s completed after 10.004 seconds, sent 905410 byte(s), received 905410 byte(s) (throughput 905410 bits/s with a threshold of 921600 bits/s) ==
Would those various optimizations match your needs?
@RobMeades so what is the difference between your 2 test results - was it debug profile vs. release profile?
Yes, debug v release. Can you take a look at the system_clock.c for our C030 target: ...and see if it looks correct to you?
I've been using UART4 for my tests. From my latest results it seems to only provide a performance improvement at the 1843200 baud rate. But on a loaded system, like your final application, it may bring an improvement I think, so you may want to consider it again in your final setup.
On the F437, I think UART2/3/4/5 are all on APB1, so I would suggest making the change and increasing your UART frequency if you're looking for performance rather than power saving. At least worth a try with the modem during a real use case, as mentioned in the previous post.
Oops... my mistake!
I just checked with the STM32CubeMX software and the APB1 max allowed frequency is 45MHz, so please discard the latest change; it is not allowed and might create unpredictable behavior. Too bad :-(
You could nevertheless try using a 180MHz SysClk and 45MHz APB (instead of 168/42)... in case you don't need USB on your device.
Shame, 'cos with that change it even managed to run the serial port at 3686400 without losing characters (throughput 887420). Maybe I need to run water cooling :-).
@kjbracey-arm just to answer an earlier post: in the case of asynch communication the TXE interrupt is already being used, though this is hidden in the STM32 HAL SDK. The performance will indeed be very near what we observe now in my proposed branch.
// the following function will enable UART_IT_TXE and error interrupts
if (HAL_UART_Transmit_IT(huart, (uint8_t*)tx, tx_length) != HAL_OK)
@RobMeades I've posted #4734 to fix the serial driver. @kjbracey-arm I'll let you and @RobMeades review and decide whether to apply to your UARTSerial branch the changes I proposed here: https://github.com/LMESTM/mbed/tree/test_uartserial_flow
AFAICT, those changes in UARTSerial assume exactly one interrupt per byte, and that an interrupt is generated if a character is already pending when the interrupt is attached - neither of which may be the case on other hardware. Difficult-to-retrigger edge-based interrupts are not uncommon, which then compounds with FIFO-based chips.
So that is one way an interrupt-driven asynch routine can be faster than generic code without actually being DMA - it can use its knowledge of a platform's precise IRQ behaviour.
To use that we'd need a special target "easy 1-at-a-time serial" flag to compile that alternative code. But if we're going to have alternative pump code, I'd rather go all the way and put in an asynch alternative. Gain lots on platforms with DMA too, rather than just a teeny bit on non-DMA non-FIFO platforms.
I've been doing some end-to-end testing with the #4734 changes (but with the Tx interrupt still on TC and not TXE, as that gives me more "buffer"). FYI, the application is to transmit 24-bit (mono) audio sampled at 16 kHz (captured from the I2S interface with this DMA driver, cloned from the ST one), which is sent over the air to a server in fixed-length UDP frames of 488 bytes each every 20 ms. This results in an end-to-end data rate requirement of 195200 bits/s.
What I'm finding is that the system can't keep up: specifically, sock->sendto() of the Cellular API does not return fast enough to send 488 bytes in 20 ms. If I reduce the UDP frame size to 328 bytes (which is 16-bit audio) then it does keep up. This is with the serial port running at 460800 bits/s and, obviously, with flow control on. The modem is not raising CTS, so this is not due to back-pressure from the modem; I believe the MCU+software system just doesn't have the grunt to feed the UART quickly enough.

Question is, would DMA on the serial port make the difference, or is all the PPP escaping (etc.) just, umm, an unescapable issue?
Description
Target STM
I'm dealing with an application on an ST platform in which the serial port is being run at quite a high rate (460800 bits/s). In order to receive characters reliably at the MCU at this kind of rate, HW flow control is being employed. While investigating some character loss I'm seeing in this scenario I noticed something odd: once the serial port buffer is sufficiently full for the RTS line to be raised, it remains in that state; the RTS line is raised after every character, as if the serial port's HW buffer is never being emptied.
Looking at the code from UARTSerial() down:

- UARTSerial::rx_irq() calls SerialBase::_base_getc() in a loop while SerialBase::readable() is true.
- serial_readable() (in targets/TARGET_STM/serial_api.c) just checks for the flag UART_FLAG_RXNE being set for the UART.
- uart_irq() (in targets/TARGET_STM/TARGET_STM32F4/serial_device.c) clears the UART_FLAG_RXNE flag.

So if the interrupt has gone off and flow control has stopped any more characters being received, the flag will not be set and UARTSerial::rx_irq() won't go around its loop at all, which would lead to the buffer never being emptied. I think...?

Aside from the fact that this would be inefficient, I wonder if it might be implicated in the character loss I'm seeing.
Steps to reproduce

Run the serial port via the UARTSerial class (e.g. the new Cellular API) on an STM part at a data rate high enough that the interrupt can't be serviced within the usual interrupt latency (e.g. 460800 bits/s) and watch how quickly RTS is raised once a few tens of characters have been received.

cc: @kjbracey-arm, @hasnainvirk, @adustm, @LMESTM, @jeromecoutant