apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0

stm32h7 UART drain problem #7138

Open slorquet opened 2 years ago

slorquet commented 2 years ago

Hi, the stm32h7 does not properly drain the UARTs on close().

The console works perfectly and hides the problem, because nsh keeps the UART open indefinitely.

However, a program that only occasionally sends data to a UART (such as the built-in echo command, or an application that does open/write/close) exposes this serious issue.

echo does not emit anything at all if I write fewer than 16 characters, which happens to be half the DMA buffer / cache line size (no idea if this is related). This is not a buffering issue, as sending several small chunks in succession does not lead to the delayed emission of a large block.

In my program, cfmakeraw does not change anything. Calling tcdrain before close has no effect either.

Inserting a one-second delay after a serial write (with sleep(1)) allows my output to be written. Leaving the UART open in my program does not help, since the devices are closed when the program ends. This also happens in the echo command.
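For anyone trying to reproduce this, the open/write/close sequence described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper name send_frame_and_close() is hypothetical, and the device path passed to it (for example /dev/ttyS1) depends on the board configuration. Only standard POSIX/termios calls are used. According to the report, on an affected build the frame can be lost even when every call here returns success.

```c
#define _DEFAULT_SOURCE /* for cfmakeraw() when building against glibc */

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper (not part of NuttX): open a serial device, send one
 * frame, ask the driver to drain its output, then close.
 * Returns 0 on success, -1 on failure with errno set.
 */

int send_frame_and_close(const char *path, const void *frame, size_t len)
{
  struct termios tio;
  const char *p = frame;
  int saved;
  int fd;

  fd = open(path, O_WRONLY);
  if (fd < 0)
    {
      return -1;
    }

  /* Raw mode, as tried in the report (it made no difference there) */

  if (tcgetattr(fd, &tio) < 0)
    {
      goto fail;
    }

  cfmakeraw(&tio);
  if (tcsetattr(fd, TCSANOW, &tio) < 0)
    {
      goto fail;
    }

  /* write() may accept fewer bytes than requested, so loop */

  while (len > 0)
    {
      ssize_t n = write(fd, p, len);
      if (n < 0)
        {
          if (errno == EINTR)
            {
              continue;
            }

          goto fail;
        }

      p += n;
      len -= (size_t)n;
    }

  /* Should block until all queued output has been transmitted */

  if (tcdrain(fd) < 0)
    {
      goto fail;
    }

  return close(fd);

fail:
  saved = errno;
  close(fd);
  errno = saved;
  return -1;
}
```

Calling it with a frame shorter than 16 bytes, e.g. send_frame_and_close("/dev/ttyS1", "hello\r\n", 7), should match the case described above, assuming that device node exists on the board.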

Disabling DMA peripherals and options entirely has no effect, and I don't think it's wise since the serial driver and other parts of the stm32h7 serial code clearly assume DMA is enabled.

I wonder if there are bugs in the stm32h7 driver, specifically related to DMA or something else, or if the issue is known, or if I missed some config options.

Any help investigating this is highly appreciated, since the serial driver is surprisingly complex and I do not know how to start debugging this dynamic issue.

It can probably be reproduced in ANY stm32h7 board with a secondary uart and the echo command.

Thanks

acassis commented 2 years ago

Hi @slorquet, the H7 drivers derive from the F7 ones, so I would start by comparing the Reference Manuals of both chips to spot differences. I know a company here in Brazil that is using an STM32H743 with two serial ports, one for the console and the other for a serial fingerprint reader (Secugen U20); the fingerprint library opens and closes the serial port all the time and it works fine. Please retest using a NuttX version from mid-2021 and check whether the issue exists in that version. It was probably introduced by some recent modification.

slorquet commented 2 years ago

It depends heavily on exact implementation details, e.g. if an app sends a frame and waits for an answer, that wait gives the transmission time to complete.

My case is touchy since I found the bug by using "fire and forget": I send a frame and don't wait for any answer before closing.

Picking differences from reference manuals will be tedious. I'll try that later.

Do you recommend a particular mid-2021 release to test?

Thanks.

acassis commented 2 years ago

Hi @slorquet, sorry for the delay, I was really busy last week preparing the NuttX Workshop... I asked them about it and they suggest you try resetting the nuttx git repo to commit 19beb307dd5023c2bf414dc54c3141bfc95c7251 and apps to c222043ed1ede9ef8e9e93ffed14ee9c41c4a2b1. Their project started from there!

slorquet commented 2 years ago

Hi, thank you for this. I'll give it a try, but we are quite busy getting things to work.

The workaround consisting of just waiting for a reply helped, I get proper communication, but the bug is still there when you broadcast data in a "fire and forget" mode.

jerpelea commented 1 year ago

Hi @slorquet,

10.0 or 10.1 should be good candidates for this test

Best regards Alin


slorquet commented 1 year ago

Hi,

I'm already on:

nsh> uname -a
NuttX 10.3.0-RC0 40144a652d

The bug has disappeared thanks to our programming tricks. I have no time to test again.

But the echo shell command shows this behaviour easily if needed.

I think the device->close() operation incorrectly flushes the tx queue instead of waiting for drain.
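To make the suspected difference concrete, here is a toy model of a TX queue with the two possible close behaviours. It is purely illustrative: struct txq and all function names are invented for this sketch and are not the NuttX serial driver's data structures, and whether the real driver actually discards pending data on close is the hypothesis above, not something this code demonstrates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TXQ_SIZE 64

/* Toy model of a driver TX queue (hypothetical, not the NuttX structure) */

struct txq
{
  unsigned char buf[TXQ_SIZE];
  size_t head;   /* Next free slot */
  size_t tail;   /* Next byte to hand to the "hardware" */
  size_t sent;   /* Bytes that actually reached the "wire" */
};

void txq_init(struct txq *q)
{
  memset(q, 0, sizeof(*q));
}

size_t txq_pending(const struct txq *q)
{
  return q->head - q->tail;
}

/* Queue bytes for transmission; returns the number accepted.
 * No wraparound in this toy model.
 */

size_t txq_write(struct txq *q, const void *data, size_t len)
{
  size_t room = TXQ_SIZE - q->head;

  if (len > room)
    {
      len = room;
    }

  memcpy(&q->buf[q->head], data, len);
  q->head += len;
  return len;
}

/* Stand-in for the hardware: transmits one byte per call */

bool txq_hw_step(struct txq *q)
{
  if (q->tail == q->head)
    {
      return false;
    }

  q->tail++;
  q->sent++;
  return true;
}

/* Suspected behaviour: close discards whatever is still queued */

void txq_close_flush(struct txq *q)
{
  q->tail = q->head;
}

/* Expected behaviour: close waits until everything has been sent */

void txq_close_drain(struct txq *q)
{
  while (txq_hw_step(q))
    {
    }
}
```

In this model, writing 5 bytes and then calling txq_close_flush() leaves sent at 0, while txq_close_drain() leaves sent at 5. That matches the symptom that short writes followed by an immediate close never appear on the wire.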

Sebastien


slorquet commented 1 year ago

Hi,

Using the latest NuttX, the problem is still there, but in a subtly different use case. It happens when exchanging several commands with my device.

The dialog is a succession of exchanges: I send a command and wait for the answer. I observe that frame reception is called twice in rapid succession, as if the write were instantaneous and had no effect. When syslog() messages are emitted at specific places within the dialog, everything works, because the delays are long enough to let the data out.

It seems that write() is NOT blocking until the current data has been sent, and data is lost by a read() call. This is similar to the original issue, where the work done by write() was canceled at UART close.

Could this be related to cancellation points or similar features that were added recently? These are not enabled in my build, but they could still be a cause of bugs.

The write() definitely has a buffering problem and I have no idea where. I am using tcdrain in the command transmission and no flush; the UART is opened in blocking mode.

Edit: the magical solution is to insert a call to usleep(1) after every select() call used for reading :(
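For reference, the exchange pattern with the workaround looks roughly like the sketch below. It is a simplified illustration, not my actual application code: the names wait_readable() and exchange() are made up for this example, and the usleep(1) placement simply mirrors the workaround described above without explaining why it helps.

```c
#define _DEFAULT_SOURCE /* for usleep() when building against glibc */

#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */

int wait_readable(int fd, int timeout_ms)
{
  struct timeval tv;
  fd_set rfds;
  int ret;

  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  ret = select(fd + 1, &rfds, NULL, NULL, &tv);
  if (ret < 0)
    {
      return -1;
    }

  /* Workaround from the report: a minimal sleep after every select() */

  usleep(1);
  return ret > 0 ? 1 : 0;
}

/* Hypothetical helper: send one command, drain, then read one answer.
 * Returns the number of bytes read, or -1 on error or timeout.
 */

ssize_t exchange(int fd, const void *cmd, size_t cmdlen,
                 void *resp, size_t resplen, int timeout_ms)
{
  const char *p = cmd;
  int ready;

  while (cmdlen > 0)
    {
      ssize_t n = write(fd, p, cmdlen);
      if (n < 0)
        {
          return -1;
        }

      p += n;
      cmdlen -= (size_t)n;
    }

  /* Expected to block until the command has left the UART */

  if (tcdrain(fd) < 0)
    {
      return -1;
    }

  ready = wait_readable(fd, timeout_ms);
  if (ready < 0)
    {
      return -1;
    }

  if (ready == 0)
    {
      errno = ETIMEDOUT;
      return -1;
    }

  return read(fd, resp, resplen);
}
```

Removing the usleep(1) line from wait_readable() should give the failing variant, assuming the rest of the setup matches what is described above.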