Closed davids5 closed 7 months ago
I have have also seen where gps driver on ports that have TXDMA will not work. (v6xrt)
I am seeing the same thing on the ARKV6X.
I think the TXDMA semaphore change is wrong and should be reverted the way it was.
The write()
function by posix definition can block indefinitely.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
The issue #22302 tried to fix was not a problem in the NuttX kernel. But a problem in the mavlink application itself since it opens the uart in blocking mode. https://github.com/PX4/PX4-Autopilot/blob/62027b09655c63dd71808481c30ed397fc25aa52/src/modules/mavlink/mavlink_main.cpp#L587
My proposed solution is that mavlink open the uart using O_NONBLOCK
instead. And should handles the EAGAIN return code.
Can someone point me to a commit where things are still sane?
Edit: looks like main is ok again?
But then should the implementation not be:
up_dma_txavailable
should get a return value and the NuttX serial upper driver should be extended as well to feedback EAGAIN
to userspace.
https://github.com/PX4/NuttX/blob/e7da5ac0e660238cb353948b2a9118579727267a/drivers/serial/serial.c#L293
https://github.com/PX4/NuttX/blob/e7da5ac0e660238cb353948b2a9118579727267a/drivers/serial/serial.c#L494
https://github.com/PX4/NuttX/blob/e7da5ac0e660238cb353948b2a9118579727267a/drivers/serial/serial.c#L1261
Edit: Maybe we don't even need EAGAIN but I'm afraid we lose data if we don't. Need some more investigation.
I need to review the sequence for the Transmit.
I have a nuttx branch with
The stm32h7 fix: https://github.com/PX4/NuttX/commit/c696d5e1b0a98f1d976467576f6d93ff8ce61423 The imx: fix: https://github.com/PX4/NuttX/commit/2de00138371eb634de22fa708234006d78b2ea13 https://github.com/PX4/NuttX/commit/52c5524a44a7ff4c33d33865dc312510adea21a5 https://github.com/PX4/NuttX/commit/8cae36b737e0d3156691781fdfcfb17af8c26e41
The console requiring a second key press is fixed. serial_test runs at 3Mbps and with https://github.com/PX4/PX4-GPSDrivers/pull/143 the imxrt can use TX DMA again.
@niklaut can you pull in the stm32h7 and test the original issue with mavlink?
As @dakejahl mentioned, this issue is also heavly affecting the uxrce_dds_client. After #22302 the RTT of the xrce-dds communication with a companion computer increased from 1.5ms to 20ms (testing on a Raspberry Pi 4, serial hardware on the raspberry (serial5), CUAV pixhawk 6x, TELEM1 used on the CUAV at 921600b/s). 20ms of RTT makes the time synchronization impossible.
@davids5 , I tested your nuttx branch and it definitively improves but it does not remove entirely the issue as the RTT is now around 6ms.
As @dakejahl mentioned, this issue is also heavly affecting the uxrce_dds_client. After #22302 the RTT of the xrce-dds communication with a companion computer increased from 1.5ms to 20ms (testing on a Raspberry Pi 4, serial hardware on the raspberry (serial5), CUAV pixhawk 6x, TELEM1 used on the CUAV at 921600b/s). 20ms of RTT makes the time synchronization impossible.
@davids5 , I tested your nuttx branch and it definitively improves but it does not remove entirely the issue as the RTT is now around 6ms.
@beniaminopozzan am I following that if you just revert the H7 commit that created the problem you get some timing that 1.5ms. With that commit it goes to to 20 and then with px4_firmware_nuttx-10.3.0%2B-serial-fixes-testing it is 6ms?
As @dakejahl mentioned, this issue is also heavly affecting the uxrce_dds_client. After #22302 the RTT of the xrce-dds communication with a companion computer increased from 1.5ms to 20ms (testing on a Raspberry Pi 4, serial hardware on the raspberry (serial5), CUAV pixhawk 6x, TELEM1 used on the CUAV at 921600b/s). 20ms of RTT makes the time synchronization impossible. @davids5 , I tested your nuttx branch and it definitively improves but it does not remove entirely the issue as the RTT is now around 6ms.
@beniaminopozzan am I following that if you just revert the H7 commit that created the problem you get some timing that 1.5ms. With that commit it goes to to 20 and then with px4_firmware_nuttx-10.3.0%2B-serial-fixes-testing it is 6ms?
@davids5 , that's correct!
@beniaminopozzan can you describe the setup (wiring and commands) so I can reproduce the issue?
@davids5 Of course
5.15.98-rt62-raspi
.v2.4.1
uart4
on the Raspberry, adding dtoverlay=uart4
to /boot/firmware/config.txt
Parameter | value |
---|---|
UXRCE_DDS_CFG | TELEM 1 |
MAV_0_CONFIG | Disabled |
SER_TEL1_BAUD | 921600 |
The CUAV TELEM1
is directly connected to uart4
of the Raspberry. The Rasberry RX4 is on PIN21 while TX4 is on PIN24.
MicroXRCEAgent serial --dev /dev/ttyAMA1 -b 921600
/fmu/out/timesync_status
ROS2 topic and look for the round_trip_time
field. It's value is in microseconds.Potentially, you can use any usb to serial adapter so that the raspberry is not neeeded. However I always found the quite unreliable.
@beniaminopozzan You're right, there are differences in timing even on an unconnected TELEM1: https://github.com/PX4/NuttX/pull/285#issuecomment-1825948853.
This issue has been mentioned on Discussion Forum for PX4, Pixhawk, QGroundControl, MAVSDK, MAVLink. There might be relevant details there:
https://discuss.px4.io/t/gps2-on-cube-orange-mini-mapping-possibly-incorrect/35512/6
@davids5 any news on this?
Looks like the two commits still haven't fixed this. Run it long enough and TX gets stuck.
I thought removing O_NONBLOCK from the dds client driver fixed it but it just looks like the whole driver locks up when the serial port freezes.
With O_NONBLOCK, the driver thinks its sending data but nothing gets received by the agent. RX into the client is still fully working.
Without O_NONBLOCK, the entire driver locks up.
I thought removing O_NONBLOCK from the dds client driver fixed it but it just looks like the whole driver locks up when the serial port freezes.
With O_NONBLOCK, the driver thinks its sending data but nothing gets received by the agent. RX into the client is still fully working.
Without O_NONBLOCK, the entire driver locks up.
How many threads is it? Maye you can walk me through that you have learned from uxrce_dds_client/uxrce_dds_client.cpp.
I thought removing O_NONBLOCK from the dds client driver fixed it but it just looks like the whole driver locks up when the serial port freezes. With O_NONBLOCK, the driver thinks its sending data but nothing gets received by the agent. RX into the client is still fully working. Without O_NONBLOCK, the entire driver locks up.
How many threads is it? Maye you can walk me through that you have learned from uxrce_dds_client/uxrce_dds_client.cpp.
@AlexKlimaj can you scope TX, RX and the RTS, CTS lines at the point of failure and post it here?
I'm testing using mavlink now instead of DDS. Running into the same disconnection problem. It will run for a while, then lock up.
I can scope next week but I just tested current main with telem 2 on mavlink and it barely runs for 10 minutes without disconnecting. Main with the dma commits reverted has been running for 4 hours
@beniaminopozzan The serial DMA issue was fixed with https://github.com/PX4/PX4-Autopilot/pull/22667. Please test current master. If this is still and issue please reopen this or post a new issues with what you are seeing.
Describe the bug
With stm32h7/serial: Do not wait on TXDMA semaphore in, the console will not complete transmitting the complete output until a character is input on console.
gps status
\< hit a key >
To Reproduce
see above
Expected behavior
not extra input needed
Screenshot / Media
No response
Flight Log
NA
Software Version
main with nuttx that has ed4814f6239097dde5eecf2b4fbd58661db84dda
Flight controller
all H7 with TX DMA
Vehicle type
None
How are the different components wired up (including port information)
No response
Additional context
No response