PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License
8.17k stars 13.36k forks source link

[Bug] Logger issue when logging for multiple GB of data #22336

Open lvanasse opened 10 months ago

lvanasse commented 10 months ago

Describe the bug

Hi, I'm having an issue where the logger stop logging after a certain amount of time. I couldn't see an error from logger module in the dmesg information, and also it just stopped event if I had merged this PR which mitigate the limitation of NuttX max file of 2.0Go: https://github.com/PX4/PX4-Autopilot/pull/21389 and being on v1.14 release.

Here's a forum post regarding that issue: https://discuss.px4.io/t/px4-community-q-a-november-08-2023/35087/2?u=ludovicvanasse

And I've chatted a bit with @bkueng in his PR thread: https://github.com/PX4/PX4-Autopilot/pull/21389

I think part of my problem is the vfat is too small on the SD Card: image

My hypothesis is that the logging will fail when it reaches the maximum vfat size. It's not too bad, but it doesn't seem consistent with different SD card, I formatted a 32Go card, and it had less than 2Go of vfat (checking it at the moment for badblocks).

To Reproduce

Here's the setup I have, I'm using a Holybro v6C model with the included SD card. I have some sensor connected to it. And also have the logging at boot parameter configured.

And I just let it run for multiple hours. I'll come and do a logger status command once in a while to check if the logger is still running.

Expected behavior

After multiple hours or after the logged file size reach the max vfat size it will stop without, ironically, logs saying that it had an error.

Screenshot / Media

Here's the df -h command of a freshly formatted 8Gb SD card that came with the Holybro board.

image

Flight Log

Here's one of the log from my testing, although I don't think you'll find a lot of information from the logger there.

https://drive.google.com/file/d/1SoHqtEz1WzNLOFGnOAxKyXx3Qf3VMN2s/view?usp=sharing

Software Version

Version 1.14 with this PR cherry-picked: https://github.com/PX4/PX4-Autopilot/pull/21389 and with some private modification, but nothing that modified the logger code, except for new logged topics.

Flight controller

HolyBro v6C

Vehicle type

Airship/Balloon

How are the different components wired up (including port information)

No response

Additional context

No response

lvanasse commented 10 months ago

Hey @mrpollo, like you asked, I'm tagging you, so you can direct the best person for my problem to this issue.

mrpollo commented 10 months ago

Hey @bkueng / @davids5 have you ever seen any long-running log interrupted like this before?

lvanasse commented 10 months ago

Little update, so I formatted a 32Go SD card to be used with the PX4. I flash my firmware onto it and when running the df -h command on it, it returned this:

image

Which is about 50% less than the other card I've been using, which is 8Go.

I've checked the 32Go SD with the badblocks command, and I didn't find any issue.

image

So I'm really not sure what would decide that this SD card can only hold 1741MB of data.

lvanasse commented 10 months ago

Another update, so I left it running all night and the vfat got bigger. And I don't know why. So I'm pretty confused as to how it operates. Does it increase when it reaches the maximum size?

Another thing is that, previously while doing testing the size of the log were of about 2Go which was added to the documentation: https://github.com/PX4/PX4-user_guide/pull/2616/files

But my log is at 3.5Go, so the PX4 limit is probably the same as the NuttX one, which is 4Go.

image

And here's the same behaviour (where the vfat grows by itself) on another system (Pix32v6): image

davids5 commented 10 months ago

We suspect this us the issue, https://github.com/apache/nuttx/issues/8933#issuecomment-1804062917

But FS_LARGEFILE adds 4Kib of flash to the build that is so not needed.

lvanasse commented 10 months ago

Hi, there's more information coming in from my testing. But first, thank you, @davids5, for sharing this information.

First thing, is that the logger status command said that the logger wrote ~3.9GB, but there was just one log file, and it was just 2GB when I pull it out of the SD card.

So the max ULog file can only be of 2GB. But it means that the PR to fix the fail to write didn't create a new file. @bkueng any idea what might have happened?

Logging at the message log, we see that it kept using the same file:

image

And we get partially statistic information, where it tells us the file name that is full. But nothing after that. :disappointed:

Here's the link to the ULog file: https://drive.google.com/file/d/1smRfXpHZjRzO57JANw0E42Fc0jD-tzUK/view?usp=sharing

If you guys wants to go over it. It's a lot of the same information, but might find something I missed. It's also really big (2Go), so opening it with Plotjuggler or ulog_messages take a bit of time.

bkueng commented 10 months ago

I tested a bit and it turns out this is a bug in NuttX, and it's fixed upstream already in https://github.com/apache/nuttx/commit/d905a4e8a048e0f59e4. @davids5 do you want to backport of should I?

davids5 commented 10 months ago

@bkueng - I will grab it.....

lvanasse commented 9 months ago

Hi @bkueng and @bkueng I hope you guys had a good weekend. Last Friday, I tested the https://github.com/PX4/PX4-Autopilot/pull/22345 PR, and I let 2 PX4 ran for over 48 hours, but I did experience the same problem where the logs got stuck at 2Gb and didn't restart.

For the first one, it ran on a Pix32v6, and when I came back to check up on it. The MAVLink interface to communicate with it over nsh had lock-up. I wasn't able to interface with the system. I had to reboot it. And when I check in the log there was only a 2Gb one.

For the second test, it ran on a v6C, and it still had interaction, so talking to it over the nsh work, but the logger status said that it wasn't logging anymore. It only has a 1.6Gb log left. Which I assume ran into the limit of the SD card, at least that's what the df -h command says.

I'm thinking would there be debugging prints that would help with the logger? I can share with you the firmware, or the logs, please let me know how I can be most helpful since this is a pretty big issue for us, since we are planning on doing very long flights.

lvanasse commented 9 months ago

Hi @davids5, I'm not too sure why this issue has been closed. In my last comment, I said that I experiment the same issue as before. Did you do some test and were successful?

davids5 commented 9 months ago

The merge did it automatically

dishilov commented 9 months ago

Hello @lvanasse, all,

df -h command only shows incorrect information, and never had any effect on Nuttx operation for me, and without option -h df command shows correct block numbers and sizes. In other words, df -h command is not reliable indicator of volume information when FS_LARGEFILE is not enabled for volumes larger than 4GB. Use df and calculate using block numbers and sizes.

Without FS_LARGEFILE the limit for files in Nuttx is 2 GiB. I can only assume undefined system behavior if you keep writing to the same file past that, I haven't tried.

lvanasse commented 9 months ago

Hi @dishilov, thank you for the information.

I used the df -h command mainly to debug what was happening on the system when the logger was failing and to test the hypothesis that the SD card was full. But that's good to know, I guess I'll have to learn to do those quick computations in my head.

As for the file system limit, I was under the impression that the PR https://github.com/PX4/PX4-Autopilot/pull/21389 would achieve that without having to deal with modifying NuttX by itself.

But I guess I'll try to work around it for now and if I have time, I'll try to propose a fix similar to what @bkueng did in his PR.