Closed roman-dvorak closed 2 years ago
Thanks for the detailed report. We're currently investigating this as a blocking issue for the next release (v1.13).
Hi, I've been trying to make more attempts to get to the state where you can log.
Subsequently, in nuttx dmesg was:
In this state, it is not possible to restart logger: It writes that it only records the mission log.
I was able to repeat this several times (4x). Some logs could be uploaded to review.px4.io and some not (without any error message). However, all could be displayed in the plotjuggler. Some I was unable to get due to a corrupted SD card.
log files, that ended "in-flight" (in armd state). 15_22_37.zip 15_41_42.zip
Top (in time of corrupted logger):
There is an high CPU usage. By log_writer_file
.
Sometimes I was not able to connect the card to the computer (ubuntu 20.04). With these errors:
Tested with these cards:
@roman-dvorak we may have a fix, could you please try https://github.com/PX4/PX4-Autopilot/pull/18687?
To be paranoid and convince myself the card and file system is in good shape I've been exhaustively testing all blocks. https://gist.github.com/dagar/8d002760ee1f8e76d1bd88c16a469d24#file-linux_sdcard_test_and_format-sh-L8
Booth SD cards look well. It passed the test without any error. Without any bad sector.
SanDisc Ultra 16 GB:
Object /org/freedesktop/UDisks2/block_devices/sdf is not a mountable filesystem. ***** BADBLOCK ***** Hledají se špatné bloky v režimu čtení i zápis Od bloku 0 do 15558143 Zkouším s náhodným vzorkem: hotovo Čtení a porovnání: hotovo Průchod dokončen, nalezeno 0 špatných bloků (0/0/0 chyb). 6.42user 15.65system 25:18.85elapsed 1%CPU (0avgtext+0avgdata 2672maxresident)k 31118848inputs+31116288outputs (12major+147minor)pagefaults 0swaps ***** F3PROBE ***** F3 probe 7.2 Copyright (C) 2010 Digirati Internet LTDA. This is free software; see the source for copying conditions. WARNING: Probing normally takes from a few seconds to 15 minutes, but it can take longer. Please be patient. Good news: The device `/dev/sdf' is the real thing Device geometry: *Usable* size: 14.84 GB (31116288 blocks) Announced size: 14.84 GB (31116288 blocks) Module: 16.00 GB (2^34 Bytes) Approximate cache size: 0.00 Byte (0 blocks), need-reset=no Physical block size: 512.00 Byte (2^9 Bytes) Probe time: 2'29" Operation: total time / count = avg time Read: 829.0ms / 4814 = 172us Write: 2'27" / 4192321 = 35us Reset: 0us / 1 = 0us 0.95user 0.67system 2:29.60elapsed 1%CPU (0avgtext+0avgdata 3404maxresident)k 5382inputs+4192321outputs (3major+362minor)pagefaults 0swaps
and the second SD (SanDisc Ultra 32 GB):
***** BADBLOCK ***** Hledají se špatné bloky v režimu čtení i zápis Od bloku 0 do 31166975 Zkouším s náhodným vzorkem: hotovo Čtení a porovnání: hotovo Průchod dokončen, nalezeno 0 špatných bloků (0/0/0 chyb). 8.24user 19.18system 32:30.84elapsed 1%CPU (0avgtext+0avgdata 3052maxresident)k 62334736inputs+62333952outputs (6major+157minor)pagefaults 0swaps ***** F3PROBE ***** F3 probe 7.2 Copyright (C) 2010 Digirati Internet LTDA. This is free software; see the source for copying conditions. WARNING: Probing normally takes from a few seconds to 15 minutes, but it can take longer. Please be patient. Good news: The device `/dev/sdf' is the real thing Device geometry: *Usable* size: 29.72 GB (62333952 blocks) Announced size: 29.72 GB (62333952 blocks) Module: 32.00 GB (2^35 Bytes) Approximate cache size: 0.00 Byte (0 blocks), need-reset=no Physical block size: 512.00 Byte (2^9 Bytes) Probe time: 1'18" Operation: total time / count = avg time Read: 710.5ms / 4815 = 147us Write: 1'16" / 4192321 = 18us Reset: 0us / 1 = 0us 0.53user 0.34system 1:18.14elapsed 1%CPU (0avgtext+0avgdata 3536maxresident)k 4815inputs+4192321outputs (0major+359minor)pagefaults 0swaps mkfs.fat 4.1 (2017-01-24) Auto-selecting FAT32 for large filesystem /dev/sdg1 has 64 heads and 32 sectors per track, hidden sectors 0x0800; logical sector size is 512, using 0xf8 media descriptor, with 62330880 sectors; drive number 0x80; filesystem has 2 32-bit FATs and 32 sectors per cluster. FAT size is 15232 sectors, and provides 1946887 clusters. There are 32 reserved sectors. Volume ID is 1b7d3c73, no volume label. Searching for bad blocks 203360... 423136... 634720... 841856... 1058784... 1276128... 1482976... 1702624... 1922784... 2138080... 2343904... 2563936... 2783968... 2988000... 3202272... 3421024... 3629792... 3845600... 4065120... 4280800... 4485216... 4703712... 4923104... 5132256... 5344224... 5564000... 5778912... 5981280... 6197856... 6418528... 6628704... 6839168... 7058400... 7274720... 7478496... 7697760... 7916256... 8122848... 8335200... 8553184... 8768224... 8979424... 9198176... 9416544... 9620832... 9839712... 10060640... 10270944... 10479456... 10699232... 10914784... 11116768... 11334112... 11554528... 11763808... 11973088... 12193632... 12411232... 12614112... 12834144... 13054304... 13263328... 13474272... 13694432... 13912288... 14120928... 14341088... 14560480... 14766304... 14984800... 15204576... 15416544... 15625312... 15845600... 16064736... 16267616... 16484064... 16704608... 16916064... 17122656... 17342048... 17562208... 17765984... 17982176... 18201824... 18409952... 18619360... 18837856... 19055328... 19262944... 19481952... 19701600... 19907552... 20124768... 20344544... 20557792... 20766176... 20985696... 21204832... 21408736... 21619936... 21832928... 22046816... 22253664... 22473824... 22695232... 22899936... 23114720... 23334112... 23546656... 23752288... 23969760... 24178016... 24389728... 24603488... 24822752... 25035360... 25246176... 25464672... 25683936... 25888480... 26104160... 26323936... 26534112... 26740832... 26960608... 27179616... 27395296... 27610080... 27829472... 28042848... 28249568... 28468832... 28688480... 28893280... 29109728... 29328096... 29536224... 29741280... 29960800... 30178784... 30386272... 30602720... 30821472... 31029344... 3.57user 33.08system 12:09.81elapsed 5%CPU (0avgtext+0avgdata 2124maxresident)k 62331040inputs+30528outputs (2major+2009minor)pagefaults 0swaps Model: Neznámý (unknown) Disk /dev/sdg1: 31,9GB Sector size (logical/physical): 512B/512B Tabulka oddílů: loop Disk Flags: Číslo Začátek Konec Velikost Systém souborů Přepínače 1 0,00B 31,9GB 31,9GB fat32 fsck.fat 4.1 (2017-01-24) Checking we can access the last sector of the filesystem Boot sector contents: System ID "mkfs.fat" Media byte 0xf8 (hard disk) 512 bytes per logical sector 16384 bytes per cluster 32 reserved sectors First FAT starts at byte 16384 (sector 32) 2 FATs, 32 bit entries 7798784 bytes per FAT (= 15232 sectors) Root directory start at cluster 2 (arbitrary size) Data area starts at byte 15613952 (sector 30496) 1946887 data clusters (31897796608 bytes) 32 sectors/track, 64 heads 2048 hidden sectors 62330880 sectors total Starting check/repair pass. Checking for bad clusters. Reclaiming unconnected clusters. Checking free cluster summary. Starting verification pass. Checking for unused clusters. /dev/sdg1: 0 files, 1/1946887 clusters 3.32user 25.88system 11:59.52elapsed 4%CPU (0avgtext+0avgdata 32112maxresident)k 62362336inputs+0outputs (4major+13403minor)pagefaults 0swaps
sd_bench and sd_stress pased multiple times:
nsh> nsh> nsh> sd_stress iteration 0 took 2369326 us: OK iteration 1 took 3534981 us: OK iteration 2 took 2271921 us: OK iteration 3 took 2345547 us: OK iteration 4 took 2330682 us: OK nsh> nsh> sd_bench -u -v INFO [sd_bench] Using block size = 4096 bytes, sync=0 INFO [sd_bench] INFO [sd_bench] Testing Sequential Write Speed... INFO [sd_bench] Run 0: 2032.37 KB/s, max write time: 12 ms (= 333.33 KB/s), fsync: 7 ms INFO [sd_bench] Run 1: 1874.72 KB/s, max write time: 67 ms (= 59.70 KB/s), fsync: 7 ms INFO [sd_bench] Run 2: 2014.84 KB/s, max write time: 49 ms (= 81.63 KB/s), fsync: 12 ms INFO [sd_bench] Run 3: 1846.66 KB/s, max write time: 67 ms (= 59.70 KB/s), fsync: 9 ms INFO [sd_bench] Run 4: 1964.14 KB/s, max write time: 47 ms (= 85.11 KB/s), fsync: 54 ms INFO [sd_bench] Avg : 1946.65 KB/s INFO [sd_bench] INFO [sd_bench] Testing Sequential Read Speed of 4913 blocks INFO [sd_bench] Run 0: 2970.85 KB/s, max read/verify time: 7 ms (= 571.43 KB/s) INFO [sd_bench] Run 1: 2891.55 KB/s, max read/verify time: 5 ms (= 800.00 KB/s) INFO [sd_bench] Run 2: 2962.37 KB/s, max read/verify time: 5 ms (= 800.00 KB/s) INFO [sd_bench] Run 3: 3083.00 KB/s, max read/verify time: 2 ms (=2000.00 KB/s) INFO [sd_bench] Avg : 2955.36 KB/s 4913 blocks read and verified nsh>
I merged your PR (#18687) into autogyro takeoff PR (#18582), which resulted into a commit: https://github.com/ThunderFly-aerospace/PX4Firmware/commit/252e4e94f6e07d8d155a628cd67d1b08c8525398. Everything looks good during these bench tests. But I can't confirm it, right now. I tried to do the same operations as in the previous attemt. And with each attempt, a complete log was created. In order to confirm the functionality of these improvements, we must do some flight-test in real enviroment.
Thanks! I can confirm that this problem did not occur during the flight test.
Hi, we repeatedly have a problem with logging to the SD card. It happened to us several times that the card was corrupted during the inflight logging. The SD card or filesystem during read by computer seems to be flagged as read-only. And the log file was not complete.
For the first time, it happened to us on 7. 10. 2021 with takeoff mode (#18582), which was based on a few days old PX4 master version. At that time, I thought it was due to a large number of debug text logs (
PX4_INFO("...")
). In the meantime, we tried to use a new (32GB SD card)[], format the card (completely erased) to FAT32. And the problem was repeated several times. Therefore we didn't have a log that we really needed.. It does not matter that we create a smaller 4GB FAT32 volume on SDcard. The situation was almost the same, corrupted filesystem with missing logs and sometimes other files which were previously on SDcard.Next time, I limited number of
PX4_INFO
messages to the minimum number of these logs. And the SD card was corrupted again. With such FW. But it was the Firmware with my modifications.The first used card was:
After a debate in (#18419), we decided to test this “airspeed mode” PR with our second autogyro (FMU v5, CUAV NANO) and this was almost unmodified FW (f814a52). But during the flight, the SDcard filesystem corruption happened again. This piece of autogyro (the same autopilot, the same SDcard, ..) was previously used in the same configuration with several months old master FW. Without any signs of this problem.
This always happened after a few seconds of flight. I don’t see any clear cause for this behaviour in the same fragments of the logs, retrieved from SDcard after fsck recovery, containing the takeoff
I tried to repeat this on the ground. But I was unsuccessful. Logging worked for a very long time.
From these experiments, I conclude that this is a problem in Firmware (I don’t know at what level).
During the first attempts (when this issue occurred) I used the internal reader of my laptop. During the next experiments, I used an external uSD card reader from Kingstone. But the result was the same.