FrSkyRC / ETHOS-Feedback-Community

Feedback & suggestions are welcomed here for ETHOS by FrSky

File system corruption #4603

Open robthomson opened 6 days ago

robthomson commented 6 days ago

I have been struggling for some time with random file system corruption on the ethos eMMC and sdcard.

Most notably, I do a lot of rsync to the sdcard, cycling power from Ethos Suite.

Randomly I will get an eMMC write error and the drive gets corrupted.

Today I rewrote my code syncing scripts to not use cygwin / rsync, but rather using native windows copy commands.

Interestingly there has not been a single corruption today.

This makes me think the issue is caused in some way by extended file attribute handling causing ethos to get upset.

I believe this issue also occurs for users on linux.

rburrow87 commented 6 days ago

I see these issues too, particularly with all the Lua updating with Rob's RF scripts :) I have been using Linux much more lately and have found it's very unreliable to update Lua scripts in -- requiring me to use Windows in a virtual machine to do the file transfer most of the time.

It doesn't always have problems, but it's very frequent and generally won't clear up without using Windows once it starts doing it. I've had it happen with both internal storage and SDCard, doesn't matter which. And it doesn't matter if I transfer to the SDCard through the radio in bootloader, through the radio in Ethos, or directly to the SDCard with a card reader.

I can look at the files in the file browser in Ethos and they'll show reasonable timestamps, unmount and remount in Linux and they look correct/updated, and even look at them in Windows and they are correct. Yet Ethos will somehow bring up what seems like an old version of one or more of the lua files, causing strange bugs. The only fix is to delete and recopy (or overwrite) using Windows.

Another odd thing that happens is if I delete the /scripts/rfsuite folder and re-copy in Linux (using either rm and cp in the terminal or through a file browser like Nautilus, makes no difference) it will get moved up to the beginning of the list of Lua scripts in the system page even though it's the one with the latest change.

I have done plenty of file transfers to and from FAT partitions for other things, including EdgeTX radios, and don't encounter behavior like this.

bsongis-frsky commented 3 days ago

I have performed tests the whole morning with my X20-PRO and I cannot reproduce it. I have done a lot of read / write on the eMMC! Do you have a message on the Linux side (in dmesg)?

rburrow87 commented 3 days ago

I will try to find a way to reliably reproduce the issue I see and report back.

I'm hoping I can take something big like Rob's RFSUITE script and just change the version number on the computer, then copy it to the radio, and check what version it reports. Then keep changing it and copying it until it stops updating. I have seen that happen when updating to test it.

I don't think there will be any errors from dmesg, but I will check that too. It's only Ethos that becomes out of sync.
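The host side of the copy-and-check loop described above could be scripted roughly as follows. This is only a sketch under assumptions: the `version = "N"` marker format and the helper names (`bump_version`, `copy_and_flush`, `one_round`) are hypothetical and not taken from RFSUITE. It only confirms that the bytes on the mounted volume match the source; the version Ethos itself reports still has to be checked on the radio.

```python
import hashlib
import os
import re
import shutil
from pathlib import Path

# Hypothetical version marker; the real script may store its version differently.
VERSION_RE = re.compile(r'(version\s*=\s*")(\d+)(")')


def bump_version(path: Path) -> int:
    """Increment the first version marker found in the file and return the new value."""
    text = path.read_text()
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version marker found in {path}")
    new_version = int(match.group(2)) + 1
    updated = VERSION_RE.sub(
        lambda m: f"{m.group(1)}{new_version}{m.group(3)}", text, count=1
    )
    path.write_text(updated)
    return new_version


def copy_and_flush(src: Path, dst: Path) -> None:
    """Copy src to dst and ask the OS to flush the written data to the device."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def one_round(src: Path, dst: Path) -> tuple[int, bool]:
    """Bump the version, copy the file, and report whether the copy matches the source."""
    version = bump_version(src)
    copy_and_flush(src, dst)
    return version, sha256(src) == sha256(dst)
```

Each call to `one_round` would be followed by ejecting the volume and reading the version shown on the radio, repeating until the two disagree.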

bsongis-frsky commented 3 days ago

Rob showed some eMMC errors. There should be errors in dmesg in this case. But I copied a lot yesterday (the whole audio packs, many times). No issue until now. But perhaps you have an eMMC which has less available space than mine. I will do more testing later today too.

bsongis-frsky commented 15 hours ago

Reproduced here!

bsongis-frsky commented 15 hours ago

Bug in Ethos Suite! DiskIoControl(EJECT) may fail on the first attempt when something is still not flushed. It needs to be retried one second after it has failed!
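The retry described above might look something like the sketch below. It is not the Ethos Suite code: `eject_with_retry` and the `eject` callable are hypothetical stand-ins for whatever wraps DiskIoControl(EJECT), and the attempt count of three is an assumption; only the one-second delay comes from the comment above.

```python
import time
from typing import Callable


def eject_with_retry(
    eject: Callable[[], bool],
    attempts: int = 3,        # assumed limit, not from the original fix
    delay_s: float = 1.0,     # one second between attempts, as described
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call eject() until it reports success or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if eject():
            return True
        # Give pending writes time to flush before trying again.
        if attempt < attempts:
            sleep(delay_s)
    return False
```

Passing `sleep` in as a parameter keeps the helper easy to exercise without real delays.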

bsongis-frsky commented 14 hours ago

Ah, and a bug for me. The Inactivity Timer should be reset when the radio switches the debug mode ON / OFF!

Nicholas-Luoyi commented 6 hours ago

I will upload a new installer for Windows in tag Suite 1.5.11 with the fix, thanks to @bsongis-frsky's help. Please take some time to try it.