Open smithBraun opened 2 years ago
Hi, I understand that investigating and solving this issue may take a long time. I would appreciate it if you could let me know whether you agree that this is a real issue, tell me when you have a candidate solution, and give me an early drop of it.
@smithBraun , thanks for reporting the issue. We are working on reproducing the issue and will keep you updated.
Hi @TiejunMS, thank you. I just want to mention that if one part of the FAT entry is 0 (no matter whether it is the part in the first sector or in the second), there won't be an issue.
A similar issue can happen when the FAT chain is written, when _fx_utility_FAT_flush is called by _fx_utility_FAT_entry_write (in state FX_FAULT_TOLERANT_STATE_SET_FAT_CHAIN).
Hi @TiejunMS , Any success with reproducing?
@smithBraun , did you encounter this issue by analysis or run into this issue in application? Here is my analysis on this issue.
Let's say the bytes per sector is 512 and sectors per cluster is 1. On FAT12, each sector can hold 341 FAT entries. The original FAT chain of the file is as below, with the FAT sector of each entry in parentheses:
700(3)->400(2)->800(3)->END
The FAT entries of this file start in the third FAT sector, point to the second FAT sector, then back to the third sector.
When this file is deleted, in fx_fault_tolerant_cleanup_FAT_chain.c, all three FAT entries are cached and deleted from back to front. FAT entry 800 is deleted first. Because FAT entry 400 is in a different sector from 800, the change to FAT entry 800 (from 800->END to 800->FREE) is flushed to disk. If the power goes off before FAT entry 400 is deleted, the FAT chain will look like this:
700(3)->400(2)->800(3)->FREE
On the next power on, nothing is done to FAT entry 800 because it is already freed. Only FAT entries 700 and 400 will be deleted.
> after the power down when looking on the chain this entry may point on wrong place

I'm not sure about the entry pointing to the wrong place. Did you mean that FAT entry 400 still points to 800?
If this example is not suitable for the issue you described, could you share the FAT chain and where the power off happens during deleting the FAT chain?
@TiejunMS sorry for not being clear enough. I think you misunderstood the bug I described.

> did you encounter this issue by analysis or run into this issue in application

I ran into this issue while running power-down tests on FileX.

> If this example is not suitable for the issue you described, could you share the FAT chain and where the power off happens during deleting the FAT chain?

Sure, let's take your example of 512 bytes per sector and 1 sector per cluster. I have two chains:
FAT(0x155) == 0x014->FAT(0x014) == 0xfff->END
FAT(0x010) == 0xfff->END
Look at the entry at 0x155. A 512-byte sector holds 0x155 + 1/3 FAT entries, so when mapping entries to sectors this entry is split in two: the 0x004 part is in sector 1 and the 0x010 part is in sector 2:
(1,2) FAT(0x155) == 0x014 ->(1) FAT(0x014) == 0xfff->END
(1) FAT(0x010) == 0xfff->END
Now let's say the delete process of the first chain begins, from back to front as you mentioned. Sector 1 is updated first, so FAT entry 0x014 is freed, but entry 0x155 is only partially updated:
(1,2) FAT(0x155) == 0x010 -> (1) FAT(0x010) == 0xfff->END
FAT(0x014) == 0x000
You can simulate the power down at https://github.com/azure-rtos/filex/blob/89976978ff0ae62588e1871ea82fe05c67614c85/common/src/fx_utility_FAT_flush.c#L154-L155, where the code detects that a FAT entry is split across two sectors and the first part has already been written.
@smithBraun , thanks for sharing the details! I confirm this is an issue and will come up with a solution. I will keep you posted.
Great, thanks @TiejunMS. I will be happy to get the fix as soon as it is implemented, without waiting for the official release, so I can re-run my tests and make sure I can't find more corner cases.
Hi @TiejunMS , Any updates with this issue?
@smithBraun , the fix is a work in progress. Could you send an email to Azure RTOS support (azure-rtos-support@microsoft.com)? Once it is ready for test, I can share the source code with you.
Background: When a file is deleted, its FAT chain is deleted from the end to the start (in _fx_utility_FAT_flush, called by _fx_fault_tolerant_cleanup_FAT_chain). After a power down, the fault tolerance module knows only the beginning of the FAT chain, so it searches again for the end of the chain (which may now be shorter if deletion had started before the power down) and continues deleting from the end to the beginning.
The bug: In FAT12, a FAT entry may be split across 2 sectors. If the power down occurs between writing one sector and writing the other, then after the power down, when walking the chain, this entry may point to the wrong place, which causes unrelated entries to be erased.