eclipse-threadx / levelx

Eclipse ThreadX - LevelX Provides Flash Wear Leveling for FileX and Stand Alone purposes.
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/levelx/index.md
MIT License

lx_nor_flash_free_physical_sectors == 0, reclaim not possible although there is space on the device #48

Open Iktek opened 1 week ago

Iktek commented 1 week ago

Dear list, I have an issue with LevelX 6.1.7, used in conjunction with FileX + ThreadX on an STM32H563 connected to a serial flash via a custom driver.

Describe the bug

I somehow ended up in a situation where lx_nor_flash_free_physical_sectors == 0, which means that a write may end in a (nearly endless) loop because the block cannot be reclaimed (the reclaim is attempted only up to lx_nor_flash_total_blocks times, but it starts again on the next write attempt). The flash structure still shows 276 obsolete sectors (see the nor_flash struct at the bottom of this page).

I'm wondering whether this can happen because of some erroneous condition outside of LevelX that is not handled gracefully, or whether I have found a bug in the LevelX code.

I found that the while loop that checks whether a reclaim is needed exists only in "lx_nor_flash_sector_write.c" @ line 114. In my opinion this means that lx_nor_flash_free_physical_sectors can reach 0 by reading sectors that are not mapped (the read routine also allocates sectors, in lx_nor_flash_sector_read.c @ line 140).

If this has happened at some point, the reclaim routine may walk through all the blocks trying to reclaim, but it will fail each time because there are no free physical sectors to copy the content to. This could be checked in advance so that the system is not blocked all the time.
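The "check in advance" idea can be sketched as a small predicate. This is only an illustration under stated assumptions: `reclaim_candidate_t` and `reclaim_is_feasible` are hypothetical names, not part of the LevelX API, and the real reclaim code may track these counts differently.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the counters involved.
   This is NOT the LevelX LX_NOR_FLASH structure. */
typedef struct {
    unsigned long free_sectors_total;      /* free physical sectors on the whole device */
    unsigned long free_sectors_in_block;   /* free physical sectors inside the candidate block */
    unsigned long mapped_sectors_in_block; /* live sectors that must be copied out before the erase */
} reclaim_candidate_t;

/* Returns true when every mapped sector of the candidate block can be
   relocated to a free sector outside of that block. */
static bool reclaim_is_feasible(const reclaim_candidate_t *c)
{
    /* Inconsistent input: a block cannot hold more free sectors than the device. */
    if (c->free_sectors_in_block > c->free_sectors_total)
        return false;

    unsigned long free_outside = c->free_sectors_total - c->free_sectors_in_block;
    return c->mapped_sectors_in_block <= free_outside;
}
```

With the values from the log below (block 483: one mapped sector, zero free sectors on the device), the predicate returns false, so the caller could give up immediately instead of retrying.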

I also had a quick look at the current development code to see whether any of these issues have been fixed, but I have not found fixes so far (maybe I've overlooked something).

To Reproduce

Hard to reproduce. This can probably only be reproduced on a rather small serial flash where writes and reads are done randomly over a longer time until the end of the flash is reached and no physical sectors are free any more. I'll keep trying to reproduce it over the next few days and update the issue when there are clearer steps.

Expected behavior

I expect that LevelX does not get into a situation where there are no free sectors left before a reclaim takes place (unless the filesystem is really full, in which case the fx routines will fail directly). Could this be fixed by adding another reclaim while loop to the read routine, or are there still other pitfalls?

I also expect the reclaim to fail when there are not enough free sectors outside of the block being reclaimed (checked in lx_nor_flash_block_reclaim.c @ line 238), which should then break the reclaim loop in the write routine so that the whole system is not blocked.
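A rough sketch of the loop behavior I have in mind, written against a toy model rather than the real LevelX structures. All names here (`flash_model_t`, `reclaim_until_enough_free`, the two example reclaim functions) are hypothetical, and the loop threshold (free sectors <= sectors per block) is my assumption about the condition in the write routine.

```c
/* Toy model of the reclaim loop; none of these names exist in LevelX. */
typedef enum { RECLAIM_OK = 0, RECLAIM_NO_SPACE = 1 } reclaim_status_t;

typedef struct {
    unsigned long free_sectors;      /* free physical sectors on the device */
    unsigned long sectors_per_block; /* physical sectors per erase block */
    unsigned long total_blocks;      /* upper bound for reclaim attempts */
} flash_model_t;

typedef reclaim_status_t (*reclaim_fn)(flash_model_t *flash);

/* Reclaims until more than one block's worth of sectors is free.
   Stops early when a reclaim reports an error or makes no progress. */
static reclaim_status_t reclaim_until_enough_free(flash_model_t *flash,
                                                  reclaim_fn try_reclaim)
{
    unsigned long attempts = 0;

    while (flash->free_sectors <= flash->sectors_per_block) {
        unsigned long before = flash->free_sectors;

        if (attempts++ >= flash->total_blocks)
            return RECLAIM_NO_SPACE;      /* every block has been tried */
        if (try_reclaim(flash) != RECLAIM_OK)
            return RECLAIM_NO_SPACE;      /* reclaim itself reported failure */
        if (flash->free_sectors <= before)
            return RECLAIM_NO_SPACE;      /* no progress: do not spin */
    }
    return RECLAIM_OK;
}

/* Example reclaim that cannot relocate anything (the situation in this issue). */
static reclaim_status_t stuck_reclaim(flash_model_t *flash)
{
    (void)flash;
    return RECLAIM_OK;
}

/* Example reclaim that frees six obsolete sectors per call. */
static reclaim_status_t working_reclaim(flash_model_t *flash)
{
    flash->free_sectors += 6;
    return RECLAIM_OK;
}
```

With `stuck_reclaim` the loop returns `RECLAIM_NO_SPACE` after a single attempt, so the write could fail with an error instead of blocking the system.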

Impact

Showstopper -> filesystem unusable

Logs and Console Output

On a sector write we try to reclaim the same erase_block all the time:

_lx_nor_flash_sector_write: free phys: 0 sect/block: 7
_lx_nor_flash_block_reclaim: erase_block: 483, erase_count: 8, mapped_sectors: 1, obsolete_sectors: 6
_lx_nor_flash_block_reclaim: erase_block: 483, erase_count: 8, mapped_sectors: 1, obsolete_sectors: 6
_lx_nor_flash_block_reclaim: erase_block: 483, erase_count: 8, mapped_sectors: 1, obsolete_sectors: 6
_lx_nor_flash_block_reclaim: erase_block: 483, erase_count: 8, mapped_sectors: 1, obsolete_sectors: 6
_lx_nor_flash_block_reclaim: erase_block: 483, erase_count: 8, mapped_sectors: 1, obsolete_sectors: 6
... // this runs nearly endlessly

Additional Context

The filesystem still shows 968 kB of free space on a 4M serial flash.

Contents of LEVELX nor-flash struct:

lx_nor_flash_state ULONG 1313821263
lx_nor_flash_total_blocks ULONG 1024
lx_nor_flash_words_per_block ULONG 1024
lx_nor_flash_total_physical_sectors ULONG 7168
lx_nor_flash_physical_sectors_per_block ULONG 7
lx_nor_flash_base_address ULONG * 0x0
lx_nor_flash_block_free_bit_map_offset ULONG 3
lx_nor_flash_block_bit_map_words ULONG 1
lx_nor_flash_block_bit_map_mask ULONG 127
lx_nor_flash_block_physical_sector_mapping_offset ULONG 4
lx_nor_flash_block_physical_sector_offset ULONG 128
lx_nor_flash_free_physical_sectors ULONG 0
lx_nor_flash_mapped_physical_sectors ULONG 6892
lx_nor_flash_obsolete_physical_sectors ULONG 276
lx_nor_flash_minimum_erase_count ULONG 3
lx_nor_flash_maximum_erase_count ULONG 9
lx_nor_flash_free_block_search ULONG 0
lx_nor_flash_found_block_search ULONG 997
lx_nor_flash_found_sector_search ULONG 0
lx_nor_flash_write_requests ULONG 0
lx_nor_flash_read_requests ULONG 42
lx_nor_flash_sector_mapping_cache_hits ULONG 22
lx_nor_flash_sector_mapping_cache_misses ULONG 20
lx_nor_flash_physical_block_allocates ULONG 0
lx_nor_flash_physical_block_allocate_errors ULONG 0
lx_nor_flash_diagnostic_system_errors ULONG 0
lx_nor_flash_diagnostic_system_error ULONG 0
lx_nor_flash_diagnostic_initial_format ULONG 0
lx_nor_flash_diagnostic_erased_block ULONG 0
lx_nor_flash_diagnostic_re_erase_block ULONG 0
lx_nor_flash_diagnostic_sector_being_obsoleted ULONG 0
lx_nor_flash_diagnostic_sector_obsoleted ULONG 0
lx_nor_flash_diagnostic_mapping_invalidated ULONG 0
lx_nor_flash_diagnostic_mapping_write_interrupted ULONG 0
lx_nor_flash_diagnostic_sector_not_free ULONG 0
lx_nor_flash_diagnostic_sector_data_not_free ULONG 0
lx_nor_flash_driver_read UINT (*)(ULONG *, ULONG *, ULONG) 0x803208d
lx_nor_flash_driver_write UINT (*)(ULONG *, ULONG *, ULONG) 0x80320f9
lx_nor_flash_driver_block_erase UINT (*)(ULONG, ULONG) 0x80321dd
lx_nor_flash_driver_block_erased_verify UINT (*)(ULONG) 0x8032165
lx_nor_flash_driver_system_error UINT (*)(UINT) 0x0
lx_nor_flash_sector_buffer ULONG * 0x20042c14 <g_fx_serial_flash+9412>
lx_nor_flash_sector_mapping_cache_enabled UINT 1
lx_nor_flash_sector_mapping_cache LX_NOR_SECTOR_MAPPING_CACHE_ENTRY [16] 0x200000e0 <fx_lx_nor_drivers+172>
lx_nor_flash_extended_cache_entries UINT 0
lx_nor_flash_extended_cache LX_NOR_FLASH_EXTENDED_CACHE_ENTRY [8] 0x200001a4 <fx_lx_nor_drivers+368>
lx_nor_flash_extended_cache_hits ULONG 0
lx_nor_flash_extended_cache_misses ULONG 0
lx_nor_flash_mutex TX_MUTEX {...}
lx_nor_flash_open_next struct LX_NOR_FLASH_STRUCT * 0x20000034
lx_nor_flash_open_previous struct LX_NOR_FLASH_STRUCT * 0x20000034
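As a sanity check on the dump above, the counters can be verified against each other. The sketch below is only my own arithmetic: `nor_counters_t` and `counters_are_consistent` are hypothetical names, the three invariants are what I would expect to hold rather than documented LevelX guarantees, and a 4-byte ULONG is assumed.

```c
#include <stdbool.h>

/* Hypothetical snapshot type for the counters in the dump; not a LevelX type. */
typedef struct {
    unsigned long total_blocks;        /* lx_nor_flash_total_blocks */
    unsigned long words_per_block;     /* lx_nor_flash_words_per_block */
    unsigned long sectors_per_block;   /* lx_nor_flash_physical_sectors_per_block */
    unsigned long sector_offset_words; /* lx_nor_flash_block_physical_sector_offset */
    unsigned long total_sectors;       /* lx_nor_flash_total_physical_sectors */
    unsigned long free_sectors;        /* lx_nor_flash_free_physical_sectors */
    unsigned long mapped_sectors;      /* lx_nor_flash_mapped_physical_sectors */
    unsigned long obsolete_sectors;    /* lx_nor_flash_obsolete_physical_sectors */
} nor_counters_t;

/* 512-byte sector expressed in words; assumes sizeof(ULONG) == 4. */
#define SECTOR_WORDS 128UL

static bool counters_are_consistent(const nor_counters_t *c)
{
    /* Every block contributes the same number of physical sectors. */
    bool geometry = c->total_blocks * c->sectors_per_block == c->total_sectors;

    /* Each physical sector is exactly one of free, mapped or obsolete. */
    bool accounting = c->free_sectors + c->mapped_sectors + c->obsolete_sectors
                      == c->total_sectors;

    /* Block header plus all sectors must fit into one erase block. */
    bool layout = c->sector_offset_words + c->sectors_per_block * SECTOR_WORDS
                  <= c->words_per_block;

    return geometry && accounting && layout;
}

/* Values copied from the dump above. */
static const nor_counters_t reported = {
    1024, 1024, 7, 128, 7168, 0, 6892, 276
};
```

The reported values pass all three checks (1024 * 7 = 7168, 0 + 6892 + 276 = 7168, 128 + 7 * 128 = 1024), so no sectors appear to be lost from the accounting. The 276 obsolete sectors are simply not reclaimable while there is no free sector to relocate the mapped ones to.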

LevelX, ThreadX, FileX coming from STM32Cube_FW_H5_V1.2.0

LX_NOR_SECTOR_SIZE (512/sizeof(ULONG))

FX is initialized as follows:

#define SF_SECTOR_SIZE (LX_NOR_SECTOR_SIZE * sizeof(ULONG))
#define SF_NUM_FATS (1)
#define SF_DIR_ENTRIES (32)
#define SF_HIDDEN_SECTORS (0)
#define SF_SECTORS_PER_CLUSTER (8)
#define SF_HEADS (1)
#define SF_SECTORS_PER_TRACK (1)

status = fx_media_format(&fsf->media,                      // nor_simulator_flash_disk pointer
                         fx_stm32_levelx_nor_driver,       // Driver entry
                         (void *)NOR_CUSTOM_DRIVER_ID,     // Device info pointer
                         (UCHAR *) fsf->media_mem,         // Media buffer pointer
                         SF_SECTOR_SIZE,                   // Media buffer size
                         SF_VOLUME_NAME,                   // Volume Name
                         SF_NUM_FATS,                      // Number of FATs
                         SF_DIR_ENTRIES,                   // Directory Entries
                         SF_HIDDEN_SECTORS,                // Hidden sectors
                         (flash_size / SF_SECTOR_SIZE),    // Total sectors
                         SF_SECTOR_SIZE,                   // Sector size
                         SF_SECTORS_PER_CLUSTER,           // Sectors per cluster
                         SF_HEADS,                         // Heads
                         SF_SECTORS_PER_TRACK);
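One thing worth checking with these parameters is how the total sector count passed to fx_media_format compares with the number of physical sectors LevelX reports. The sketch below is plain arithmetic with hypothetical helper names. It assumes that flash_size is the raw device capacity of 4 MiB (the post only says "4M") and that SF_SECTOR_SIZE evaluates to 512 bytes.

```c
/* Sector size in bytes; assumes SF_SECTOR_SIZE evaluates to 512. */
#define SECTOR_BYTES 512UL

/* Logical sectors FileX is formatted with: flash_size / SF_SECTOR_SIZE. */
static unsigned long filex_total_sectors(unsigned long flash_size_bytes)
{
    return flash_size_bytes / SECTOR_BYTES;
}

/* Physical sectors left over after every logical sector has been mapped once.
   A negative result means FileX can address more sectors than LevelX can store. */
static long sector_headroom(unsigned long flash_size_bytes,
                            unsigned long levelx_physical_sectors)
{
    return (long)levelx_physical_sectors
           - (long)filex_total_sectors(flash_size_bytes);
}
```

Under these assumptions, `filex_total_sectors(4UL * 1024 * 1024)` is 8192, while lx_nor_flash_total_physical_sectors is 7168, so `sector_headroom` is -1024. If flash_size really is the raw capacity, FileX can address more logical sectors than LevelX has physical sectors, which might explain how the free count reached 0 while FileX still reports free space. This is unconfirmed and depends on the actual value of flash_size.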

Iktek commented 1 week ago

This may also be related: https://github.com/eclipse-threadx/levelx/issues/19