dlbeer / ufat

Low-memory feature-complete VFAT implementation
BSD 3-Clause "New" or "Revised" License

Question about "sectors per cluster" calculation in calculate_layout() and possible speed improvements for ufat_mkfs() #10

Closed FreddieChopin closed 5 years ago

FreddieChopin commented 5 years ago

https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L85

The comment and code try to choose the smallest possible value for FAT's "sectors per cluster". While I could not find any binding requirement to do it one way or the other, I think it should either be the opposite, or at least user-configurable via an additional argument to ufat_mkfs().

Here's the reasoning - for a 32 GB SD card (block size 512 bytes) uFAT calculates the FAT size (the fat_bytes variable in calculate_layout()) to be a whopping 116 MB and then proceeds to write 232 MB of data (two FAT tables) in 474624 separate writes to the device.

This takes forever...

While SD cards are pretty fast for large sequential writes, they are really slow when the writes are short. The SD card clock in SDMMC mode (I'm not talking about SPI here) is very fast (25 MHz in my case, and it may be increased to 50 MHz) and the bus is pretty wide (4 bits), but after each write there is a significant delay of up to 250 ms (usually well below 100 ms, but sometimes really 250 ms) while the card is busy committing the data. This results in pretty low performance of ~314 blocks written per second (averaged over the first ~12 seconds of operation, during which i in init_fat32() had only reached 1863, meaning twice as many blocks were written). Extrapolating, it seems I would have to wait about half an hour for the operation to finish.

The first improvement here would be to use the largest possible (not the smallest possible) value of "sectors per cluster". Wikipedia suggests that it would be safe to use 32 kB clusters (64 sectors per cluster), which would probably give a ~32x improvement.
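
To put numbers on that: at the measured ~314 blocks/s, the 474624 single-block writes come to roughly 474624 / 314 ≈ 1500 s, i.e. about 25 minutes. And here is a rough back-of-the-envelope sketch (my own illustration, not ufat code; it assumes 512-byte sectors, 4 bytes per FAT32 entry, and ignores reserved-area overhead) of how the FAT size, and therefore the number of writes, shrinks as the cluster grows:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* illustrative figures only: a nominal "32 GB" card with 512-byte
	 * sectors; each FAT32 entry is 4 bytes */
	const uint64_t device_bytes = 32ULL * 1000 * 1000 * 1000;
	const unsigned sector_size = 512;

	for (unsigned spc = 1; spc <= 64; spc <<= 1) {
		uint64_t clusters = device_bytes / ((uint64_t)sector_size * spc);
		uint64_t fat_bytes = clusters * 4;	/* one FAT copy */

		printf("%2u sectors/cluster: ~%8llu clusters, FAT ~%4llu MiB each\n",
		       spc, (unsigned long long)clusters,
		       (unsigned long long)(fat_bytes >> 20));
	}

	return 0;
}
```

Whatever the exact figures end up being in ufat's real layout, the trend is the point: every doubling of sectors-per-cluster roughly halves the FAT and therefore the number of block writes during mkfs.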

When I format this SD card in Linux, the OS decides to use 32 sectors per cluster (1897567 clusters total). When I simulate formatting of this card with the FatFs library, it selects the largest possible value - 64 sectors per cluster - which gives 949248 clusters total.

So finally my question - is there any reason why uFAT selects the smallest possible value instead of the largest possible one? If there is no such reason, would you accept a pull request which changes this behavior? If yes, would you prefer this to be overridable by the user, or always the largest possible value?
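
For example, the selection could look roughly like the sketch below (just an illustration of the idea, not actual ufat code): grow the cluster as long as it stays within the 32 kB cap and the volume still keeps at least 65525 clusters, the minimum the FAT specification requires for a volume to be treated as FAT32.

```c
#include <stdint.h>

/* Sketch only: pick the largest power-of-two sectors-per-cluster such that
 * the cluster stays <= 32 kB and the volume keeps at least 65525 clusters
 * (the FAT32 minimum). */
static unsigned pick_sectors_per_cluster(uint64_t total_sectors,
					 unsigned sector_size)
{
	unsigned spc = 1;

	while (spc < 128 &&
	       (uint64_t)sector_size * spc * 2 <= 32768 &&
	       total_sectors / ((uint64_t)spc * 2) >= 65525)
		spc *= 2;

	return spc;
}
```

For the 32 GB card above this lands at 64 sectors per cluster, the same value FatFs selects.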

The second possibility for a speedup would be to provide a working buffer to ufat_mkfs(). The larger the buffer, the fewer individual writes would have to be performed, which would also greatly improve performance. This would also avoid the use of a VLA in init_fat32(). A rough sketch of the idea follows the list below.

For reference, FatFs' f_mkfs() - http://elm-chan.org/fsw/ff/doc/mkfs.html:

  1. it allows configuring the sectors-per-cluster value; if not provided, the largest possible value is used,
  2. it uses a working buffer, which improves performance as long as it is at least two blocks in size; if not provided, it tries to allocate 32 kB.
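
To sketch the working-buffer idea: the snippet below is not ufat's existing API - blockdev_write() and its argument layout are hypothetical stand-ins for whatever multi-block write callback ufat_mkfs() would use. The point is only that a caller-supplied buffer lets the FAT zero-fill go out in large chunks instead of one write per block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical multi-block write callback (placeholder, not ufat's) */
extern int blockdev_write(void *dev, uint64_t start_block, size_t n_blocks,
			  const uint8_t *data);

/* Zero a run of blocks using a caller-supplied work buffer; the bigger the
 * buffer, the fewer device writes are issued. */
static int zero_blocks(void *dev, uint64_t start, uint64_t count,
		       size_t block_size, uint8_t *buf, size_t buf_blocks)
{
	memset(buf, 0, buf_blocks * block_size);

	while (count) {
		size_t chunk = count < buf_blocks ? (size_t)count : buf_blocks;

		if (blockdev_write(dev, start, chunk, buf) < 0)
			return -1;

		start += chunk;
		count -= chunk;
	}

	return 0;
}
```
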
dlbeer commented 5 years ago

There is probably no good reason for the current behaviour. I think when it was first developed we were working with a fairly small boot partition on an SD-card, so nobody ever noticed the performance problem.

I'd be happy to take a pull request that automatically picked a larger cluster size. I don't think there's any reason to make it configurable at this stage. We'll see if anyone ever asks for it.

Cheers, Daniel

-- Daniel Beer dlbeer@gmail.com http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B

FreddieChopin commented 5 years ago

I have a few questions first, as I try to understand current logic.

  1. Is this loop correct for cases where the device has large blocks? https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L76 Say you have 4 kB blocks, but not very many of them. In this case the loop will decide to use 512-byte logical sectors (the third condition is false), which will never work correctly. Shouldn't fl->log2_sector_size = 9; rather be fl->log2_sector_size = log2_block_size;?

  2. In the same loop there's a limit for the sector size - less than 32 kB. https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L77 However, this doc - https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf (which is basically an extended version of the official MS documentation) - says that the max sector size is 4 kB. I guess this value should be replaced with 4096 then, unless I'm misinterpreting things.

  3. If the two items above should be changed, shouldn't there also be a condition at the beginning which returns an -UFAT_ERR_BLOCK_SIZE error if the block size is above 4 kB? On the other hand - why the limitation that the minimum block size is 512 B? (A sketch combining points 1-3 follows this list.)

  4. Where does the value 0x10000000 come from? https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L94 I think there is no limitation on the number of clusters in FAT, so the only factor would be the uint32_t type used by uFAT to store this number, but if that's the case, shouldn't there be one more zero (and one more "f" in line 99)?
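
To make points 1-3 concrete, here is roughly what I have in mind (just a sketch with made-up names, not a patch against the actual calculate_layout() code):

```c
#include <stdint.h>

/* Placeholders, not ufat's real identifiers: the error code stands in for
 * -UFAT_ERR_BLOCK_SIZE and the sector-count ceiling is an arbitrary
 * stand-in for whatever limit the real loop enforces. */
#define ERR_BLOCK_SIZE		(-1)
#define MAX_SECTOR_COUNT	0xfffffff0ULL

static int choose_log2_sector_size(unsigned log2_block_size,
				   uint64_t total_blocks,
				   unsigned *log2_sector_size)
{
	unsigned s;

	/* point 3: only 512 B .. 4 kB device blocks can be represented */
	if (log2_block_size < 9 || log2_block_size > 12)
		return ERR_BLOCK_SIZE;

	/* point 1: never pick a logical sector smaller than the block */
	s = log2_block_size;

	/* point 2: grow the sector only up to 4 kB (2^12), not 32 kB */
	while (s < 12 &&
	       (total_blocks >> (s - log2_block_size)) > MAX_SECTOR_COUNT)
		s++;

	*log2_sector_size = s;
	return 0;
}
```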

dlbeer commented 5 years ago

On Mon, Aug 26, 2019 at 02:51:33PM -0700, Freddie Chopin wrote:

I have a few questions first, as I try to understand current logic.

  1. Is this loop correct for cases where the device has large blocks? https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L76 Say you have 4 kB blocks, but not very many of them. In this case the loop will decide to use 512-byte logical sectors (the third condition is false), which will never work correctly. Shouldn't fl->log2_sector_size = 9; rather be fl->log2_sector_size = log2_block_size;?

Yes, you are correct -- that's a bug.

  2. In the same loop there's a limit for the sector size - less than 32 kB. https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L77 However, this doc - https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf (which is basically an extended version of the official MS documentation) - says that the max sector size is 4 kB. I guess this value should be replaced with 4096 then, unless I'm misinterpreting things.

Also correct.

  3. If the two items above should be changed, shouldn't there also be a condition at the beginning which returns an -UFAT_ERR_BLOCK_SIZE error if the block size is above 4 kB? On the other hand - why the limitation that the minimum block size is 512 B?

Yes, there should be a test for that.

I guess 512 bytes is just the smallest block size I've ever seen on any device.

  4. Where does the value 0x10000000 come from? https://github.com/dlbeer/ufat/blob/2d0ad69e56c44e9287adc489970b2c8fb8c6bae8/ufat_mkfs.c#L94 I think there is no limitation on the number of clusters in FAT, so the only factor would be the uint32_t type used by uFAT to store this number, but if that's the case, shouldn't there be one more zero (and one more "f" in line 99)?

There is a limit, because the FAT table itself has 32-bit entries, and some of the higher numbers have special meanings (e.g. bad cluster, free cluster, etc.).

The limit of 0x10000000 ensures that the special indices don't coincide with legitimate cluster indices. It might possibly be too conservative though. I'm happy to increase it if it's safe to do so.
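
For reference, these are the relevant reserved values from the fatgen103 document linked above (only the low 28 bits of each 32-bit FAT32 entry carry a cluster number; the top 4 bits are reserved):

```c
/* FAT32 entry values per fatgen103 */
#define FAT32_FREE		0x00000000u	/* unallocated cluster */
#define FAT32_BAD		0x0FFFFFF7u	/* bad-cluster marker */
#define FAT32_EOC_FIRST		0x0FFFFFF8u	/* end-of-chain range ... */
#define FAT32_EOC_LAST		0x0FFFFFFFu	/* ... inclusive */
/* so valid cluster numbers stop at 0x0FFFFFF6, which is why a cap in the
 * neighbourhood of 0x10000000 is the right order of magnitude */
```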

Cheers, Daniel

-- Daniel Beer dlbeer@gmail.com http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B