dlbeer / dhara

NAND flash translation layer for low-memory systems

Very aggressive garbage collection --- a configuration problem? #20

Closed · davidgiven closed this issue 3 years ago

davidgiven commented 3 years ago

I'm using Dhara mostly successfully as the FTL layer on top of a Raspberry Pi Pico (see http://cowlark.com/2021-02-16-fuzix-pi-pico). The front end is using a traditional Unix filesystem; the filesystem block size is the same as the Dhara page size of 512 bytes, and the erase size is 4096 bytes. The actual implementation is at https://github.com/davidgiven/FUZIX/blob/rpipico/Kernel/platform-rpipico/devflash.c.

I've added trim support to the filesystem, so that it notifies Dhara when blocks are no longer in use by the filesystem. My understanding is that this helps Dhara do a better job of garbage collection, and that it gets very unhappy if it doesn't have a pool of unused blocks which it can write to. However, no matter what I do, every operation seems to result in substantial numbers of copies: for example, after deleting a 31kB file I can see the filesystem trim the 62 blocks which contained the file data --- but in the process Dhara has used the copy callback 1500 times! That's over half the data in the filesystem, and of course in order to copy those blocks it's had to erase them first...

Do you have any idea what might be happening here, and how I can stop it?

dlbeer commented 3 years ago

On Mon, Mar 01, 2021 at 03:26:58PM -0800, David Given wrote:

> I'm using Dhara mostly successfully as the FTL layer on top of a Raspberry Pi Pico (see http://cowlark.com/2021-02-16-fuzix-pi-pico). The front end is using a traditional Unix filesystem; the filesystem block size is the same as the Dhara page size of 512 bytes, and the erase size is 4096 bytes. The actual implementation is at https://github.com/davidgiven/FUZIX/blob/rpipico/Kernel/platform-rpipico/devflash.c.
>
> I've added trim support to the filesystem, so that it notifies Dhara when blocks are no longer in use by the filesystem. My understanding is that this helps Dhara do a better job of garbage collection, and that it gets very unhappy if it doesn't have a pool of unused blocks which it can write to. However, no matter what I do, every operation seems to result in substantial numbers of copies: for example, after deleting a 31kB file I can see the filesystem trim the 62 blocks which contained the file data --- but in the process Dhara has used the copy callback 1500 times! That's over half the data in the filesystem, and of course in order to copy those blocks it's had to erase them first...
>
> Do you have any idea what might be happening here, and how I can stop it?

Hi David,

That definitely doesn't sound like normal behaviour.

Are you sure about the erase size? I've never heard of a NAND flash chip with such a small eraseblock size. If it were misconfigured, it could cause a lot of spurious erase/program failures, which might explain what you're seeing.

Cheers, Daniel

-- Daniel Beer dlbeer@gmail.com http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B

davidgiven commented 3 years ago

Good thought --- unfortunately I double-checked and it seems that's correct: https://github.com/raspberrypi/pico-sdk/blob/2d5789eca89658a7f0a01e2d6010c0f254605d72/src/rp2_common/hardware_flash/include/hardware/flash.h. I haven't found out what the actual chip is (the label on the board is too small to read!).

The underlying chip has a page size of 256 bytes, a small erase block size of 4kB, and a large erase block size of 64kB. I'm telling Dhara that the page size is 512 bytes to match the filesystem block size. Is it worth trying with a 64kB erase block size?

Also, what's a reasonable value for the gc_ratio field? Currently I just picked something at random...

davidgiven commented 3 years ago

Much of my problem was that I was setting up the filesystem incorrectly --- it turns out my trim calls were wrong, so every block ended up marked as allocated. This meant that Dhara had no free space. Fixing that seems to have solved most of the problems.

However, I do see that trimming a block appears to perform an immediate garbage collection, resulting in an erase and several copies. I would have thought that trimming a block would result in it just being marked as free in the internal data structures so as to make garbage collection easier later, or am I misunderstanding how it works? Given that implementing trim is optional, how does the library recover free blocks without it?

By the way, using a 64kB erase block size halves the number of available blocks in the map, which seems surprising. I suspect I'm misunderstanding how the map is set up.

dlbeer commented 3 years ago

On Tue, Mar 02, 2021 at 02:27:21AM -0800, David Given wrote:

> Also, what's a reasonable value for the gc_ratio field? Currently I just picked something at random...

A reasonable value might be 1 or 2. If you've set it quite large then you will get a lot of write amplification, which may be your problem.

The GC ratio trades off write amplification for available capacity: with ratio G, a fraction 1/(G+1) of the journal is held in reserve, so the capacity for rewritable blocks is roughly the number of pages times G/(G+1), and the write amplification factor is (G+1).


dlbeer commented 3 years ago

On Tue, Mar 02, 2021 at 12:40:41PM -0800, David Given wrote:

> Much of my problem was that I was setting up the filesystem incorrectly --- it turns out my trim calls were wrong, so every block ended up marked as allocated. This meant that Dhara had no free space. Fixing that seems to have solved most of the problems.
>
> However, I do see that trimming a block appears to perform an immediate garbage collection, resulting in an erase and several copies. I would have thought that trimming a block would result in it just being marked as free in the internal data structures so as to make garbage collection easier later, or am I misunderstanding how it works? Given that implementing trim is optional, how does the library recover free blocks without it?

Garbage collection is done incrementally by rewriting reachable sectors to the front of the journal, at which point a sector at the back of the journal becomes garbage.

You don't actually need to trim, and for most filesystems it's not necessary. If you initialize the filesystem with a disk size of (dhara_map_capacity() - 1) sectors, you should be fine without needing to use trim, as the number of live sectors will never exceed the maximum that the map is capable of dealing with. Rewriting a sector automatically makes the old version a candidate for garbage collection.

> By the way, using a 64kB erase block size halves the number of available blocks in the map, which seems surprising. I suspect I'm misunderstanding how the map is set up.

You probably don't want to do this if the erase block really is 4kB. Some number of erase blocks are subtracted from the capacity to allow for a failure margin.


davidgiven commented 3 years ago

Ah --- I was using a gc_ratio of 128 (assuming it was a fixed-point number and picking something in the middle)...

Using a much smaller number, everything works fine now. Thanks very much for the help!

BTW, if you want it, I have a simple standalone command line program for creating flashable FTL images: https://github.com/davidgiven/FUZIX/blob/mkftl/Standalone/mkftl.c