dlbeer / dhara

NAND flash translation layer for low-memory systems

dhara_map_capacity returns inconsistent values #35

Open mirkomatontispire opened 1 year ago

mirkomatontispire commented 1 year ago

Hi there,

We are encountering a mysterious bug when writing a large (~40 MB) file to the NAND using DHARA in combination with the Reliance Edge API.

Reliance Edge works by writing the sector count into the first sector (which subsequently becomes read-only) when formatting. This value must stay consistent, as it is checked at runtime by calling dhara_map_capacity(). We noticed that after we write a file larger than ~40 MB, the value returned by dhara_map_capacity() changes. This prevents us from mounting the FS again, because the current sector count diverges from the one written in the first sector. Interestingly, the value returned by dhara is larger than the previous one, and after we format the NAND again we no longer encounter the bug (the dhara_map_capacity() value remains consistent even after multiple writes).

We replicated the bug as follows:

1. Erase the NAND completely.
2. Format the NAND (Reliance Edge red_format). dhara_map_capacity() returns 110313; this value is then written to the first sector.
3. Write a ~40 MB file (Reliance Edge red_write). dhara_map_capacity() now returns 112073.
4. Mounting (Reliance Edge red_mount) then fails because 110313 != 112073.
5. Format the NAND again. dhara_map_capacity() returns 112073, and after that the value no longer changes.

It would be great if you had any ideas or suggestions on this! Thanks :) Apologies if this looks like a generic question, but due to the nature of our work I can't share too many details.

mirkomatontispire commented 1 year ago

Hello there, some updates:

We managed to isolate the issue to dhara_journal_capacity(), in particular:

const dhara_block_t max_bad = j->bb_last > j->bb_current ?
        j->bb_last : j->bb_current;

For some reason max_bad is 32 after the first format, and then it drops to 0 after we do some writes, at which point we experience the issue mentioned above.

pgreenland commented 8 months ago

Just hit a very similar issue to this myself... my disk appeared to get larger after writing some data.

Following your hint above, it seems this behaviour may be normal.

When initialising a map we see the call flow dhara_map_init -> dhara_journal_init -> reset_journal.

Inside there we prepare the last and current bad blocks:

/* We don't yet have a bad block estimate, so make a
 * conservative guess.
 */
j->epoch = 0;
j->bb_last = j->nand->num_blocks >> 6;
j->bb_current = 0;

Looks like it makes a guess until it knows the real number, which makes me think that as well as going up, the capacity could also go back down in the future as blocks fail. Which is slightly more worrying.

Did you find a solution?

Thanks,

Phil

mirkomatontispire commented 8 months ago

Hey Phil,

Not a solution, but as a workaround you could just assign a fixed sector count and rely on that. I know it's not ideal, but if you know your use case exactly and only do sporadic writes, it's not a big issue.

pgreenland commented 8 months ago

Thanks for replying. That's exactly what I was thinking of doing! A conservative value that allows for a bunch of bad blocks in the future :-)