Closed gedworker closed 2 years ago
On Tue, Jun 22, 2021 at 06:28:35AM -0700, gedworker wrote:
I have faced with a problem which i can't solve for more than two weeks. I am using dhara with FatFS file system and a nand flash. I wrote a simple test which opens a file writes 64 bytes of data and closes it. The test spins for about 1000-3000 times. The flash is large enough to hold all written data. Everything works fine except that sometimes the fat fs function open file (f_open) crashes. After that I can't mount the file system with f_mount function. I have started digging into dhara and found more or less what is going on. The f_mount function fails because it can't find the boot sector for the fat file system. It is the zero sector which has 0xAA55 signature. I added several logging functions into my test to track what happens with zero sector. I have added an output log to dhara_nand_copy function.Also into dhara_nand_prog function to track all write to pages. Also i check the validity of the zero sector reading it content through dhara_map_find function. So my test sequence looks like the following.
- format flash
- open file
- write 64bytes
- close file
- read zero sector and check that it has 0xAA55 signature
- goto 2
So i see that on f_close function dhara starts copying pages. And sometimes i see that boot sector is copied also. After that i see that boot sector is mapped to a new location. Everything looks fine but when error happens the boot sector is mapped to a new location but I don't see that it was copied. And after that my boot sector check function fails and file system is crashed. But if i read the sector from using old location it is there. So dhara remaps the boot sector but does not copy it to that location. How that could happen? Maybe my configuration is incorrect. What should I check?
Here is a log example:
---<info> File closed ---<info> BOOT SECTOR is mapped to page 31334 ---<info> file opened ---<info>write 64 bytes ---<info>close file ---<info> -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPYED) ---<info> -copy from ---- to ---- ---<info> -copy from ---- to ---- ---<info> -copy from 31334 to 31484 (HERE BOOT SECTOR IS COPYED) ---<info> -copy from ---- to ---- ---<info> File closed ---<info> BOOT SECTOR is mapped to page 31484 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION) ---<info> file opened ---<info>write 64 bytes ---<info>close file ---<info> -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPYED) ---<info> -copy from ---- to ---- ---<info> -copy from ---- to ---- ---<info> -copy from ---- to ---- (BOOTSECTOR is not copyed here) ---<info> -copy from ---- to ---- ---<info> -copy from ---- to ---- ---<info> File closed ---<info> BOOT SECTOR is mapped to page 31498 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION) ---<info> BOOT SECTOR is invalid
As you see from the last log the boot sector is at the new location but was not copied. Although if i read the previous sector (31484) the boot sector will be there. Why dhara maps a page but does not copy it? Thanks,
Copying is not the only way that a sector can be remapped -- it could occur with it being rewritten, and in that case you wouldn't see a call to dhara_nand_copy.
The next thing I would try would be to log calls to dhara_nand_prog as well and see if anything turns up for the final location.
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
Hello Daniel,
Thanks for reply. I forgot to mention that I have added a log to nand prog function to track writes to pages. Here is the log before error happen.
[00:34:21.724,609]
[00:34:22.044,921]
[00:34:22.054,687]
Logs dropped (1)
[00:34:22.375,976]
So the boot sector was mapped to 41394. After file close it became mapped to page 41353. But there was no any writes and copy to that page. And obviously the boot sector is not valid because it was not rewritten and or copyied. What else should i check? Thanks,
On Wed, Jun 23, 2021 at 03:43:09AM -0700, gedworker wrote:
[00:34:22.041,992]
MEM_TST: DHARA:journal root is: 41566 [00:34:22.042,968] MEM_TST: BOOTSECTOR IS MAPPED TO PAGE 41394 ... [00:34:22.373,046] MEM_TST: DHARA:journal root is: 41582 Logs dropped (1) [00:34:22.374,023] MEM_TST: BOOTSECTOR IS MAPPED TO PAGE 41353
This looks like a problem -- during a period in which the root changes from 41566 to 41582, and the journal doesn't wrap, there is no way that the root can change to a page which is not something in that range.
Next steps for debugging might be:
Have you verified your NAND driver and configuration? Use a PRNG seeded with a unique number for each page to fill every page on the chip, and then read back and check that all the data is there (you're looking not just for read/write failures, but also addressing errors that land multiple pages in the same place).
If you're quite sure that the NAND driver is ok, I would then start to try to reproduce the scenario on a PC using a simulated NAND driver. If you've ruled out the NAND driver as a problem and it's a bug in Dhara, this will make it easy to fix.
Cheers, Daniel
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
On Thu, Jun 24, 2021 at 06:48:17AM +1200, Daniel Beer wrote:
This looks like a problem -- during a period in which the root changes from 41566 to 41582, and the journal doesn't wrap, there is no way that the root can change to a page which is not something in that range.
Sorry, I meant to say "there is no way that the boot sector can change...". Any change to any mapping, if the journal hasn't wrapped, must point to a page in the range between the old and new roots.
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
Hello, Thanks for help. I will try to check on a simulaor. I have noticed a strange thing. When i create several files so that almost 80% of the space is used i delete one file. So there is more that 50% free space available. After that i create a new file and write to it. But writing now takes much longer than it was after format. Is that relates to dhara gc or something else?
On Fri, Jun 25, 2021 at 07:53:15AM -0700, gedworker wrote:
I have noticed a strange thing. When i create several files so that almost 80% of the space is used i delete one file. So there is more that 50% free space available. After that i create a new file and write to it. But writing now takes much longer than it was after format. Is that relates to dhara gc or something else?
When you format, do you also reinitialize the Dhara map? If so, that may explain it. Dhara can avoid doing GC if there are sufficiently many unallocated sectors.
What do you have set for GC ratio? Is there any opportunity to invoke trim() on sectors that are known to be unused in your filesystem (e.g. the sectors occupied by a deleted file)?
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
Hello, I have tested today with simulator on Visual studio. My test sequence looks the following
The file creation after delete takes 34 seconds. I initialize Dhara map on format. The garbage ratio is set to 4. The size of the flash is 385 MB. So that happens because GC is working? Because when 3 files are created they occupy 300MB. And there 85MB left. I have checked that when i create only two files and delete one and create another. It is created fast enough. I will check whether trim can be used.
On Sun, Jun 27, 2021 at 12:53:15PM -0700, gedworker wrote:
I have tested today with simulator on Visual studio. My test sequence looks the following
- format_flahs with DHARA initialization
- create_file_100MB --------------------takes 610 MS
- create_file_100MB --------------------takes 609 MS
- create_file_100MB --------------------takes 609 MS
- delete file 1 file using f_unlink
- create_file_100MB --------------------takes 34313 MS
The file creation after delete takes 34 seconds. I initialize Dhara map on format. The garbage ratio is set to 4. The size of the flash is 385 MB. So that happens because GC is working? Because when 3 files are created they occupy 300MB. And there 85MB left. I have checked that when i create only two files and delete one and create another. It is created fast enough. I will check whether trim can be used.
Yes, that is likely a consequence of GC. You could also set GC_RATIO to something much lower, but then you will have a corresponding drop in usable space.
A read cache will also likely increase performance significantly (many partial reads are performed during both writes and GC).
Cheers, Daniel
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
Hello,
You could also set GC_RATIO to something much lower
I see that when i set larger value for GC ratio (It is the last parameter in dhara_map_init function) the file is created much faster after delete. Why do you recommend to set lower ratio?
but then you will have a corresponding drop in usable space
Does that mean that on initialization more space will be reserved for dhara structures or more space will be consumed durint writing and deletion.
There is not any good detailed description on dhara internal mechanism so i am a bit confused how everything works under the hood.
On Mon, Jun 28, 2021 at 01:21:09PM -0700, gedworker wrote:
I see that when i set larger value for GC ratio (It is the last parameter in dhara_map_init function) the file is created much faster after delete. Why do you recommend to set lower ratio?
Then that's a bit strange, and I'm not sure what's going on there. Did the file creation actually succeed (remember that you will have less available space)?
The GC ratio determines the amount of GC work done per write when the map size is near capacity.
Does that mean that on initialization more space will be reserved for dhara structures or more space will be consumed durint writing and deletion.
It means that more of the raw flash is allowed to be garbage at any given time. This, combined with a lower number of usable sectors, means that less work is required to keep up.
There is not any good detailed description on dhara internal mechanism so i am a bit confused how everything works under the hood.
Have you read map_internals.txt in the repository?
Cheers, Daniel
-- Daniel Beer @.***> http://dlbeer.co.nz/ PGP: BA6E 0B26 1F89 246C E3F3 C910 1E58 C43A 160A 553B
@gedworker did you ever figure out what was happening? I was struggling with what seems to be the exact same problem this week. I create a FAT32 partition, write a file close the file, unmount and remount. It works sometimes but eventually it fails to read the boot sector. If you found a solution that would be wonderful.
I added a bunch of debugging including verifying all writes/ and copies by reading back and from what I can tell my NAND driver is working fine.
Thanks,
-Brett
Hello bbrother , That issue with invalid boot sector solved after i have rewritten the functions for dhara (copy,erase,mark bad,check bad). I can't say exactly what was the reason for that. I suggest for you to check your spi interface, check all functions which require dhara. For instance check if copy bock function works fine. If you are still getting invalid boot sector and really see that it is remapped without a copy let me know. I will check my old project. You can also use a simulator. I have set up it with visual studio. Works fine and helps to debug easily.
Hello,
I have faced with a problem which i can't solve for more than two weeks. I am using dhara with FatFS file system and a nand flash. I wrote a simple test which opens a file writes 64 bytes of data and closes it. The test spins for about 1000-3000 times. The flash is large enough to hold all written data. Everything works fine except that sometimes the fat fs function open file (f_open) crashes. After that I can't mount the file system with f_mount function. I have started digging into dhara and found more or less what is going on. The f_mount function fails because it can't find the boot sector for the fat file system. It is the zero sector which has 0xAA55 signature. I added several logging functions into my test to track what happens with zero sector. I have added an output log to dhara_nand_copy function.Also into dhara_nand_prog function to track all write to pages. Also i check the validity of the zero sector reading it content through dhara_map_find function. So my test sequence looks like the following.
So i see that on f_close function dhara starts copying pages. And sometimes i see that boot sector is copied also. After that i see that boot sector is mapped to a new location. Everything looks fine but when error happens the boot sector is mapped to a new location but I don't see that it was copied. And after that my boot sector check function fails and file system is crashed. But if i read the sector from using old location it is there. So dhara remaps the boot sector but does not copy it to that location. How that could happen? Maybe my configuration is incorrect. What should I check?
Here is a log example:
--- File closed
--- BOOT SECTOR is mapped to page 31334
--- file opened
---write 64 bytes
---close file
--- -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPYED)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- -copy from 31334 to 31484 (HERE BOOT SECTOR IS COPYED)
--- -copy from ---- to ----
--- File closed
--- BOOT SECTOR is mapped to page 31484 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION)
--- file opened
---write 64 bytes
---close file
--- -copy from ---- to ---- (A LOT OF OTHER PAGES ARE COPYED)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- -copy from ---- to ---- (BOOTSECTOR is not copyed here)
--- -copy from ---- to ----
--- -copy from ---- to ----
--- File closed
--- BOOT SECTOR is mapped to page 31498 (HERE BOOT SECTOR IS MAPPED TO A NEW LOCATION)
--- BOOT SECTOR is invalid
As you see from the last log the boot sector is at the new location but was not copied. Although if i read the previous sector (31484) the boot sector will be there. Why dhara maps a page but does not copy it? Thanks,