WMXZ-EU / uSDFS

uSD File system based on ELM-CHaN generic FAT system
28 stars 13 forks source link

Corrupted reads crossing sectors #12

Open markopenguin opened 3 years ago

markopenguin commented 3 years ago

I'm working on an application that involves reading records out of a database. Each entry in the database has a fixed size record header which contains the length of the record data which is read first, then based on the length in that header field, the actual data is read. In certain cases, the data in the second read (which in the failed case always exceeds a sector size) returns corrupted data. However, if I use a simple wrapper to ensure that I only read sector sized segments from the file system like this:

osint xfs_os_fopen(xfs_file *f, const char *pathname, xfs_mode mode)
{
    sdinit();
    int res = seterrno(f_open(&f->os, pathname, mode));
#ifdef FF_BUG_WORKAROUND
    if (!res) {
        if (mode == XFS_OPEN_READ) {
            /* Grab our first block */
            f->isreadfile = 1;
            f->offset = 0;
            if (f_read(&f->os, f->buf, sizeof(f->buf), &f->bufbytes))
                f->bufbytes = 0;
        } else
            f->isreadfile = 0;
    }
#endif
    return res;
}

xfs_off_t xfs_os_fread(void *ptr, xfs_off_t size, xfs_file *f)
{
    FRESULT res = 0;
    UINT br;
#ifdef FF_BUG_WORKAROUND
    xfs_off_t totalread = 0, count = 0;
#endif
    sdinit();
#ifdef FF_BUG_WORKAROUND
    if (f->isreadfile) {
        while (size) {
            count = size;
            if (count > f->bufbytes - (f->offset & FF_MASK))
                count = f->bufbytes - (f->offset & FF_MASK);
            //xfs_verbose(NULL, "size = %d, count = %d, offset = %d, bufbytes = %d", (int)size, (int)count, (int)f->offset, (int)f->bufbytes);
            memcpy(ptr, f->buf + (f->offset & FF_MASK), count);
            f->offset += count;
            ptr += count;
            size -= count;
            totalread += count;
            if (!(f->offset & FF_MASK)) {
                /* Time to read the next sector */
                //xfs_verbose(NULL, "fresh sector, expected at %d (vs %d)", (int)f->offset, (int)f_tell(&f->os));
                if ((res = f_read(&f->os, f->buf, sizeof(f->buf), &f->bufbytes))) {
                    xfs_verbose(NULL, "read returned %d", res);    
                    f->bufbytes = 0;
                    return seterrno(res);
                } else if (!f->bufbytes) {
                    /* We reached the end of the file */
                    break;
                }
            } else if (size) {
                /* If there is more to read and we're not at a sector boundary, then we have reached the end of the file */
                break;
            }
        }
        return res ? seterrno(res) : totalread;
    } else
#endif
        res = f_read(&f->os, ptr, (UINT)size, &br);
    return res ? seterrno(res) : (xfs_off_t)br;
}

Then the application works perfectly, albeit much more slowly. I am open to any suggestions you might have!

Mark

terjeio commented 3 years ago

I have run into the same issue and it is one of the weirdest bugs I have ever encountered.

When sd_CardReadBlocks (in sdhc.c) is called to read the sector that contains the data that is corrupted it completes seemingly without error but the sector data is not loaded into the buffer. To debug this I have memset the whole buffer to a value before starting the read, and dumping debug info into it in the interrupt handler if it has not been written to by DMA. A bit nasty I know, but when I load a ~700kb file sector by sector (512 bytes) only the last sector read contains the debug data. Note that the file is not a whole number of sectors long.

Since sd_CardReadBlocks only reads complete sectors I do not get why it fails when FatFS needs a fragment of a sector as there is no way sd_CardReadBlocks, AFAICT, has any knowledge of that.

Am I missing something obvious?

WMXZ-EU commented 3 years ago

AFAIK, these issues could be part of Elm ChaN's ff.c code, which is taken without modifications. As noted, sd_CardReadBlocks reads only complete sectors and ignores the content

terjeio commented 3 years ago

Thanks for the reply.

these issues could be part of Elm ChaN's ff.c code, which is taken without modifications.

I have exactly the same version of his code (and most of mine) running perfectly fine on another processor (MSP432E401Y) so there is something else amiss. I have replaced the FatFS code from your repo with a fresh copy of the same version from his to be sure nothing was changed in yours - it didn't help. I can also be noted that my changes for debugging has been in the sdhc.c file, and that FatFS forwards the data I have put in the sector buffer to the application layer.

So, something is messing up the DMA transaction causing it to not write data to the sector buffer. The transaction is flagged as complete with no errors that I can detect from the registers. E.g. the PRM is quite clear that the DINT bit in the INT_STATUS register is not set if there is an error detected:

Occurs only when the internal DMA finishes the data transfer successfully. Whenever errors occur during data transfer, this bit will not be set. Instead, the DMAE bit will be set. Either Simple DMA or ADMA finishes data transferring, this bit will be set.

It is set when the isr fires, perhaps erroneously?

It can be mentioned that I have a lwIP stack running and the error occurs when downloading files. Maybe the low-level lwIP driver code interferes with the uSDHC DMA? But then why only when I read a fragment of data from a sector? Crazy...

Other details: The sector that fails is sequential to the previous so the file is not fragmented at this point. A file exacly N sectors long does not fail.

I'll report back if I am able to resolve this.

terjeio commented 3 years ago

It is the same issue as #11. FatFS reads complete sectors to the target buffer provided by the user program, fragmented sectors are read to a per file handle buffer and only the requested fragment is then copied to the target buffer. If the file handle is allocated on the heap (via malloc) by the user program then DMA quietly fails. I have not checked if this is a silicon bug or a feature...

If DMA cannot changed to read/write to the heap a workaround could be to have a static local buffer in sdhc.c, read sectors into this with DMA and then copying to the target buffer. I did a quick and dirty test with that and it works.