cambridgehackers / connectal

Connectal is a framework for software-driven hardware development.
MIT License
161 stars 46 forks source link

Issue with portalCacheFlush in memcpy example? #120

Open psrawke opened 8 years ago

psrawke commented 8 years ago

Hi Jamey,

In case of memcpy, the program finishes with status 0 (successfully) unlike previously with older images. But when I print the src and dst array in software's done function, some of the locations are overwritten (or initialized as 0). If I initialize, the memory after 'portalCacheFlush' src and dst are initialized properly. Can that be happening because of 'portalCacheFlush'?

Default order for memory initialization in memcpy example: In this case, some parts of the src and dst array are not initialzed properly.

  srcAlloc = portalAlloc(alloc_sz, 0);
  dstAlloc = portalAlloc(alloc_sz, 0);

  srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
  dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

  for (int i = 0; i < numWords; i++){
     srcBuffer[i] = i;
     dstBuffer[i] = 0x5a5abeef;
  }
  portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
  portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

  unsigned int ref_srcAlloc = dma->reference(srcAlloc);
  unsigned int ref_dstAlloc = dma->reference(dstAlloc);

Following modifications in the order solves the issue:

  srcAlloc = portalAlloc(alloc_sz, 0);
  dstAlloc = portalAlloc(alloc_sz, 0);

  srcBuffer = (unsigned int *)portalMmap(srcAlloc, alloc_sz);
  dstBuffer = (unsigned int *)portalMmap(dstAlloc, alloc_sz);

  portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1);
  portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

  unsigned int ref_srcAlloc = dma->reference(srcAlloc);
  unsigned int ref_dstAlloc = dma->reference(dstAlloc);

  for (int i = 0; i < numWords; i++){
     srcBuffer[i] = i;
     dstBuffer[i] = 0x5a5abeef;
  }
jameyhicks commented 8 years ago

What values are being overwritten to which memory locations?

On Mon, Jun 6, 2016 at 9:26 AM Prasanna Rawke notifications@github.com wrote:

Hi Jamey,

In case of memcpy, the program finishes with status 0 (successfully) unlike previously with older images. But when I print the src and dst array in software's done function, some of the locations are overwritten (or initialized as 0). If I initialize, the memory after 'portalCacheFlush' src and dst are initialized properly. Can that be happening because of ' portalCacheFlush'?

Default order for memory initialization in memcpy example: In this case, some parts of the src and dst array are not initialzed properly.

srcAlloc = portalAlloc(alloc_sz, 0); dstAlloc = portalAlloc(alloc_sz, 0);

srcBuffer = (unsigned int )portalMmap(srcAlloc, alloc_sz); dstBuffer = (unsigned int )portalMmap(dstAlloc, alloc_sz);

for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; } portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

unsigned int ref_srcAlloc = dma->reference(srcAlloc); unsigned int ref_dstAlloc = dma->reference(dstAlloc);

Following modifications in the order solves the issue:

srcAlloc = portalAlloc(alloc_sz, 0); dstAlloc = portalAlloc(alloc_sz, 0);

srcBuffer = (unsigned int )portalMmap(srcAlloc, alloc_sz); dstBuffer = (unsigned int )portalMmap(dstAlloc, alloc_sz);

portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);

unsigned int ref_srcAlloc = dma->reference(srcAlloc); unsigned int ref_dstAlloc = dma->reference(dstAlloc);

for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; }

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120, or mute the thread https://github.com/notifications/unsubscribe/ACU3syz_md6HY0WYwCjsKUMusp4zw9J7ks5qJCAZgaJpZM4Iu5Kz .

psrawke commented 8 years ago

The values being overwritten vary with each run of the program. But they were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of srcBuffer were always replaced while the other block replaced was around indices 80-100 and varies with each run.

jameyhicks commented 8 years ago

Please post the code you added to print out the values and then a transcript showing the incorrect values.

On Mon, Jun 6, 2016 at 10:52 AM Prasanna Rawke notifications@github.com wrote:

The values being overwritten vary with each run of the program. But they were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of srcBuffer were always replaced while the other block replaced was around indices 80-100 and varies with each run.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-223982977, or mute the thread https://github.com/notifications/unsubscribe/ACU3sy-OHlCW6zYPpa8-upfnMBj-yEnrks5qJDQwgaJpZM4Iu5Kz .

psrawke commented 8 years ago

I have modified done method of MemcpyIndication class as:

virtual void done() {
    sem_post(&done_sem);
    fprintf(stderr, "done\n");
    finished = true;
    memcmp_fail = memcmp(srcBuffer, dstBuffer, numWords*sizeof(unsigned int));
    if (1) {
        int *s = (int *)srcBuffer;
        int *d = (int *)dstBuffer;
        for (int i = 0; i < numWords; i++) {

            if (s[i] != i || d[i] != i)
                fprintf(stderr, "mismatch %d %d %d\n", i, s[i], d[i]);
            // else fprintf(stderr, "   match %d %d %d\n", i, s[i], d[i]);
         }
    }
    fprintf(stderr, "memcmp=%x\n", memcmp_fail);
    sem_post(&memcmp_sem);
  }

Also, I have reduced the problem size to 256 as follows:

#ifndef SIMULATION
int numWords = 1 << 8;
#else
int numWords = 1 << 8;
#endif

These are the transcripts for the program for 3 runs:

memcpy_log1.txt memcpy_log2.txt memcpy_log3.txt

jameyhicks commented 8 years ago

I am still unable to reproduce this problem, but I added code to verify src[i] == dst[i] == i

psrawke commented 8 years ago

When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?

jameyhicks commented 8 years ago

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .

jameyhicks commented 8 years ago

I have reproduced the issue by doing as you suggested -- shortening the length to 256 words.

On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks jamey.hicks@gmail.com wrote:

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .

jameyhicks commented 8 years ago

I cannot explain this, but this version passes repeatedly: if (1) for (int i = 0; i < numWords; i++){ srcBuffer[i] = 0xbabafeed; dstBuffer[i] = 0x5a5abeef; }

if (1) { portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1); }

for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; }

if (0) { portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1); fprintf(stderr, "Main::flush and invalidate complete\n"); }

On Tue, Jun 7, 2016 at 9:06 AM Jamey Hicks jamey.hicks@gmail.com wrote:

I have reproduced the issue by doing as you suggested -- shortening the length to 256 words.

On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks jamey.hicks@gmail.com wrote:

Prasanna,

In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.

I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.

It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.

-Jamey

On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:

When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .