Open psrawke opened 8 years ago
What values are being overwritten to which memory locations?
On Mon, Jun 6, 2016 at 9:26 AM Prasanna Rawke notifications@github.com wrote:
Hi Jamey,
In case of memcpy, the program finishes with status 0 (successfully) unlike previously with older images. But when I print the src and dst array in software's done function, some of the locations are overwritten (or initialized as 0). If I initialize, the memory after 'portalCacheFlush' src and dst are initialized properly. Can that be happening because of ' portalCacheFlush'?
Default order for memory initialization in memcpy example: In this case, some parts of the src and dst array are not initialzed properly.
srcAlloc = portalAlloc(alloc_sz, 0); dstAlloc = portalAlloc(alloc_sz, 0);
srcBuffer = (unsigned int )portalMmap(srcAlloc, alloc_sz); dstBuffer = (unsigned int )portalMmap(dstAlloc, alloc_sz);
for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; } portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);
unsigned int ref_srcAlloc = dma->reference(srcAlloc); unsigned int ref_dstAlloc = dma->reference(dstAlloc);
Following modifications in the order solves the issue:
srcAlloc = portalAlloc(alloc_sz, 0); dstAlloc = portalAlloc(alloc_sz, 0);
srcBuffer = (unsigned int )portalMmap(srcAlloc, alloc_sz); dstBuffer = (unsigned int )portalMmap(dstAlloc, alloc_sz);
portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1);
unsigned int ref_srcAlloc = dma->reference(srcAlloc); unsigned int ref_dstAlloc = dma->reference(dstAlloc);
for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; }
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120, or mute the thread https://github.com/notifications/unsubscribe/ACU3syz_md6HY0WYwCjsKUMusp4zw9J7ks5qJCAZgaJpZM4Iu5Kz .
The values being overwritten vary with each run of the program. But they were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of srcBuffer were always replaced while the other block replaced was around indices 80-100 and varies with each run.
Please post the code you added to print out the values and then a transcript showing the incorrect values.
On Mon, Jun 6, 2016 at 10:52 AM Prasanna Rawke notifications@github.com wrote:
The values being overwritten vary with each run of the program. But they were in blocks of 8 or 16, replaced by zeroes. The first eight addresses of srcBuffer were always replaced while the other block replaced was around indices 80-100 and varies with each run.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-223982977, or mute the thread https://github.com/notifications/unsubscribe/ACU3sy-OHlCW6zYPpa8-upfnMBj-yEnrks5qJDQwgaJpZM4Iu5Kz .
I have modified done method of MemcpyIndication class as:
virtual void done() {
sem_post(&done_sem);
fprintf(stderr, "done\n");
finished = true;
memcmp_fail = memcmp(srcBuffer, dstBuffer, numWords*sizeof(unsigned int));
if (1) {
int *s = (int *)srcBuffer;
int *d = (int *)dstBuffer;
for (int i = 0; i < numWords; i++) {
if (s[i] != i || d[i] != i)
fprintf(stderr, "mismatch %d %d %d\n", i, s[i], d[i]);
// else fprintf(stderr, " match %d %d %d\n", i, s[i], d[i]);
}
}
fprintf(stderr, "memcmp=%x\n", memcmp_fail);
sem_post(&memcmp_sem);
}
Also, I have reduced the problem size to 256 as follows:
#ifndef SIMULATION
int numWords = 1 << 8;
#else
int numWords = 1 << 8;
#endif
These are the transcripts for the program for 3 runs:
I am still unable to reproduce this problem, but I added code to verify src[i] == dst[i] == i
When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?
Prasanna,
In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.
I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.
It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.
-Jamey
On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:
When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .
I have reproduced the issue by doing as you suggested -- shortening the length to 256 words.
On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks jamey.hicks@gmail.com wrote:
Prasanna,
In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.
I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.
It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.
-Jamey
On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:
When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .
I cannot explain this, but this version passes repeatedly: if (1) for (int i = 0; i < numWords; i++){ srcBuffer[i] = 0xbabafeed; dstBuffer[i] = 0x5a5abeef; }
if (1) { portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1); }
for (int i = 0; i < numWords; i++){ srcBuffer[i] = i; dstBuffer[i] = 0x5a5abeef; }
if (0) { portalCacheFlush(srcAlloc, srcBuffer, alloc_sz, 1); portalCacheFlush(dstAlloc, dstBuffer, alloc_sz, 1); fprintf(stderr, "Main::flush and invalidate complete\n"); }
On Tue, Jun 7, 2016 at 9:06 AM Jamey Hicks jamey.hicks@gmail.com wrote:
I have reproduced the issue by doing as you suggested -- shortening the length to 256 words.
On Tue, Jun 7, 2016 at 8:36 AM Jamey Hicks jamey.hicks@gmail.com wrote:
Prasanna,
In the case of simulation, you are using x86 user-mode cache coherence. In the case of Zynq, DMA is using uncached reads and writes and so we have to use portalCacheFlush() to manually synchronize the caches with DRAM.
I've been testing this on a Zedboard and on a Zynq Mini ITX board. I also test it on a VC707 attached to an x86 system, but in that case the CPU snoops DRAM so the manual cache flush and invalidation are not required. After adding the code that checks that the value at offset i of src and dst both equal i, the tests are still working.
It seems like we must have a bug in the cache flush code in the kernel, and sometimes it is not flushing dirty cache lines. That would explain you you have some regions that remain zero after running the test.
-Jamey
On Tue, Jun 7, 2016 at 2:43 AM Prasanna Rawke notifications@github.com wrote:
When uploaded to Zedboard, the mismatch occurs. But in case of simulation, output is correct. So it might be a board specific issue. Which board are you working on?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cambridgehackers/connectal/issues/120#issuecomment-224194092, or mute the thread https://github.com/notifications/unsubscribe/ACU3swOvYR-9LU2OI0Ep4byOV2OW8OZpks5qJRMfgaJpZM4Iu5Kz .
Hi Jamey,
In case of memcpy, the program finishes with status 0 (successfully) unlike previously with older images. But when I print the src and dst array in software's done function, some of the locations are overwritten (or initialized as 0). If I initialize, the memory after 'portalCacheFlush' src and dst are initialized properly. Can that be happening because of 'portalCacheFlush'?
Default order for memory initialization in memcpy example: In this case, some parts of the src and dst array are not initialzed properly.
Following modifications in the order solves the issue: