Dear @doonny,
I compiled the project with VEC_SIZE = 8 and LANE_NUM = 8, using the AlexNet model.
When I run ./run.exe conv.aocx, I get wrong results, as shown in the log below.
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Totally 1 device(s) are found
Using Device 0: de1soc_sharedonly_vga : Cyclone V SoC Development Kit
Device OpenCL Version: OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 18.1
Device Max Compute Units: 1
Device Max WorkGroup Size: 2147483647
Device Max WorkItem Size: 2147483647
Device Global Memory Size: 512 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 1000 Mhz
Loading kernel/binary from file conv.aocx
61063552 total weights read
1024 total output reference read
154587 bytes image data read from binary files
Executing Layer 1:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Launching single work-item kernel Pool
Launching kernel lrn with local size: 1, 1, 12 (global size: 27, 27, 12)
Executing Layer 2:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Launching single work-item kernel Pool
Launching kernel lrn with local size: 1, 1, 32 (global size: 13, 13, 32)
Executing Layer 3:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Executing Layer 4:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Executing Layer 5:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Launching single work-item kernel Pool
Executing Layer 6:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Executing Layer 7:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Executing Layer 8:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel MemWr
Copyed all batched results from fc_1 buffers.
Selected item = 0 from the combined batch results in fc buffers
Start verifying results ...
Item=0 is wrong (result=-11.000000, golden_ref=-14.000000)
Item=1 is wrong (result=14.000000, golden_ref=13.000000)
Item=2 is wrong (result=-2.000000, golden_ref=-5.000000)
Item=3 is wrong (result=-4.000000, golden_ref=-7.000000)
Item=4 is wrong (result=-8.000000, golden_ref=-9.000000)
Item=5 is wrong (result=-1.000000, golden_ref=-3.000000)
Item=6 is wrong (result=-2.000000, golden_ref=-6.000000)
Item=7 is wrong (result=1.000000, golden_ref=0.000000)
Item=8 is wrong (result=12.000000, golden_ref=11.000000)
Totally 792 Wrong Results
PipeCNN exited !!!
Performance Summary
Kernel runtime summary:
Layer-1:
MemRd: 41.939 ms
Conv : 41.770 ms
Pool : 41.533 ms
MemWr: 41.687 ms
Lrn : 1.624 ms
Layer-2:
MemRd: 33.475 ms
Conv : 33.334 ms
Pool : 33.130 ms
MemWr: 33.272 ms
Lrn : 0.539 ms
Layer-3:
MemRd: 22.566 ms
Conv : 22.431 ms
Pool : 0.000 ms
MemWr: 22.342 ms
Lrn : 0.000 ms
Layer-4:
MemRd: 16.993 ms
Conv : 16.855 ms
Pool : 0.000 ms
MemWr: 16.784 ms
Lrn : 0.000 ms
Layer-5:
MemRd: 11.456 ms
Conv : 11.299 ms
Pool : 11.095 ms
MemWr: 11.234 ms
Lrn : 0.000 ms
Layer-6:
MemRd: 14.537 ms
Conv : 14.406 ms
Pool : 0.000 ms
MemWr: 14.311 ms
Lrn : 0.000 ms
Layer-7:
MemRd: 6.627 ms
Conv : 6.498 ms
Pool : 0.000 ms
MemWr: 6.409 ms
Lrn : 0.000 ms
Layer-8:
MemRd: 1.860 ms
Conv : 1.730 ms
Pool : 0.000 ms
MemWr: 1.641 ms
Lrn : 0.000 ms
Total kernel runtime 148.323 ms
Batch size = 1, average process time per batch: 148.323 ms
Total runtime: 0.154240s
Can you help me fix it?
Thank you very much!