xvanQ opened 9 months ago
The output of profile bandwidth is as follows:
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
Which of these values corresponds to ctog_bdw, which to gtoc_bdw_cache, and which to gtoc_bdw_hidden?
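To make the bandwidth numbers easier to compare, the output can be parsed into a per-direction table. This is a minimal sketch: `parse_bandwidth` and `sustained_bandwidth` are hypothetical helpers, not part of the profiler, and treating the largest-transfer measurement as the steady-state value is an assumption, not a confirmed mapping to `ctog_bdw` or the `gtoc_bdw_*` parameters.

```python
import re

# Hypothetical helper (not part of the profiler): matches entries such as
# "size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s", even when several
# entries were flattened onto one line.
PATTERN = re.compile(
    r"size:\s*([\d.]+)\s*MB,\s*(gpu-to-cpu|cpu-to-gpu)\s+bandwidth:\s*([\d.]+)\s*GB/s"
)


def parse_bandwidth(text):
    """Return {direction: [(size_mb, gb_per_s), ...]} sorted by transfer size."""
    results = {}
    for size, direction, bw in PATTERN.findall(text):
        results.setdefault(direction, []).append((float(size), float(bw)))
    for rows in results.values():
        rows.sort()
    return results


def sustained_bandwidth(rows):
    """Bandwidth at the largest transfer size (ASSUMED to be the steady-state value)."""
    return rows[-1][1]


output = """
size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s
size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s
size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/s
size: 0.25 MB, cpu-to-gpu bandwidth: 4.556 GB/s
size: 32.00 MB, cpu-to-gpu bandwidth: 12.285 GB/s
size: 128.00 MB, cpu-to-gpu bandwidth: 12.251 GB/s
"""

bw = parse_bandwidth(output)
for direction, rows in bw.items():
    print(direction, sustained_bandwidth(rows), "GB/s")
```

The 0.25 MB transfers report less than half of the large-transfer figure in both directions, so which transfer size a given parameter is meant to reflect matters when picking a value.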
The output of profile matmul is as follows:
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
Which of these values correspond to mm_flops_p, mm_flops_g, bmm_flops_p, bmm_flops_g, and cpu_flops? Thanks!
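The matmul results can be tabulated per device in the same way. Again a minimal sketch: `parse_matmul` and `peak_tflops` are hypothetical helpers, and taking the best TFLOPS per device is an illustrative choice, not a confirmed answer to which value belongs in `mm_flops_*`, `bmm_flops_*`, or `cpu_flops`.

```python
import re

# Hypothetical helper (not part of the profiler): matches entries such as
# "device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186".
PATTERN = re.compile(
    r"device:\s*(\w+),\s*N:\s*(\d+),\s*latency:\s*([\d.]+)\s*ms,\s*TFLOPS:\s*([\d.]+)"
)


def parse_matmul(text):
    """Return {device: [(N, latency_ms, tflops), ...]} sorted by matrix size N."""
    results = {}
    for device, n, latency, tflops in PATTERN.findall(text):
        results.setdefault(device, []).append((int(n), float(latency), float(tflops)))
    for rows in results.values():
        rows.sort()
    return results


def peak_tflops(rows):
    """Highest TFLOPS observed across the tested sizes (illustrative choice)."""
    return max(tflops for _, _, tflops in rows)


output = """
device: cuda, N: 1024, latency: 0.06 ms, TFLOPS: 68.186
device: cuda, N: 2048, latency: 0.20 ms, TFLOPS: 97.026
device: cpu, N: 1024, latency: 0.89 ms, TFLOPS: 3.488
device: cpu, N: 2048, latency: 8.44 ms, TFLOPS: 2.924
"""

mm = parse_matmul(output)
for device, rows in mm.items():
    print(device, peak_tflops(rows), "TFLOPS")
```

The output above contains only a plain matmul benchmark per device and no separate batched-matmul (bmm) run, so this only tabulates what was measured.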
Have you figured this out? I have the same question.