-
I ran a benchmark program on an RTX 4090 workstation and got abnormally high bus bandwidth figures:
```text
Pytorch version : 1.14.0a0+44dac51
CUDA version : 12.0
GPU : NVIDIA GeForce RTX 4090
Matrix Multiplication:
…
```
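One quick sanity check is sketched below. It assumes the reported "bus bandwidth" is a host-to-device transfer rate over PCIe, which the truncated output does not confirm. The RTX 4090 connects over PCIe 4.0 x16, and the theoretical per-direction limit of that link follows from 16 GT/s per lane and 128b/130b encoding. A host-device figure well above that limit points to a measurement problem rather than real throughput. A common cause is timing asynchronous CUDA work without synchronizing first, for example without `torch.cuda.synchronize()`. If the figure is on-device memory bandwidth instead, compare it against the card's memory specification rather than PCIe. The helper names and the example transfer size and time are hypothetical, not the reporter's data.

```python
# Hypothetical helpers: check a measured host<->device bandwidth against the
# theoretical PCIe link limit (assumes the figure is a PCIe transfer rate).

def pcie_peak_gbps(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (10**9 bytes/s)."""
    payload_bits_per_s = gt_per_s * 1e9 * lanes * encoding  # bits left after line encoding
    return payload_bits_per_s / 8 / 1e9

def measured_gbps(bytes_moved: int, seconds: float) -> float:
    """Effective bandwidth of one timed transfer in GB/s."""
    return bytes_moved / seconds / 1e9

def is_plausible(measured: float, peak: float) -> bool:
    """A real transfer cannot exceed the link's theoretical peak."""
    return measured <= peak

peak = pcie_peak_gbps(16.0, 16)     # PCIe 4.0 x16, roughly 31.5 GB/s per direction
bw = measured_gbps(1 << 30, 0.010)  # illustrative input: 1 GiB reported in 10 ms
print(f"peak {peak:.1f} GB/s, measured {bw:.1f} GB/s, plausible={is_plausible(bw, peak)}")
```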
-
Hello Intel PCM and PQoS developers,
I am facing an issue with Intel PCM where it fails to report local and remote memory bandwidth (LMB and RMB) metrics when monitored through Prometheus, despite …
-
I'm working to improve the monitoring capabilities of Intel's Platform Quality of Service (PQoS) when used alongside Intel Performance Counter Monitor (PCM). I've encountered a limitation that seems t…
-
Although it may be out of scope, it would be nice to have an example of computing with 4-bit and 8-bit tensors to save memory bandwidth.
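The saving behind this request is easy to quantify. A float32 value occupies 4 bytes, an int8 value 1 byte, and a packed 4-bit value half a byte. A bandwidth-bound kernel therefore reads one quarter or one eighth of the bytes for the same tensor. The plain-Python sketch below makes simple assumptions: symmetric per-tensor scaling and two nibbles per byte, low nibble first. The helper names are hypothetical, and the code does not reflect any particular library's quantized tensor format.

```python
# Hypothetical sketch: symmetric int8/int4 quantization plus int4 nibble packing,
# showing how many bytes each representation would move through memory.
import struct

def quantize_symmetric(values, bits):
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1] with one scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def pack_int4(q):
    """Pack pairs of signed 4-bit values into one byte each (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((a & 0xF) | ((b & 0xF) << 4) for a, b in zip(q[0::2], q[1::2]))

def unpack_int4(packed, n):
    """Inverse of pack_int4: sign-extend each nibble back to a Python int."""
    out = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:n]

x = [0.5, -1.25, 3.0, -0.75, 2.0, 0.0, -3.0, 1.5]
fp32 = struct.pack(f"{len(x)}f", *x)          # 4 bytes per value
q8, s8 = quantize_symmetric(x, 8)
int8 = struct.pack(f"{len(q8)}b", *q8)        # 1 byte per value
q4, s4 = quantize_symmetric(x, 4)
int4 = pack_int4(q4)                          # half a byte per value
print(len(fp32), len(int8), len(int4))        # 32 8 4
print([v * s4 for v in unpack_int4(int4, len(x))])  # dequantized approximation of x
```

The dequantized int4 values are only an approximation of `x`. That loss of precision is the trade-off for moving one eighth of the float32 bytes.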
-
Recently, I found several proposals for system benchmarks, e.g., #1003 and #889, which seem to contain similar tests and left me confused, so I did a quick survey of them.
### Overvi…
-
Create a benchmark that measures CPU memory bandwidth and compare it with a GPU solution.
A simple start is to test neuron-only functions, because they should be independent and therefore easily parallelizable…
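A minimal CPU-side starting point is sketched below. It assumes a STREAM-style copy kernel is an acceptable first metric: copy a buffer larger than the CPU caches, then divide the bytes read plus written by the best time. The function name and default sizes are illustrative choices, not part of any existing benchmark. The copy runs on one thread, so it typically underestimates what all cores together can reach and is best read as a lower bound. A GPU counterpart would need to synchronize before stopping its timer so the two numbers are comparable.

```python
# Hypothetical sketch of a STREAM-style "Copy" measurement on the CPU.
# Single-threaded, so treat the result as a lower bound for the whole socket.
import time

def copy_bandwidth_gbps(n_bytes: int = 128 * 1024 * 1024, trials: int = 5) -> float:
    """Best-of-`trials` copy bandwidth in GB/s (10**9 bytes/s).

    Counts read + write traffic (2 * n_bytes per copy), as STREAM does for Copy.
    n_bytes is assumed to exceed the last-level cache; raise it if it does not.
    """
    src = bytearray(b"\x01") * n_bytes  # writing the source maps its pages up front
    dst = bytearray(n_bytes)            # may be lazily mapped; the first trial pays for that
    src_view, dst_view = memoryview(src), memoryview(dst)
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        dst_view[:] = src_view          # one bulk copy performed in C, no Python loop
        best = min(best, time.perf_counter() - start)
    return 2 * n_bytes / best / 1e9

if __name__ == "__main__":
    print(f"CPU copy bandwidth: {copy_bandwidth_gbps():.1f} GB/s")
```

Taking the best of several trials discards the first run's page-fault cost and scheduler noise. This matches how STREAM reports its best rate.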
-
**Is your feature request related to a problem? Please describe.**
Before trying to parallelize code, one should check whether memory bandwidth has already been sa…
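One way to make that check concrete is a roofline model, sketched below under that assumption. Compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) with the machine's ridge point, then see how close a measured run gets to peak bandwidth. The function names and the machine's peak numbers are illustrative placeholders, not measurements. The triad byte count ignores write-allocate traffic.

```python
# Hypothetical roofline-style check (names and machine numbers are illustrative).

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from main memory."""
    return flops / bytes_moved

def attainable_gflops(ai: float, peak_gflops: float, peak_gbps: float) -> float:
    """Roofline bound: the lower of the compute roof and intensity * bandwidth."""
    return min(peak_gflops, ai * peak_gbps)

def is_memory_bound(ai: float, peak_gflops: float, peak_gbps: float) -> bool:
    """Kernels left of the ridge point (peak_gflops / peak_gbps) hit the bandwidth roof."""
    return ai < peak_gflops / peak_gbps

def bandwidth_utilization(bytes_moved: float, seconds: float, peak_gbps: float) -> float:
    """Fraction of peak bandwidth a measured run achieved (near 1.0 means saturated)."""
    return bytes_moved / seconds / 1e9 / peak_gbps

# STREAM-like triad a[i] = b[i] + s * c[i] on float64: 2 FLOPs and 3 * 8 bytes per element.
ai = arithmetic_intensity(2, 24)
peak_gflops, peak_gbps = 1000.0, 100.0  # assumed example machine, not a measurement
print(f"AI = {ai:.3f} FLOP/B, "
      f"roofline bound = {attainable_gflops(ai, peak_gflops, peak_gbps):.1f} GFLOP/s, "
      f"memory-bound = {is_memory_bound(ai, peak_gflops, peak_gbps)}")
```

Suppose `is_memory_bound` is true and a single-threaded run already shows `bandwidth_utilization` close to 1.0. Parallelizing that loop is then likely to add contention rather than speed.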
-
I use OpenCL-Benchmark-Linux to verify that OpenCL works in WSL. I just installed a fresh Ubuntu 24.04, and it appears to be slower than 22.04. I then also tried Debian 12 (bookworm), and it is slower t…
-
### 🐛 Describe the bug
Comment out all models [here](https://github.com/pytorch/pytorch/blob/e84cf805d2e815b8fecd847f5947b8cd8fbf61f3/benchmarks/gpt_fast/benchmark.py#L236) except run_llama2_7b_bf16
…
-
**Is your feature request related to a problem? Please describe.**
In the current receive logic of netkvm, the virtio protocol headers and the packet data reside in two separate memory blocks, so at…