-
I ran a benchmark program on an RTX 4090 workstation and got abnormally high bus bandwidth numbers:
```text
Pytorch version : 1.14.0a0+44dac51
CUDA version : 12.0
GPU : NVIDIA GeForce RTX 4090
Matrix Multiplication:
…
```
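One quick sanity check for a bandwidth figure like this is to recompute it from bytes moved and elapsed time, then compare it with the hardware's theoretical peak. A result above the peak usually points to a measurement error, for example stopping the timer before asynchronous GPU work has finished. The log above does not say which bus is meant, so the sketch below carries two nominal peaks as assumptions: RTX 4090 memory bandwidth and PCIe 4.0 x16. The helper names are hypothetical and not part of the benchmark.

```python
# Hypothetical sanity-check helpers; not part of the benchmark being discussed.
# Peak values are nominal spec numbers (assumptions about which bus is meant).

RTX_4090_VRAM_PEAK_GBPS = 1008.0  # 384-bit GDDR6X at 21 Gbps per pin
PCIE4_X16_PEAK_GBPS = 31.5        # PCIe 4.0 x16, one direction, after 128b/130b encoding


def bandwidth_gbps(bytes_moved: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_moved / seconds / 1e9


def is_plausible(bytes_moved: int, seconds: float, peak_gbps: float) -> bool:
    """False if the measurement exceeds the theoretical peak, which suggests
    a timing error rather than real hardware performance."""
    return bandwidth_gbps(bytes_moved, seconds) <= peak_gbps


if __name__ == "__main__":
    size = 1 << 30  # 1 GiB transfer
    # 1 GiB in 5 ms would be ~215 GB/s, far above what PCIe 4.0 x16 can carry.
    print(f"{bandwidth_gbps(size, 0.005):.1f} GB/s")
    print(is_plausible(size, 0.005, PCIE4_X16_PEAK_GBPS))
    # 1 GiB in 50 ms is ~21.5 GB/s, which is within the PCIe 4.0 x16 peak.
    print(is_plausible(size, 0.05, PCIE4_X16_PEAK_GBPS))
```

With PyTorch specifically, the usual culprit is timing CUDA work without calling `torch.cuda.synchronize()` (or using CUDA events) before reading the clock.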
-
Hello, I'm a beginner with this simulator, so I'm trying to adjust the parameters in the config file.
While doing so, I tried to reduce the main memory size in the QV100 config file, but I was not able to find a…
-
Create a benchmark that measures CPU-memory bandwidth and compare it with a GPU solution.
A simple start is to test the neuron-only functions, because they should be independent and therefore easily parallelizable…
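A CPU-side baseline could start as small as the sketch below. It times bulk copies between two large buffers and reports GB/s. Following the STREAM "copy" convention, each repetition is counted as reading and writing the buffer once. This is an assumption-laden sketch: it runs single-threaded in CPython and does not touch the neuron functions. A single thread often cannot saturate a multi-channel memory controller, so treat the result as a lower bound. The GPU side would need its own measurement with the GPU toolkit.

```python
import time


def copy_bandwidth_gbps(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Best-of-`repeats` memory-copy bandwidth in GB/s (1 GB = 1e9 bytes).

    Each copy reads n_bytes and writes n_bytes, so 2 * n_bytes of memory
    traffic is counted per repetition (STREAM 'copy' convention).
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    dst[:] = src  # warm-up: touch both buffers so page faults are not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # equal-length slice assignment: one bulk copy in C, no Python loop
        best = min(best, time.perf_counter() - start)
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"CPU copy bandwidth: {copy_bandwidth_gbps():.1f} GB/s")
```

Taking the best of several runs filters out interference from other processes. It still reports what one interpreter thread achieves, not the machine's peak.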
-
**Is your feature request related to a problem? Please describe.**
Before trying to parallelize code, one thing to check is whether memory bandwidth has already been sa…
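One standard way to frame this check is the roofline model. A kernel whose arithmetic intensity (FLOPs per byte moved) lies below the machine's ridge point (peak FLOP/s divided by peak bandwidth) is limited by memory bandwidth. Adding more threads will not help it once bandwidth is used up. The sketch below is a minimal illustration. The peak figures (1000 GFLOP/s, 100 GB/s) are made-up example values, not measurements of any particular machine.

```python
def ridge_point(peak_gflops: float, peak_gbps: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_gflops / peak_gbps


def attainable_gflops(intensity: float, peak_gflops: float, peak_gbps: float) -> float:
    """Roofline bound: performance is capped by compute or by bandwidth * intensity."""
    return min(peak_gflops, intensity * peak_gbps)


def is_memory_bound(flops: float, bytes_moved: float,
                    peak_gflops: float, peak_gbps: float) -> bool:
    """True if the kernel's arithmetic intensity is below the ridge point."""
    intensity = flops / bytes_moved
    return intensity < ridge_point(peak_gflops, peak_gbps)


if __name__ == "__main__":
    # Example kernel: c[i] = a[i] + b[i] over n doubles.
    n = 10_000_000
    flops = n              # one addition per element
    bytes_moved = 24 * n   # read a[i] and b[i], write c[i], 8 bytes each
    # Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
    print(is_memory_bound(flops, bytes_moved, 1000.0, 100.0))
    print(attainable_gflops(flops / bytes_moved, 1000.0, 100.0))
```

For a vector add the intensity is about 0.04 FLOP/byte, far below a ridge point of 10. The kernel is bandwidth-limited, and the useful question is how close the measured bandwidth already is to the peak.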
-
### Feature Description
## GPUDirect Storage
[Nvidia GPUDirect Storage](https://developer.nvidia.com/blog/gpudirect-storage/) is a technology introduced by Nvidia that enables GPUs to read files d…
-
At the moment, all examples that use `DelayLine` use DPRAM as the backing store, which quickly exhausts FPGA resources (e.g. 40% of the DPRAM in the `polysyn` bitstream).
We likely want s…
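To see why the backing store fills up so fast, it helps to model a delay line as a circular buffer. The storage must hold one sample per step of delay, so the memory needed is `max_delay * sample_width`. Each step reads the oldest sample and writes the newest one, which is why a dual-port RAM is a natural fit. The Python model below is a hypothetical illustration only, not the project's `DelayLine`. The 48 kHz and 16-bit figures in the test are example values.

```python
class RingDelayLine:
    """Software model of a fixed delay line backed by a circular buffer.

    Hypothetical illustration; not the project's `DelayLine` implementation.
    """

    def __init__(self, max_delay: int):
        if max_delay < 1:
            raise ValueError("max_delay must be at least 1")
        self.buf = [0] * max_delay  # this is the memory a DPRAM would provide
        self.write_index = 0

    def process(self, sample: int) -> int:
        """Write one sample and return the one written max_delay steps ago."""
        out = self.buf[self.write_index]       # read the oldest sample
        self.buf[self.write_index] = sample    # overwrite it with the newest
        self.write_index = (self.write_index + 1) % len(self.buf)
        return out


def storage_bits(max_delay: int, sample_width_bits: int) -> int:
    """Backing-store size needed for a delay of max_delay samples."""
    return max_delay * sample_width_bits
```

Because the storage grows linearly with delay length, a single one-second delay of 16-bit samples at 48 kHz already needs 768,000 bits. Long delays therefore compete directly with everything else that uses on-chip RAM.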
-
Some formats, like CSV, make it easy to stream the result. Instead of generating the whole response in memory and then making it available for download, one idea is to process the records w…
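A minimal sketch of the streaming idea, using Python's standard `csv` module, is shown below. A generator serializes one record at a time into a small reusable buffer and yields each line as soon as it is ready, so memory use does not grow with the number of records. The function name and the shape of `records` are assumptions for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def stream_csv(header: Sequence[str],
               records: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield a CSV document line by line instead of building it in memory.

    Hypothetical helper: only one serialized row is buffered at a time.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: QUOTE_MINIMAL, "\r\n" line endings

    def flush() -> str:
        # Return what has been written so far and reset the buffer for reuse.
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    writer.writerow(header)
    yield flush()
    for record in records:
        writer.writerow(record)
        yield flush()
```

Because `records` can itself be a lazy iterator (for example, rows fetched from a database cursor), nothing forces the full result into memory. A web framework's streaming-response type could consume the returned iterator directly. Which framework applies depends on the project and is not specified here.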
-
Hi,
I am trying to run the benchmark on an Apple M3 Max laptop.
The benchmark runs, but it reports that FP64 is not available.
Is there a way to override this and force the benchmark to compile…
-
Hi,
I have a question about the performance difference between [host_memory_bandwidth](https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/performance/host_memory_bandwidth) and [host_memory_bandwidth…
-
Is there a workaround to train on the data with a GPU that has only 2 GB of memory?