-
Get Triton GEMM performance to 80% of oneDNN/XeTLA by utilizing genISA/vc-intrinsics.
The lowering pipeline would be `triton -> tritongpu -> optimized/simplified tritongpu => llvm/spirv`.
This serves as an um…
-
Now that Apple [deprecated OpenCL support in `osx` 10.12]() and [NVIDIA will no longer provide CUDA support for `osx` after CUDA 10.2](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.htm…
-
I'm assuming this can't be optimised to run on the GPU (unless Postgres has some extension).
How long does it take by comparison? Is it 100x slower?
-
Hello,
I want to express my appreciation for your impressive work in the SiMT community.
I am interested in obtaining the adapter model for an NMT LLM (Falcon or Llama2) and would like to kindly request …
-
When `tt.dot` is lowered to DPAS with an fp16 D type, the DPAS results are incorrect.
The DPAS op in the MLIR (GenX dialect):
`%23884 = genx.matrix.dpas %23613, %23165, %2336…
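One way to confirm that the DPAS output is wrong, rather than just differently rounded, is to compare it against a host-side reference. The sketch below assumes fp16 A, B, and C, full-precision accumulation, and a single final rounding to fp16. Real DPAS hardware may round intermediate sums differently, so compare with a tolerance. The helpers `to_fp16`, `reference_dpas`, and `max_abs_diff` are hypothetical and not part of Triton or the GenX dialect.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def reference_dpas(a, b, c):
    """Hypothetical reference for D = C + A @ B with an fp16 D type.

    a is M x K, b is K x N, and c is M x N (lists of lists). Inputs are
    rounded to fp16, products are accumulated in full precision, and only
    the final result is rounded to fp16 (an assumption about DPAS semantics).
    """
    m, k, n = len(a), len(b), len(b[0])
    d = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = to_fp16(c[i][j])
            for p in range(k):
                acc += to_fp16(a[i][p]) * to_fp16(b[p][j])
            row.append(to_fp16(acc))  # the fp16 D type: round once at the end
        d.append(row)
    return d


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(u - v) for ru, rv in zip(x, y) for u, v in zip(ru, rv))
```

With an fp16 D type, small contributions can vanish. For example, `reference_dpas([[0.75]], [[1.0]], [[2048.0]])` gives `[[2048.0]]` because fp16 values near 2048 are spaced 2 apart. Differences of that size are expected rounding, while larger ones point at a lowering bug.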
-
Good evening, all. I am attempting to compile a minimal [CUTLASS](https://github.com/NVIDIA/cutlass/releases/tag/v3.4.1) GEMM example in a PyTorch project. I want to write a simple CUTLASS kernel and …
-
The following gives the warning "Invalid status value, converted to NA":
```
simt
```
-
I have implemented basic sample code to convolve a 2D image with a row filter.
It works, but when the dst image has a stride, CUTLASS seems to ignore it, and all the extra elements of the first …
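The expected behavior with a strided destination can be stated precisely. Each row starts at `y * stride`, not at `y * width`, and the padding elements past `width` should be left untouched. The plain-Python sketch below (not CUTLASS code) shows that indexing. The function name `row_filter_strided`, the odd filter length, and the clamp-to-edge border policy are all assumptions for illustration.

```python
def row_filter_strided(src, src_stride, dst, dst_stride, width, height, taps):
    """Apply a 1-D horizontal filter to a 2-D image in flat row-major buffers.

    src_stride and dst_stride are row pitches in elements (>= width).
    Only the first `width` elements of each dst row are written, so padding
    elements keep their old values. Assumes an odd number of taps and clamps
    out-of-range taps to the row edge (a hypothetical border policy).
    """
    radius = len(taps) // 2
    for y in range(height):
        src_row = y * src_stride  # row start honours the source pitch
        dst_row = y * dst_stride  # row start honours the destination pitch
        for x in range(width):
            acc = 0.0
            for t, w in enumerate(taps):
                xs = min(max(x + t - radius, 0), width - 1)
                acc += w * src[src_row + xs]
            dst[dst_row + x] = acc
    return dst
```

For a 3x2 image with `dst_stride=4`, a destination pre-filled with `-1` keeps `-1` in the padding column of every row. If a kernel instead packs rows at `y * width`, the padding is overwritten and later rows shift, which is one symptom of an ignored stride.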
-
When executing nested branches, as the inner branch reconverges, the join instruction pops simt_stack even when rpc and the current pc do not match.
![80ac4863b87949f874255b6846c182e](https://github.com/THU-DSP-LAB/ventus-gpgpu-isa-simulator/assets/37099022/f0821c77-ba88-4bdb-930e-4b3…
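For comparison, here is how a stack-based reconvergence model usually handles nested branches. A divergent branch pushes one entry per path, each tagged with the branch's reconvergence PC. A join pops only when the executing PC equals the top entry's rpc. The sketch below is a hypothetical model: the entry layout (`next_pc`, `rpc`, `mask`) and the method names are assumptions, not the ventus simulator's actual code.

```python
from dataclasses import dataclass


@dataclass
class StackEntry:
    next_pc: int  # PC where this entry's threads resume
    rpc: int      # reconvergence PC for this entry (-1 for the base entry)
    mask: int     # active-thread bitmask


class SimtStack:
    """Minimal SIMT reconvergence stack (hypothetical model)."""

    def __init__(self, pc: int, mask: int):
        self.entries = [StackEntry(pc, -1, mask)]

    def top(self) -> StackEntry:
        return self.entries[-1]

    def branch(self, taken_pc: int, fallthrough_pc: int, rpc: int, cond_mask: int) -> None:
        cur = self.top()
        taken = cur.mask & cond_mask
        not_taken = cur.mask & ~cond_mask
        if not taken or not not_taken:
            # Uniform branch: no divergence, so nothing is pushed.
            cur.next_pc = taken_pc if taken else fallthrough_pc
            return
        # The current entry resumes at rpc with its full mask once both paths finish.
        cur.next_pc = rpc
        self.entries.append(StackEntry(fallthrough_pc, rpc, not_taken))
        self.entries.append(StackEntry(taken_pc, rpc, taken))

    def join(self, pc: int) -> bool:
        """Pop only if the executing path has reached its own reconvergence PC."""
        if len(self.entries) > 1 and pc == self.top().rpc:
            self.entries.pop()
            return True
        return False
```

In a nested case (outer rpc 100, inner rpc 50), a join at pc 40 leaves the stack alone. Joins at 50 pop the two inner entries, and joins at 100 then restore the outer masks. A join that pops whenever it executes, regardless of `pc == rpc`, would drop an entry early, which matches the behavior described above.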
-
I encounter errors when building the accel-sim simulator by following the README, and I found no previous issue related to this problem.
I ran:
```bash
pip3 install -r requirements.txt
source ./gpu-simulator/setup_enviro…
```