simt Search Results - Githubissues

495 results for simt


iree-org/iree #10953

Check SIMT and cooperative matrix pipeline correctness in e2…

In [e2e matmul tests](https://github.com/iree-org/iree/tree/main/tests/e2e/matmul) we have correctness tests for `SPIRVBaseVectorize` pipeline. We need to also add support for `SPIRVMatmulPromoteVecto…

antiagainst updated 3 months ago
1
w3c/machine-learning-workshop #66

WebGPU fitness for ML frameworks

@jasonmayes raises the question of [whether WebGPU exposes the right API surface needed to support ML frameworks interactions with GPUs](https://www.w3.org/2020/06/machine-learning-workshop/talks/oppo…

dontcallmedom updated 4 years ago
13
intel/intel-xpu-backend-for-triton #1043

[Upstream] Upstream the fix of the common issue in Triton GP…

This issue has been fixed downstream: https://github.com/intel/intel-xpu-backend-for-triton/pull/835. The fix needs to be upstreamed to Triton.

chengjunlu updated 3 months ago
1
IntelPython/DPPY-Spec #3

__partitioned__ protocol for partitioned and distributed dat…

The current state of the specification can be found here: https://github.com/IntelPython/DPPY-Spec/blob/draft/partitioned/Partitioned.md

fschlimb updated 3 years ago
52
vortexgpgpu/vortex #62

SIMT stack deadlock issue not fixed in the vortex?

SIMT deadlock example: A: `*mutex = 0;` B: `while(!atomicCAS(mutex, 0, 1));` C: `// critical section` `atomicExch(mutex, 0);`

mabo08 updated 4 months ago
1
intel/intel-xpu-backend-for-triton #1637

Investigate reduction performance difference between XeTLA a…

Looking into our [FlashAttention-2](https://github.com/intel/intel-xpu-backend-for-triton/blob/perf_attn/python/tutorials/06-fused-attention.forward.py) benchmark, we see that reduction codegen for Xe…

victor-eds updated 2 months ago
16
yuenshome/yuenshome.github.io #45

Kernel fusion: the "acceleration tool" of deep learning

- [Kernel fusion: the "acceleration tool" of deep learning](https://www.doit.com.cn/p/306236.html) - [TVM](https://github.com/dmlc/tvm) - [NNVM-fusion]() - [XLA Beginner](https://github.com/TensorflowXLABeginner/XLA-Report) - [Deep learning, all hard…

ysh329 updated 5 years ago
3
twesterhout/lattice-symmetries #1

generic/portable support for non-x86 systems

I am reviewing https://github.com/openjournals/joss-reviews/issues/3537 and was unable to test on my Apple M1 (my machine is not set up to compile for x86 even if Rosetta2 supports executing such bina…

jeffhammond updated 3 years ago
8
gpuweb/gpuweb #78

Investigation: Querying Subgroup Support

Better performance with divergent kernels and more localized data sharing is a strength of modern SIMT hardware, making massively parallel algorithm considerations viable for more fields than before. …

mehmetoguzderin updated 2 days ago
1
intel/intel-xpu-backend-for-triton #1154

[#8 GEMM Performance] compare triton optimizations to XeTLA/…

Shape(m*k*n) 4096\*4096\*4096 8192\*8192\*8192 1024\*28672\*8192 3072\*4096\*3072

Dewei-Wang-sh updated 4 months ago
1
