simt Search Results - Githubissues

495 results for simt

NVIDIA/cutlass #941

[BUG] A10 Cutlass SGEMM result 12470 GFLOP/s lower than A…

CUDA Version 11.2 Cutlass Version 2.9 GPU A10 cmake .. -DCUTLASS_NVCC_ARCHS=86 -DCUTLASS_LIBRARY_KERNELS=cutlass_simt_sgemm_128x128_8x2_nn_align1 make cutlass_profiler -j16 ./tools/profile…

zcuuu updated 1 year ago
7
PJLab-ADG/3DTrans #13

Error encountered when trying to train Voxel-RCNN using Uni…

I tried to train Voxel-RCNN using Uni-3D as instructed in readme files, but encountered the following error: ``` Exception|implicit_gemm]feat=torch.Size([531550, 32]),w=torch.Size([32, 3, 3, 3, 32…

Bomsw updated 1 year ago
7
NVIDIA/cutlass #965

[QST] Does DefaultEpilogueSimt support scatterD?

I'm implementing a small-k gather-gemm-scatter fusion kernel using SIMT. Everything works correctly in the gather and accumulation stages. However, incorrect results are returned after scattering (I use native `OutputT…

vectorda updated 1 year ago
4
NVIDIA/cutlass #733

[FEA] Conv3d for SIMT

**Is your feature request related to a problem? Please describe.** Currently, conv3d kernels like those in [default_conv3d_fprop](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/ke…

ernestkchan updated 1 year ago
9
traveller59/spconv #617

TensorRT: encountered cuda error 700 during inference

Has anyone encountered this problem when using TensorRT to do inference? [07/04/2023-10:04:52] [I] Starting inference [07/04/2023-10:04:53] [E] [TRT] /home/vision/Desktop/sparseconv_trt_new_libs…

HoiM updated 1 year ago
5
SHI-Labs/NATTEN #37

Support for bf16 fp16

~~Hi, I cannot find any information about the support for bf16 and fp16. Does the current library support any of them?~~ I found the information in the catalog. Then when is bf16 going to be suppor…

jimmie33 updated 1 year ago
8
iree-org/iree #10894

Convert SPIRVMatmulPromoteVectorize to perform vectorization…

The CUDA matmul SIMT pipeline uses `scf.foreach_thread` and performs vectorization before bufferization. This is the direction we should follow for SPIR-V side too. So creating this issue to track it.…

antiagainst updated 1 year ago
1
taichi-dev/taichi #8394

[v1.7.0]FAILED: taichi/rhi/CMakeFiles/ti_device_api.dir/devi…

``` win11 wsl2 cuda-11.8 2023-10-28 install from source code , Taichi-1.7.0 (cuda118) root@LZH5:/mnt/d/Software/AI/taichi/2310/wsl_cuda118# python setup.py develop -DCLANG_EXECUTABLE=/usr/bin…

goometasoft updated 11 months ago
2
halide/Halide #7881

Loops get duplicated in partition loops to "simplify" bounda…

The title says well what happens, and I think most of us know that this happens, but here is an example anyway. Here is a loop that computes a 5x5 convolution: ```cpp produce denoise_conv_noisy { …

mcourteaux updated 1 year ago
4
NVIDIA/cutlass #1182

[QST] Example `06_splitK_gemm` error in A100.

**What is your question?** I am a newbie in cutlass, and I'm trying to run the `examples/06_splitK_gemm` on NVIDIA A100. I made modifications based on issue #1141, including changing SmArch to cutlas…

KuangjuX updated 1 year ago
4
