-
Thanks for the nice work! I hit the following issue when running `python train_nerf.py data/fox --workspace trial_nerf`. Do you have any thoughts? Many thanks for your help!
```
Traceback (most rece…
-
## Description
I was able to build and run the C++ encoder samples successfully, as described in the README. Sample output:
```
$ ./bin/encoder_sample 32 12 32 12 64 0 0 0 0
Device Tesla V100-SXM2-16GB
D…
-
Hi! I'm trying to simulate the [volta_tensorop_gemm.cu](https://github.com/NVIDIA/cutlass/blob/master/examples/07_volta_tensorop_gemm/volta_tensorop_gemm.cu) example from CUTLASS.
I directly use the docker i…
-
We propose a solution for TensorCore CodeGen with significant transparency, flexibility and usability. In this solution, the algorithm description and schedule of TensorCore CodeGen are no different th…
-
**Describe the bug**
Hi,
I just installed CUDA for the first time in a clean Julia environment on Julia `v1.6-rc1`,
and `] test CUDA` fails.
…
-
Replace all of our usages of [cuDNN](https://developer.nvidia.com/cudnn) with [CUTLASS](https://github.com/NVIDIA/cutlass). This would have several advantages:
- [CUTLASS has a BSD-3 license](https…
-
So far we have a text printer for Relay, which allows us to print an IRModule in text format. On the TIR side, we still rely on the ReprPrinter.
This issue is for upgrading the text printe…
-
## Introduce block_hierarchy
Hardware chips usually have more than one storage and execution hierarchy. NVIDIA GPUs, for example, have GPU blocks (GPU SMs), warps, and CUDA cores, with global, shared and …
-
I tested this package, and it shows me this error.
My Julia version is 1.4.0-rc1.0.
```
julia> using CUDAdrv, CUDAnative, CuArrays
┌ Warning: Incompatibility detected between CUDA and LLVM 8.0+; disabling debug i…
-
Hi,
I am trying to find the best achievable performance for row-major SGEMM at 4 specific input sizes when using only plain CUDA (no tensor cores, no intrinsics). This is useful to me, because I want…