tensorcore Search Results

788 results for tensorcore


huggingface/chat-ui #1375

Chat-UI is not following prompt - producing unknown complete…

Oogabooga text-generation-web-ui engine used for inference (prompts directly input into the oogabooga ui produce normal results but chat-ui is doing something weird as below), Mongodb setup _**Prom…

cody151 updated 1 month ago
9 comments

apache/mxnet #9543

RNN Op Changes (was: Variable Length Support for cuDNN RNN)

We want to add support for variable-length sequences to the cuDNN RNN operator, as we cannot 'fake' support via masking for LSTM (the cuDNN operator does not return a history of cell states) and bidir…

sbodenstein updated 6 years ago
19 comments

iree-org/iree #12194

Lack of warp tiling for CUDA GEMM codegen.

I am studying the TensorCore GEMM codegen of IREE. I notice a big performance gap between IREE and cuBLAS. For example, when [M, N, K] is [1024, 512, 1024], I use the following script to run GEMM: ``…

JamesTheZ updated 1 year ago
2 comments

zhangjun/zhangjun.github.io #21

GPU

# Turing |Brand Name|GPU Architecture|Tensor Core|NVIDIA CUDA® Cores|TensorFLOPS|Single-Precision|Double-Precision|Mixed-Precision(FP16/FP32)|INT8|INT4|GPU Memory|Interconnect Bandwidth|System Interf…

zhangjun updated 8 months ago
11 comments

JuliaGPU/CUDA.jl #807

Implement `mapslices` without scalar iteration

When I use `mapslices(f,a,dims)` to manipulate CuArray, a warning appears. It reminds me that using scalar operations on the GPU is inefficient. ```julia a=CUDA.rand(3,4,5) b=CUDA.rand(2,3) maps…

yeruoforever updated 2 years ago
4 comments

triton-lang/triton #197

Add fallback fp16 support for non-tensor-core GPUs

I tried running the matrix multiplication example from the tutorial. I am using 1060 GPU, driver version=465.31 and cuda 11.3 [log.txt](https://github.com/openai/triton/files/6959248/log.txt)

hyuen updated 1 year ago
2 comments

tensorflow/recommenders #579

[Question] expected throughput of cloud tpu on embedding loo…

Hi, I read this blog recently https://cloud.google.com/blog/topics/developers-practitioners/building-large-scale-recommenders-using-cloud-tpus, very interested in it and wondering the raw performan…

pmixer updated 1 year ago
3 comments

facebookresearch/xformers #918

What is the difference between the 4 implementations of FMHA…

Hi all, I'm new to xformers, I'm learning the `examples/llama_inference/generate.py` file. I traced it here: ```python def _memory_efficient_attention_forward( inp: Inputs, op: Optional[Type…

sleepwalker2017 updated 3 months ago
7 comments

tlc-pack/tvm-tensorir #432

[TIR][Schedule] affine binding and more

Recently @Hzfengsy brought up a question regarding affine binding and related schedule primitives. After brief discussions, I put my thoughts here for further discussions. ## Intro case ```pytho…

spectrometerHBH updated 3 years ago
3 comments

fpgaminer/GPTQ-triton #7

1-bit acceleration support

Hi, really good work, and appreciate it a lot. I am curious whether Triton can support 1-bit acceleration for MMA. Also the further application to 1-bit GPTQ?

NicoNico6 updated 1 year ago
2 comments
