-
## 🚀 Feature
At https://pytorch.slack.com/archives/C3PDTEV8E/p1638511540268500 we were discussing how, depending on the model type, the different bf16/amp or tf32 modes may or may not do much speed i…
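For reference, a minimal sketch (assuming an Ampere-or-newer GPU) of how the modes in question are toggled in PyTorch; how much either one helps depends on how matmul-bound the model is:

```python
import torch

# TF32 affects float32 matmuls/convolutions on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# AMP/bf16 runs selected ops in bfloat16 under autocast.
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(8, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)
print(y.dtype)  # torch.bfloat16
```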
-
We plan to add QAT (quantization-aware training) for LLMs to torchao, as mentioned in the original RFC here: https://github.com/pytorch-labs/ao/issues/47.
For this to run efficiently on the GPU, we'd need kernel support for W4A8…
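As a rough sketch of what W4A8 means (generic symmetric fake quantization for illustration, not torchao's QAT code): weights are quantized to 4 bits and activations to 8 bits, and an efficient kernel would perform the low-precision GEMM directly instead of dequantizing to float as done here:

```python
import torch

def fake_quant(t, n_bits):
    # Symmetric per-tensor fake quantization: round onto a signed
    # n_bits grid, then dequantize back to float.
    qmax = 2 ** (n_bits - 1) - 1
    scale = t.abs().max() / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

w = torch.randn(256, 256)  # weights     -> 4-bit ("W4")
x = torch.randn(8, 256)    # activations -> 8-bit ("A8")
y = fake_quant(x, 8) @ fake_quant(w, 4).t()
```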
-
I am trying to verify the test results of the Raspberry Pi Zero W as listed in the table under the [Raspberry Pi section](https://github.com/google/XNNPACK#raspberry-pi) in the [README.md](https://git…
-
```c
/* Create the A operand: if A is not transposed, it is m x k with
   strides (rs_a, cs_a); otherwise create it as k x m with the
   strides swapped, so the transpose is expressed through the
   object's dimensions and strides rather than a data copy. */
if ( bli_does_notrans( transa ) )
    bli_obj_create( dt, m, k, rs_a, cs_a, &a );
else
    bli_obj_create( dt, k, m, cs_a, rs_a, &a );

/* Same pattern for the B operand. */
if ( bli_does_notrans( transb ) )
    bli_obj_cre…
```
-
If we can speed up the BERT model, we will significantly increase throughput for many use cases. We will experiment with SentenceTransformers first (see the sketch below).
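A minimal throughput probe, assuming the `sentence-transformers` package; the checkpoint name below is an arbitrary BERT-family example, not part of the original plan:

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model
sentences = ["An example sentence to embed."] * 1024

start = time.perf_counter()
model.encode(sentences, batch_size=64)
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/sec")
```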
-
Hi,
I'm seeing higher losses when using `te.Linear` in place of `nn.Linear` in transformer models such as Llama, which I assume is expected given the reduced precision of FP8.
However, I don't see a loss inc…
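For concreteness, a minimal sketch of the kind of swap being described, using Transformer Engine's standard `fp8_autocast` path (the recipe settings here are placeholder assumptions):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for nn.Linear; FP8 is only used inside autocast.
linear = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(16, 1024, device="cuda")

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = linear(x)
```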
-
For 2D inputs, `np.matmul` and `np.dot` are semantically the same, but I've found that in some cases `matmul` can be much slower even though the documentation for `np.dot` says `matmul` is preferred f…
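A quick way to reproduce the comparison (shapes and iteration count here are arbitrary; results will vary with the BLAS backend):

```python
import timeit
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

# For 2-D float64 arrays, both calls should reach the same BLAS GEMM.
print("dot:   ", timeit.timeit(lambda: np.dot(a, b), number=100))
print("matmul:", timeit.timeit(lambda: np.matmul(a, b), number=100))
```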
-
in function "nv_wavenet_persistent_cur", we add values from "embedPrev" and "embedCur". Then we save the values in "xt_sh" and the same values in "xt". Then, before the GEMM, we update the values in "…
-
## 🐛 Bug
`torch.mm`/`torch.addmm` call `cublasGemmEx` under the hood. However, there are type combinations that PyTorch claims are unsupported even though they should work fine:
Example:
…
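Since the issue's own example is cut off above, the combination below is only an illustrative guess (assuming a CUDA device): `cublasGemmEx` supports FP16 inputs with an FP32 output/compute type, but `torch.mm` rejects the mismatched `out` dtype:

```python
import torch

a = torch.randn(64, 64, device="cuda", dtype=torch.float16)
b = torch.randn(64, 64, device="cuda", dtype=torch.float16)
out = torch.empty(64, 64, device="cuda", dtype=torch.float32)

# fp16 x fp16 -> fp32 is a supported cublasGemmEx combination,
# yet torch.mm raises on the dtype mismatch.
try:
    torch.mm(a, b, out=out)
except RuntimeError as e:
    print(e)
```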
-
### System Info
CUDA 12.2, torch 2.1
### Who can help?
@byshiue
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X] An officially supported task in th…