-
# configurability
* [done] support delayed vs dynamic scaling type, configurable separately for activations/weights/gradients
* [planned] support rowwise/blockwise scaling granularity, configurabl…
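The difference between the two scaling types above can be sketched in a few lines. This is a hedged illustration only, not the library's actual implementation; the `F8_E4M3_MAX` constant and the amax-history interface are assumptions for the sketch:

```python
F8_E4M3_MAX = 448.0  # assumed max representable magnitude of float8 e4m3

def dynamic_scale(tensor):
    """Dynamic scaling: derive the scale from the current tensor's amax,
    which costs one extra reduction over the tensor every step."""
    amax = max(abs(x) for x in tensor) or 1e-12  # guard against all-zero input
    return F8_E4M3_MAX / amax

def delayed_scale(amax_history):
    """Delayed scaling: reuse the max over a window of previously observed
    amax values, avoiding the per-step reduction at the cost of staleness."""
    amax = max(amax_history) or 1e-12
    return F8_E4M3_MAX / amax
```

Configuring the type separately per tensor role (activations/weights/gradients) then amounts to choosing which of these two functions supplies the scale for each cast.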
-
I was following the Livebook provided [in the docs](https://github.com/acalejos/exgboost/blob/main/notebooks/compiled_benchmarks.livemd#:~:text=gemm_predict%20%3D%20EXGBoost.compile(model%2C%20strateg…
-
----
- [Why are registers faster than memory? - Ruan Yifeng's blog](http://www.ruanyifeng.com/blog/2013/10/register.html)
- [The relationship and differences between memory, cache, and registers - hellojoy's blog - CSDN Blog](https://blog.csdn.net/hellojoy/article/details/54744231)
…
-
https://github.com/onnx/tensorflow-onnx/blob/1528091559b5246207c09cccc45a33e671b1f662/tf2onnx/rewriter/gemm_rewriter.py#L74
-
### System Info
- CPU architecture: x86_64
- CPU memory size: 128G
- GPU name: NVIDIA GeForce GTX 1660S
- GPU memory size: 6G
- TensorRT-LLM branch: main
- TensorRT-LLM commit: 9691e12
- Contai…
-
I see there are two sets of APIs for running a GEMM with CUTLASS: https://github.com/NVIDIA/cutlass/blob/main/media/docs/quickstart.md#launching-a-gemm-kernel-in-cuda and https://github.com/NVI…
-
### System Info
- GPU Name: NVIDIA GeForce RTX 3080 Ti
- System RAM: 65GB
- TensorRT-LLM branch `rel`
### Who can help?
@Tracin
@byshiue
### Information
- [ ] The official example scripts
- [X…
-
# 1. Description:
Enable hipBLASLt as an optional backend for MIOpen GEMM kernels.
For this first implementation, we propose:
- Enable hipBLASLt as an option when using the environment…
-
I want to perform inference on quantized LLAMA (W8A16) on ARM-v9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized.
Based on my understanding, I need to prepack the weights to redu…
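Per-group quantization as described can be sketched as follows. This is an illustrative pure-Python sketch of the numerics only, not oneDNN's API; the function names and the int8 range handling are assumptions:

```python
def quantize_per_group(weights, group_size):
    """Per-group int8 quantization: each run of `group_size` consecutive
    weights gets its own scale, so dequantization needs one scale per group."""
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Map the group's largest magnitude to the int8 limit 127.
        scale = max(abs(w) for w in group) / 127.0 or 1.0  # guard all-zero group
        scales.append(scale)
        q.extend(round(w / scale) for w in group)
    return q, scales

def dequantize_per_group(q, scales, group_size):
    """Recover approximate fp values by multiplying each int8 weight
    by its group's scale."""
    return [q[i] * scales[i // group_size] for i in range(len(q))]
```

Prepacking would additionally reorder `q` (and the matching scales) into the blocked layout the SVE kernels expect, which is the part oneDNN's reorder primitives handle.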
-
I'll beautify this once I get hold of Azure storage.
I have attached [gemma_7b.mlir](https://storage.googleapis.com/shark_tank/dan/Gemma/gemma_7b.mlir) along with [gemma weights](https://storage.go…