-
The survey discusses the sensitivity of activation quantization and the tolerance of KV cache quantization in the context of post-training quantization (PTQ) for large language models (LLMs). It makes…
pprp updated
5 months ago
-
using checkpoint:
https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/ckptsaug/efficientnet-b0.tar.gz
export_model.py setting:
`python export_model.py
--ckpt_dir=efficientne…
-
Great to see the TensorFlow 2 Object Detection API has been released. One feature I'm very interested in is quantization-aware training (as is supported in the TensorFlow 1 version). I'm assuming it's …
-
Take the following code as a simple example:
> parser = transformers.HfArgumentParser(
> (ModelArguments, DataArguments, TrainingArguments, LoraArguments)
> )
> (
> _m…
-
### System Info
```Shell
- `Accelerate` version: 0.33.0
- Platform: Linux-5.15.133+-x86_64-with-glibc2.35
- `accelerate` bash location: /opt/conda/bin/accelerate
- Python version: 3.10.14
- Nu…
-
### 🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly when using the [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.htm…
-
# Prerequisites
Please answer the following question for yourself before submitting an issue.
- [x] I checked to make sure that this issue has not been filed already.
## 1. The entire URL of …
-
Currently, the TFLite wasi-nn implementation performs quantization if quantization scale and zero-point exist (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/was…
CIPop updated
3 months ago
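For context on the item above: in TFLite's quantization convention, a scale and zero-point define an affine mapping, `real_value = (quantized_value - zero_point) * scale`. The following is a minimal sketch of that mapping in plain Python, assuming int8 tensors. The function names `quantize` and `dequantize` are hypothetical and are not taken from the wasi-nn source, and the rounding mode here is Python's built-in `round`, which may differ from the runtime's.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers: q = round(x / scale) + zero_point, clamped to [qmin, qmax].

    Assumption: int8 range by default. Python's round() rounds halves to even,
    which may not match the rounding used by a given runtime.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    out = []
    for x in values:
        q = round(x / scale) + zero_point
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize(qvalues, scale, zero_point):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]


if __name__ == "__main__":
    q = quantize([1.0, -0.25, 100.0], scale=0.25, zero_point=10)
    print(q)
    print(dequantize(q, scale=0.25, zero_point=10))
```

Values outside the representable range saturate at `qmin` or `qmax`, so a round trip through `quantize` and `dequantize` is lossy for them.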
-
## ⚙️ Request New Models
- Link to an existing implementation (e.g. Hugging Face/Github): https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
- Is this model architecture supported by ML…
-
From the thread "How do we use the computational power of A17 Pro Neural Engine?" (https://developer.apple.com/forums/thread/740518),
I learned that if I want to run inference with my mlmodel on my iPad Pro with …