-
First, I saved the fine-tuned LoRA model as merged_16bit to my Hugging Face repo, and I have adapter_config.json and adapter_model.safetensors inside the repo. Now, when trying to load it with AutoAWQForCausalL…
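For context, the usual AutoAWQ flow for quantizing a fully merged checkpoint looks roughly like the sketch below. Repo ids are placeholders, and it assumes the repo actually contains the merged weights (config.json plus the full model safetensors) rather than only the adapter files:

```python
# Sketch only, with placeholder repo ids. Assumes the Hugging Face repo holds
# the merged 16-bit weights, not just adapter_config.json / adapter_model.safetensors.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "your-username/your-merged-16bit-model"  # placeholder repo id
quant_path = "your-merged-model-awq"                  # local output dir

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

If the repo only holds the adapter files, there are no base weights for AutoAWQ to quantize, so pushing the fully merged model first would be a prerequisite for this flow.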
-
I am running torchao 0.5 and torch '2.5.0a0+b465a5843b.nv24.09' on an NVIDIA A6000 Ada card (sm89), which supports FP8.
I ran the generate.py code from the benchmark:
python generate.py --c…
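For reference, FP8 weight-only quantization can also be applied outside the benchmark script, roughly as in the sketch below. It assumes torchao 0.5 exposes float8_weight_only through quantize_, and the model id is only a placeholder, not the one from the benchmark run:

```python
# Sketch: FP8 (e4m3) weight-only quantization via torchao's quantize_ API.
# Assumes torchao >= 0.5 provides float8_weight_only; needs sm89+ hardware for FP8.
import torch
from torchao.quantization import quantize_, float8_weight_only
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model, not from the original report
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap nn.Linear weights for FP8 weight-only quantized versions in place.
quantize_(model, float8_weight_only())

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```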
-
### System Info
```
pip install git+https://github.com/huggingface/transformers.git
pip install tokenizers==0.20.0
pip install accelerate==0.34.2
pip install git+https://github.com/huggingface/tr…
-
### The model to consider.
https://huggingface.co/Tele-AI/TeleChat-12B
### The closest model vllm already supports.
qwen2
### What's your difficulty of supporting the model you want?
I …
-
Hiya,
Comfy-Org put out an FP8 scaled version of Mochi. I'm curious what kind of quality can be gotten out of it, but it doesn't seem compatible with this repo.
https://huggingface.co/Comfy-…
-
**Bug description**
I am using ***MetaGPT ver 0.8.1***, but when I use RAG with the **SimpleEngine.from_docs** method I get the error ***ValueError: Creator not registered for key: LLMType.OLLAMA***.
**…
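A minimal repro sketch, assuming MetaGPT 0.8.1's RAG engine import path and using a placeholder document path:

```python
# Sketch of the failing call path; the document path is a placeholder.
# With an Ollama LLM configured in config2.yaml, from_docs raises
# "ValueError: Creator not registered for key: LLMType.OLLAMA".
import asyncio

from metagpt.rag.engines import SimpleEngine


async def main():
    engine = SimpleEngine.from_docs(input_files=["docs/example.txt"])  # placeholder file
    answer = await engine.aquery("What does the document say?")
    print(answer)


asyncio.run(main())
```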
-
I encountered an issue while trying to quantize the YOLOv8s model using the Ryzen AI quantizer. Below are the details of the error:
### Error Message:
```
No CUDA runtime is found, using CUDA_HOM…
-
### 🐛 Describe the bug
The following script attempts to fuse two custom operations into a single custom op. One of the original ops, as well as the fused op, has multiple outputs. The resultin…
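Since the original script is truncated here, the sketch below only illustrates the multi-output custom op part (using the torch.library define/impl/register_fake API, which assumes torch >= 2.4); it does not reproduce the fusion pattern itself:

```python
# Sketch only, not the original repro: a custom op with two tensor outputs.
import torch

# Schema declares two tensor outputs.
torch.library.define("mylib::split_scale", "(Tensor x) -> (Tensor, Tensor)")

@torch.library.impl("mylib::split_scale", "cpu")
def _split_scale_cpu(x):
    # Real CPU implementation: two differently scaled copies of the input.
    return x * 2.0, x * 3.0

@torch.library.register_fake("mylib::split_scale")
def _split_scale_fake(x):
    # Fake (meta) implementation so the op can be traced under torch.compile.
    return torch.empty_like(x), torch.empty_like(x)

def f(x):
    a, b = torch.ops.mylib.split_scale(x)
    return a + b

# Exercise the op under torch.compile to confirm the multi-output schema traces.
print(torch.compile(f)(torch.randn(4)))
```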
-
Median_abs_epsilon is calculated here:
https://github.com/Proteobench/ProteoBench/blob/8ed5b5ad9588b5b8b10c3b0cfd9ec284be43a59a/proteobench/datapoint/quant_datapoint.py#L124-L127
So is actually…
-
### Description of feature
I am using the alevin-fry quantitation method. For downstream analysis, I am mainly interested in the final count matrix, e.g.
1. The content of the `af_quant/alevin di…