-
Hi guys
I've just had reports that two specific Q4_0 70B models are outputting gibberish, and I've confirmed the same.
Example file with this issue: https://huggingface.co/TheBloke/Spicyboros-70…
-
Here are some outstanding issues for LoRA:
- [x] Base implementation (https://github.com/ggerganov/llama.cpp/pull/820)
- [ ] Improve LoRA application time with SIMD (AVX, AVX2) (https://github.com/g…
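For context on what the SIMD item would speed up: applying a LoRA adapter amounts to adding a scaled low-rank delta to each base weight matrix. A minimal NumPy sketch of that merge (the real implementation lives in ggml's C code; all names here are illustrative):

```python
import numpy as np

def apply_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
               alpha: float, r: int) -> np.ndarray:
    """Merge a LoRA adapter into a base weight matrix.

    W: (out, in) base weight; B: (out, r); A: (r, in).
    The delta B @ A is scaled by alpha / r, the standard LoRA scaling.
    """
    return W + (alpha / r) * (B @ A)

# Tiny usage example with random tensors.
out_dim, in_dim, r, alpha = 8, 16, 4, 8.0
W = np.random.randn(out_dim, in_dim).astype(np.float32)
B = np.random.randn(out_dim, r).astype(np.float32)
A = np.random.randn(r, in_dim).astype(np.float32)
W_merged = apply_lora(W, A, B, alpha, r)
```

The hot path is the `B @ A` matmul plus the elementwise add, which is exactly what AVX/AVX2 vectorization would target.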
-
At @onefact we have been using WASM, but this won't work for the encoder-only or encoder-decoder models I've built (e.g. http://arxiv.org/abs/1904.05342). That's because the WASM VM is for the CPU (ha…
-
Dear Google AI Team,
I wish to express my strong interest in seeing Google Gemini Flash released to the open-source community.
As a developer and AI enthusiast, I have been incredibly impressed wi…
-
### Describe the issue
I am trying to replicate the following: https://intel.github.io/intel-extension-for-pytorch/llm/llama3/xpu/. While running the `python run_generation_gpu_woq_for_llama…
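Before digging into the WOQ script itself, it may help to confirm the XPU stack is healthy. A minimal smoke test, assuming an IPEX build whose `ipex.llm.optimize` supports `device="xpu"` (the checkpoint name is illustrative, not from the issue):

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sanity check: is the XPU device visible at all?
print(torch.xpu.is_available(), torch.xpu.get_device_name(0))

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply IPEX's LLM-specific optimizations for the XPU device (fp16 path,
# not the weight-only-quantization path the script exercises).
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("Hello", return_tensors="pt").to("xpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0]))
```

If this fp16 path already fails, the problem is likely in the driver/IPEX install rather than in the WOQ script.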
-
I’m curious whether you will support Arc; Neural Compressor would particularly benefit those platforms. Thanks!
-
When I run quantize after convert, the following problem occurs:
> ➜ llama ./llama.cpp/quantize ./chinese-llama-2-7b-hf/ggml-model-f16.gguf ./chinese-llama-2-7b-hf/ggml-model-q4_0.gguf 2
main: build = 2695 (bca40e98)
ma…
-
Running `quantize` with a target dtype of F32, F16, or Q8_0 can still result in a Q6_K output tensor unless `--pure` is passed (ref https://github.com/ggerganov/llama.cpp/pull/5631#issuecomment-1965055798). This is surp…
-
AutoAWQ now supports Mixtral on the main branch. It requires that we do not quantize the `gate` in the model. To prevent it from being quantized and loaded as a quantized linear layer, you have to skip loading…
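A minimal sketch of that flow, assuming AutoAWQ's `modules_to_not_convert` quant-config key is the mechanism for the skip (model path and quant settings are illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-v0.1"  # illustrative model path

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Exclude the MoE router (`gate`) from quantization so it stays a
# plain nn.Linear instead of being swapped for a quantized layer.
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    "modules_to_not_convert": ["gate"],
}

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mixtral-awq")
```

The same exclusion then has to be honored at load time, so the router is loaded as a regular linear layer.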
-
### Describe the issue
We are trying to quantize our proprietary model based on RetinaNet using TensorRT's model optimization library. The following warning was raised: **"Please consider running pre…
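For reference, a minimal sketch of the usual calibrate-and-quantize flow in NVIDIA's Model Optimizer (`modelopt`), using torchvision's RetinaNet as a stand-in for the proprietary model; the truncated warning above isn't reproduced here:

```python
import torch
import torchvision
import modelopt.torch.quantization as mtq

# Stand-in for the proprietary detector: torchvision's RetinaNet.
model = torchvision.models.detection.retinanet_resnet50_fpn(weights=None).eval()

# A few random images as stand-in calibration data (each batch is a
# list of 3xHxW tensors, the input format detection models expect).
calib_data = [[torch.rand(3, 512, 512)] for _ in range(8)]

def forward_loop(m):
    # Run representative inputs through the model so the inserted
    # quantizers can collect activation statistics.
    with torch.no_grad():
        for images in calib_data:
            m(images)

# Insert quantizers and calibrate using the default INT8 recipe.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```

With real calibration data in place of the random tensors, any remaining warnings should point at the specific layers the library could not handle.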