-
### System Info
CPU: x86_64
GPU: A10
OS: Ubuntu 22.04
### Who can help?
@Tracin @byshiue please help.
### Information
- [X] The official example scripts
- [ ] My own modified script…
-
### System Info
- CPU architecture: x86_64
- GPU properties
  - GPU name: NVIDIA A100
  - GPU memory size: 40 GB
- Libraries
  - TensorRT-LLM branch or tag: v0.10.0
  - Container used: yes, `ma…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is this question answered in the FAQ? …
CCRss updated 1 month ago
-
Hi,
I tried QuantLinear from qlinear_cuda under auto_gptq.nn_modules.qlinear, but its performance is low with skinny matmuls (i.e., the matmul shapes at token generation).
Its performance is even worse than fp32,
for e…
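For context, a minimal back-of-the-envelope sketch (not tied to any particular kernel) of why token-generation matmuls behave this way: with M=1 they are memory-bound — almost every weight byte is read for only two FLOPs — so a quantized kernel that adds dequantization overhead can easily land below fp32. The 4096×4096 layer shape below is an illustrative assumption:

```python
# Arithmetic intensity (FLOPs per byte moved) of an M x K by K x N matmul.
# FLOPs = 2*M*K*N; bytes moved (fp16, naive) ~ 2 * (M*K + K*N + M*N).
def arithmetic_intensity(m, k, n, bytes_per_elem=2):
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Token-generation shape (M=1) vs. a prefill-like shape (M=512)
# for a hypothetical 4096x4096 weight matrix:
ai_decode = arithmetic_intensity(1, 4096, 4096)    # ~1 FLOP/byte: memory-bound
ai_prefill = arithmetic_intensity(512, 4096, 4096)  # hundreds of FLOPs/byte
```

At ~1 FLOP per byte, runtime is set by how fast the weights can be streamed from memory, not by the GPU's compute throughput, so any extra per-element dequantization work shows up directly in latency.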
-
Running in a container.
Using the GPU reports insufficient GPU memory (CUDA out of memory):
```shell
ERROR: Model running Error: CUDA out of memory. Tried to allocate 2.37 GiB. GPU 0 has a total capacty of 23.69 GiB of which 2.03 GiB is fr…
```
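As a rough capacity check (an illustrative estimate — the issue does not state the model size, so a 7B-parameter model is assumed here): weight memory is approximately parameter count times bytes per parameter, which makes it easy to see why a ~24 GiB card runs out once activations and the KV cache are added on top of fp16 weights:

```python
# Rough weight-memory estimate: n_params * bytes_per_param, in GiB.
def model_memory_gib(n_params, bytes_per_param):
    return n_params * bytes_per_param / 2**30

fp16_gib = model_memory_gib(7e9, 2)    # fp16/bf16: ~13 GiB of weights alone
int4_gib = model_memory_gib(7e9, 0.5)  # 4-bit quantized: ~3.3 GiB
```

With ~13 GiB of fp16 weights plus the KV cache and activation workspace, a 23.69 GiB GPU with only ~2 GiB free cannot satisfy a further 2.37 GiB allocation; loading the model quantized (or in fp16 if it was fp32) is the usual first remedy.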
-
A recurring feature request — provide automatic chemistry detection, at least in the case where we know that the input data is 10x. This would look something like passing `-c auto10x` and `simpleaf` …
rob-p updated 1 month ago
-
### System Info
```shell
optimum-habana 1.14.0.dev0
HL-SMI Version: hl-1.18.0-fw-53.1.1.1
Driver Version: 1.18.0-ee698fb
```
### Information
- [X] The off…
-
Hi, how do I improve the inference time of my Llama 2 7B model?
I also used BitsAndBytesConfig, but it does not seem to speed up inference.
code:
`name = "meta-llama/Llama-2-7b-cha…
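One caveat worth knowing here: bitsandbytes 4/8-bit quantization mainly reduces memory usage; per-token latency can stay the same or even get worse. Before optimizing, it helps to measure throughput in tokens per second. The helper below is a hypothetical stand-in (`tokens_per_second` and the `time.sleep` lambda are for illustration, not part of any library — in practice the callable would wrap `model.generate(...)`):

```python
import time

def tokens_per_second(generate_fn, n_tokens):
    """Time one generation call and report throughput in tokens/sec."""
    start = time.perf_counter()
    generate_fn()  # e.g. lambda: model.generate(**inputs, max_new_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in workload: pretend generating 50 tokens takes ~10 ms.
tps = tokens_per_second(lambda: time.sleep(0.01), n_tokens=50)
```

Measuring before and after each change (quantization, fp16 vs. fp32, batch size) shows which knob actually moves latency rather than just memory.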
-
It would be really nice to have a Functionary version of Llama 3.1 70B/8B!
-
### System Info
Ubuntu
### Reproduction
```python
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bflo…
```