-
My use case and GPU:
model: Qwen2-72B-Instruct
max_token_len (input+output): 20000
gpus: 4xA100
When I use the code from https://github.com/casper-hansen/AutoAWQ/blob/main/docs/example…
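For context on whether this setup fits in memory at all, here is a rough back-of-envelope sketch (assuming 80 GB A100s — the exact variant isn't stated above — and counting only the weight footprint):

```
# Back-of-envelope memory check for Qwen2-72B-Instruct with 4-bit AWQ
# on 4x A100. Assumes 80 GB cards; halve total_gb for the 40 GB variant.
params = 72e9                    # ~72B parameters
weight_gb = params * 0.5 / 1e9   # 4-bit weights ~= 0.5 bytes/param
total_gb = 4 * 80                # aggregate GPU memory
print(f"quantized weights ~= {weight_gb:.0f} GB of {total_gb} GB aggregate")
```

Note this counts weights only; the KV cache for a 20000-token context, activations, and framework overhead come on top.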
-
--chip bm1684x passes, but cv183x reports an error
==---------------------------==
GmemAllocator use FitFirstAssign
[Success]: tpuc-opt yolov5s_cv183x_f16_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_inpu…
-
Hello,
While implementing a BitBlas `Linear` layer, I noticed some odd behavior:
```
import bitblas
from bitblas.cache import global_operator_cache, get_database_path
from bitblas import auto_detect…
-
Hi @casper-hansen, I keep getting this error when trying to quantize my custom LLaVA model:
```
Traceback (most recent call last):
  File "/mainfs/lyceum/kzl1m20/LLaVA/quant.py", line 9, in <module>
m…
-
Dear maintainers, greetings from [CommandDash](https://commanddash.io)!
We are a tool that turns the docs and examples of your library into a code-generation AI agent which **helps devs directly generate…
-
Thanks for sharing your work on LLM quantization & ONNX export.
I followed the script in the '[Convert to onnx model](https://github.com/wejoncy/QLLM?tab=readme-ov-file#convert-to-onnx-model)' section, and g…
-
@nbasyl Sorry, when I use the following scripts, quantization takes over ten days:
```
MODEL_ADDR=huggyllama/llama-7b
export HF_ENDPOINT=https://hf-mirror.com
export CUDA_VISIBLE_DEVICES=0,1,2,3
…
-
```
save:
    save_trans: True
    save_lightllm: False
    save_fake: False
    save_path: /extra_data/mali36/llmc/models/
```
When I use the above config, I get a 16 GB model;
when I use …
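One plausible explanation (assuming an ~8B-parameter model — a guess, not stated above): `save_trans` writes the weight-transformed model still in its original 16-bit precision, so the saved size matches the unquantized checkpoint rather than a compressed one:

```
# Why save_trans can yield a ~16 GB checkpoint: transformed weights
# are still stored at 16 bits (assumed ~8B-parameter model).
params = 8e9
fp16_gb = params * 2 / 1e9   # 2 bytes per parameter
print(f"fp16 checkpoint ~= {fp16_gb:.0f} GB")
```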
-
AWQ config:
```
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: Fals…
-
Focus - benchmarking, documentation, tutorials, prototype to beta
Due date: June 13 2024
### Spillover [from 0.2.0](https://github.com/pytorch/ao/issues/132)
- [x] Consolidating workflows to …