-
I noticed that the README for the base model reports memory usage and generation-speed benchmarks for both BF16 and INT4 precision, but only a BF16 version of the model is currently provided. Will an official INT4 version be released in the future?
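For illustration (not an official answer): until an INT4 checkpoint is published, one hedged option is on-the-fly 4-bit loading of the BF16 weights via bitsandbytes; the model id below is a placeholder, not the actual repository name.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_ID = "org/model-bf16"  # placeholder for the published BF16 checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep matmuls in BF16
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```

This is runtime quantization, so memory and speed will not exactly match the INT4 numbers benchmarked in the README.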
-
OS: Linux 6.6.17-1-lts
HW: AMD 4650G (Renoir), gfx90c
SW: torch==2.3.0.dev20240224+rocm5.7, xformers==0.0.23 (both confirmed working).
Description of the issue: Following the installation guide…
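As a hedged sanity check (not from the original report): gfx90c is outside ROCm's official support matrix, and a commonly cited workaround is overriding the detected GFX version before torch initializes HIP.

```python
import os
# Assumption: gfx90c can masquerade as gfx900; adjust or drop if your setup differs.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

import torch

print(torch.__version__)          # should carry a +rocm suffix
print(torch.cuda.is_available())  # ROCm devices surface through the CUDA API
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```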
-
### 🚀 The feature, motivation and pitch
It apparently outperforms Mixtral at a smaller size, with a longer context length and multilingual support.
https://github.com/mistralai/mistral-inference/#deployment for Docke…
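Purely as a sketch of what support could look like once the model lands, using vLLM's offline Python API as one possible host; the checkpoint name is a placeholder, since the excerpt cuts off before naming one.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/SOME-NEW-MODEL")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Bonjour! Summarise this issue in one sentence."], params)
print(outputs[0].outputs[0].text)
```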
-
This project, https://github.com/SJTU-Quant/MASTER, uses qlib to load data. However, when I loaded the data, I hit a bug.
The error output is as follows:
File qlib\data\_libs\rolling.pyx:1 in…
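A minimal repro sketch, assuming the standard qlib CN data bundle under ~/.qlib/qlib_data/cn_data (the exact call MASTER makes is not shown above); any rolling expression routes through _libs/rolling.pyx:

```python
import qlib
from qlib.data import D

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

# Mean($close, 5) is a rolling feature, exercising the Cython rolling ops.
df = D.features(
    ["SH600000"],
    ["Mean($close, 5)"],
    start_time="2020-01-01",
    end_time="2020-02-01",
)
print(df.head())
```

If the traceback mentions a numpy.ndarray size mismatch, rebuilding qlib's Cython extensions against the installed numpy usually resolves it.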
-
### Lesson Title
Snakemake for Bioinformatics
### Lesson Repository URL
https://github.com/carpentries-incubator/snakemake-novice-bioinformatics
### Lesson Website URL
https://carpentries-incubat…
-
### 🚀 Feature
Currently, running distributed.sh with ZeRO-3 disabled and FSDP disabled, VRAM usage is quite a lot higher than with accelerate + SFTTrainer natively. I believe this is because each GPU is receiving a mod…
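For scale, a back-of-the-envelope sketch (illustrative numbers, not measurements from this issue) of the weight memory alone for a 7B-parameter model in BF16:

```python
N_PARAMS = 7e9        # assumed model size
BYTES_PER_PARAM = 2   # BF16
N_GPUS = 8            # assumed world size

replica = N_PARAMS * BYTES_PER_PARAM / 2**30          # full copy per GPU (plain DDP)
shard = N_PARAMS * BYTES_PER_PARAM / N_GPUS / 2**30   # ZeRO-3 / FSDP parameter shard

print(f"full replica per GPU: {replica:.1f} GiB")     # ~13.0 GiB
print(f"ZeRO-3/FSDP shard per GPU: {shard:.1f} GiB")  # ~1.6 GiB
```

Optimizer state and gradients widen the gap further, since ZeRO-3/FSDP shard those as well.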
-
When fine-tuning llama2 with DeepSpeed and QLoRA on one node with multiple GPUs, I used ZeRO-3 to partition the model parameters, but it always loads the full parameters on each GPU first and only then partitions params…
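For reference, the pattern the transformers documentation describes for this: instantiating HfDeepSpeedConfig before from_pretrained activates deepspeed.zero.Init, so each rank materializes only its own partition instead of a full copy. A minimal sketch with placeholder paths:

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = "ds_zero3_config.json"    # placeholder: your ZeRO stage-3 config
dschf = HfDeepSpeedConfig(ds_config)  # must be created first and kept alive

# With the config object alive, weights stream directly into their partitions.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# ... then hand off to deepspeed.initialize(...) or the Trainer as usual.
```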
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
-
After loading the llama2-7b-text model using 4-bit quantization, the total parameter count is reduced to ~3.5B. Is this a bug or the expected behavior?
Packages:
bitsandbytes => 0.41.1
transforme…
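This is the expected behavior rather than a bug: bitsandbytes packs two 4-bit weights into each uint8 storage element, so counting elements over the packed tensors reports roughly half the logical ~7B. A hedged illustration (assuming the standard Llama-2-7B checkpoint):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumption: the checkpoint behind "llama2-7b-text"
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# numel() sees packed uint8 storage for the quantized linears, so this
# lands near 3.5B even though the model still has ~7B logical weights.
packed = sum(p.numel() for p in model.parameters())
print(f"storage elements: {packed / 1e9:.2f}B")
```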
-
Basically, when I quantize a model and patch it to use torchao_int4 ops, it works; but if I then save the model and load it again, the patching fails. Am I doing something wrong? I have been trying t…
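Not necessarily this project's patching flow, but upstream torchao documents a state-dict round trip that rebuilds the module skeleton on the meta device and loads with assign=True; a minimal sketch with a toy model standing in for the real one:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

def make_model():
    # toy stand-in for the real network
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

model = make_model().to(torch.bfloat16).cuda()
quantize_(model, int4_weight_only())          # swap weights for int4 tensor subclasses
torch.save(model.state_dict(), "int4.pt")

with torch.device("meta"):                    # skeleton only, no real storage
    reloaded = make_model().to(torch.bfloat16)
state = torch.load("int4.pt", weights_only=False)
reloaded.load_state_dict(state, assign=True)  # adopt the quantized tensors as-is
```

If the project's patching step rewrites modules rather than tensors, it may need to be re-applied before (or instead of) the load.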