-
I noticed that the README for the base model reports memory usage and generation-speed benchmarks for both BF16 and INT4 precision, but only a BF16 version of the model is currently provided. Will an official INT4 version be released in the future?
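For illustration (not an official answer): until an INT4 checkpoint is published, one hedged option is on-the-fly 4-bit loading of the BF16 weights via bitsandbytes; the model id below is a placeholder, not the actual repository name.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_ID = "org/model-bf16"  # placeholder for the published BF16 checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep matmuls in BF16
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
```

This is runtime quantization, so memory and speed will not exactly match the INT4 numbers benchmarked in the README.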
-
OS: Linux 6.6.17-1-lts
HW: AMD 4650G (Renoir), gfx90c
SW: torch==2.3.0.dev20240224+rocm5.7, xformers==0.0.23 (both confirmed working).
Description of the issue: Following the installation guide…
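As a hedged sanity check (not from the original report): gfx90c is outside ROCm's official support matrix, and a commonly cited workaround is overriding the detected GFX version before torch initializes HIP.

```python
import os
# Assumption: gfx90c can masquerade as gfx900; adjust or drop if your setup differs.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

import torch

print(torch.__version__)          # should carry a +rocm suffix
print(torch.cuda.is_available())  # ROCm devices surface through the CUDA API
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```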
-
### 🚀 The feature, motivation and pitch
It apparently outperforms Mixtral at a smaller size, with a longer context length and multilingual support.
https://github.com/mistralai/mistral-inference/#deployment for Docke…
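Purely as a sketch of what support could look like once the model lands, using vLLM's offline Python API as one possible host; the checkpoint name is a placeholder, since the excerpt cuts off before naming one.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/SOME-NEW-MODEL")  # placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Bonjour! Summarise this issue in one sentence."], params)
print(outputs[0].outputs[0].text)
```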
-
This project, https://github.com/SJTU-Quant/MASTER, uses qlib to load data. However, when I loaded the data, I hit a bug.
The error output is as follows:
File qlib\data\_libs\rolling.pyx:1 in…
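A minimal repro sketch, assuming the standard qlib CN data bundle under ~/.qlib/qlib_data/cn_data (the exact call MASTER makes is not shown above); any rolling expression routes through _libs/rolling.pyx:

```python
import qlib
from qlib.data import D

qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region="cn")

# Mean($close, 5) is a rolling feature, exercising the Cython rolling ops.
df = D.features(
    ["SH600000"],
    ["Mean($close, 5)"],
    start_time="2020-01-01",
    end_time="2020-02-01",
)
print(df.head())
```

If the traceback mentions a numpy.ndarray size mismatch, rebuilding qlib's Cython extensions against the installed numpy usually resolves it.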
-
### Lesson Title
Snakemake for Bioinformatics
### Lesson Repository URL
https://github.com/carpentries-incubator/snakemake-novice-bioinformatics
### Lesson Website URL
https://carpentries-incubat…
-
### 🚀 Feature
Currently, running distributed.sh with ZeRO-3 disabled and FSDP disabled, VRAM usage is quite a lot higher than with accelerate + SFTTrainer natively. I believe this is because each GPU is receiving a mod…
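For scale, a back-of-the-envelope sketch (illustrative numbers, not measurements from this issue) of the weight memory alone for a 7B-parameter model in BF16:

```python
N_PARAMS = 7e9        # assumed model size
BYTES_PER_PARAM = 2   # BF16
N_GPUS = 8            # assumed world size

replica = N_PARAMS * BYTES_PER_PARAM / 2**30          # full copy per GPU (plain DDP)
shard = N_PARAMS * BYTES_PER_PARAM / N_GPUS / 2**30   # ZeRO-3 / FSDP parameter shard

print(f"full replica per GPU: {replica:.1f} GiB")     # ~13.0 GiB
print(f"ZeRO-3/FSDP shard per GPU: {shard:.1f} GiB")  # ~1.6 GiB
```

Optimizer state and gradients widen the gap further, since ZeRO-3/FSDP shard those as well.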
-
When fine-tuning llama2 with DeepSpeed and QLoRA on one node with multiple GPUs, I used ZeRO-3 to partition the model parameters, but it always loads the full parameters on each GPU first and only then partitions params…
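For reference, the pattern the transformers documentation describes for this: instantiating HfDeepSpeedConfig before from_pretrained activates deepspeed.zero.Init, so each rank materializes only its own partition instead of a full copy. A minimal sketch with placeholder paths:

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = "ds_zero3_config.json"    # placeholder: your ZeRO stage-3 config
dschf = HfDeepSpeedConfig(ds_config)  # must be created first and kept alive

# With the config object alive, weights stream directly into their partitions.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# ... then hand off to deepspeed.initialize(...) or the Trainer as usual.
```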
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
-
After loading the llama2-7b-text model using 4-bit quantization, the total parameter count is reduced to ~3.5B. Is this a bug or the expected behavior?
Packages:
bitsandbytes => 0.41.1
transforme…
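This is the expected behavior rather than a bug: bitsandbytes packs two 4-bit weights into each uint8 storage element, so counting elements over the packed tensors reports roughly half the logical ~7B. A hedged illustration (assuming the standard Llama-2-7B checkpoint):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumption: the checkpoint behind "llama2-7b-text"
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# numel() sees packed uint8 storage for the quantized linears, so this
# lands near 3.5B even though the model still has ~7B logical weights.
packed = sum(p.numel() for p in model.parameters())
print(f"storage elements: {packed / 1e9:.2f}B")
```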
-
Basically, when I quantize a model and patch it to use torchao_int4 ops, it works; but if I then save the model and load it again, the patching fails. Am I doing something wrong? I have been trying t…
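Not necessarily this project's patching flow, but upstream torchao documents a state-dict round trip that rebuilds the module skeleton on the meta device and loads with assign=True; a minimal sketch with a toy model standing in for the real one:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

def make_model():
    # toy stand-in for the real network
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

model = make_model().to(torch.bfloat16).cuda()
quantize_(model, int4_weight_only())          # swap weights for int4 tensor subclasses
torch.save(model.state_dict(), "int4.pt")

with torch.device("meta"):                    # skeleton only, no real storage
    reloaded = make_model().to(torch.bfloat16)
state = torch.load("int4.pt", weights_only=False)
reloaded.load_state_dict(state, assign=True)  # adopt the quantized tensors as-is
```

If the project's patching step rewrites modules rather than tensors, it may need to be re-applied before (or instead of) the load.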