[2024-04-06 20:18:44,522] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.2.2+cu121)
Python 3.10.12 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
[2024-04-06 20:18:47,398] [INFO] building CogVLMModel model ...
[2024-04-06 20:18:47,402] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-04-06 20:18:47,403] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-04-06 20:18:47,403] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
[2024-04-06 20:18:55,409] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 17639685376
[2024-04-06 20:19:00,248] [INFO] [RANK 0] global rank 0 is loading checkpoint models/cogvlm-chat-v1.1/1/mp_rank_00_model_states.pt
Killed
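As a rough sanity check on the memory involved, the parameter count printed in the log above can be turned into an approximate weight footprint per precision. This is only a back-of-the-envelope sketch: the names `NUM_PARAMS`, `BYTES_PER_PARAM`, and `weight_gib` are made up for illustration, and the estimate covers weights only (no activations, no quantization scales, no framework overhead).

```python
# Parameter count copied from the log line
# "number of parameters on model parallel rank 0: 17639685376".
NUM_PARAMS = 17_639_685_376

# Assumed bytes per parameter for each precision (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in GiB."""
    return num_params * bytes_per_param / 2**30


# Estimated weight footprint for each precision.
estimates = {name: weight_gib(NUM_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.1f} GiB")
```

Under these assumptions the bf16 weights alone come to roughly 32.9 GiB, which is more than the 24GB on the RTX 3090 and about the size of the 32GB of system RAM (WSL2 may expose less than that to Linux by default). The 4-bit weights would be about 8.2 GiB. The log ends with "Killed" while the checkpoint is still loading, which would be consistent with the unquantized checkpoint being read into host RAM before any quantization is applied, but that is an assumption and not something the log confirms.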
System Info
OS - WSL2 (Ubuntu 22.04)
RAM - 32GB
GPU - RTX 3090 24GB, RTX 3060 12GB
CUDA - 12.1
PyTorch - 2.2.2+cu121
Transformers - 4.31.0
Python - 3.10.14
Who can help?
Hello,
A few clarifications, please:
1) Do SAT models give better results than the Hugging Face (HF) models?
2) Why do the models on this demo page give better responses than the HF models run locally in 8-bit and 4-bit? Is it because the demo page models are not quantized, or because the HF model is less efficient than the SAT model? These are assumptions on my part, so please clarify.
3) When I tried to run the SAT models locally, I got a "Killed" error with 4-bit quantization and a "CUDA out of memory" error without quantization. Is there a way to load SAT models in 8-bit or 4-bit on a single RTX 3090 GPU?
4) Also, your demo on Hugging Face is not working. I open the demo, submit a prompt, and it displays the error "Timeout! Please wait a few minutes and retry". As a result, I cannot compare the quality of the HF and SAT versions.
Information
Reproduction
python cli_demo_sat.py --from_pretrained models/cogvlm-chat-v1.1 --version chat --bf16 --stream_chat --quant 4
When I run the above command, I get the error shown in the log above: the process is killed while loading the checkpoint.
Expected behavior
Summary: The SAT model didn't work on my computer, so I tried the HF model. The HF model didn't give good results, so now I'm trying to get the SAT model to work. To make sure that I had run the HF model properly on my computer, I also checked the demo on Hugging Face to compare my results, but the demo is not working either.
Goal: I am trying to reproduce, on my local machine, the same results that I get on this demo.