PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
Apache License 2.0
20.03k stars 2.23k forks

GPU not being used #391

Open Jiraxys opened 1 year ago

Jiraxys commented 1 year ago

Hello,

I cannot get the LLM to use my GPU instead of my CPU. I tried multiple models, but it does not work. How do I fix this?

```python
MODEL_ID = "TheBloke/Llama-2-13B-chat-GPTQ"
MODEL_BASENAME = "gptq_model-4bit-128g.safetensors"
```

(I recently changed the model to MythoMax-L2-13B-GPTQ; still no change.)

GPU: RTX 3060 Ti 8 GB
RAM: 16 GB

Log:

```
CUDA SETUP: CUDA runtime path found: C:\Users\User\miniconda3\Library\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll...
2023-08-19 17:33:58,635 - INFO - run_localGPT.py:181 - Running on: cuda
2023-08-19 17:33:58,635 - INFO - run_localGPT.py:182 - Display Source Documents set to: False
2023-08-19 17:33:59,184 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-08-19 17:34:02,280 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-19 17:34:02,301 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: C:\Users\User\Desktop\PGPT\LGPT\localGPT/DB
2023-08-19 17:34:02,325 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-19 17:34:02,349 - INFO - json_impl.py:45 - Using orjson library for writing JSON byte strings
2023-08-19 17:34:02,455 - INFO - duckdb.py:460 - loaded in 11 embeddings
2023-08-19 17:34:02,456 - INFO - duckdb.py:472 - loaded in 1 collections
2023-08-19 17:34:02,457 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-08-19 17:34:02,458 - INFO - run_localGPT.py:46 - Loading Model: TheBloke/MythoMax-L2-13B-GPTQ, on: cuda
2023-08-19 17:34:02,458 - INFO - run_localGPT.py:47 - This action can take a few minutes!
2023-08-19 17:34:02,458 - INFO - run_localGPT.py:69 - Using AutoGPTQForCausalLM for quantized models
2023-08-19 17:34:02,958 - INFO - run_localGPT.py:76 - Tokenizer loaded
2023-08-19 17:34:05,925 - INFO - _base.py:727 - lm_head not been quantized, will be ignored when make_quant.
2023-08-19 17:34:29,931 - WARNING - fused_llama_mlp.py:306 - skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
The model 'LlamaGPTQForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', (and so on)]
2023-08-19 17:34:30,986 - INFO - run_localGPT.py:128 - Local LLM Loaded
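The log says the model is loaded on `cuda`, so a first diagnostic step (not from the thread; a minimal sketch assuming PyTorch is installed) is to confirm that the installed PyTorch build can actually see the GPU at all. If `torch.cuda.is_available()` returns `False`, a CPU-only PyTorch wheel or a driver/CUDA mismatch is the likely culprit, and inference will silently fall back to the CPU:

```python
import torch

def report_device() -> str:
    """Return a short description of the device PyTorch would use."""
    if torch.cuda.is_available():
        # CUDA build of PyTorch found a usable GPU; report its name.
        return f"cuda ({torch.cuda.get_device_name(0)})"
    # Either a CPU-only PyTorch wheel is installed, or the NVIDIA
    # driver / CUDA runtime does not match the build.
    return "cpu"

print(report_device())
```

If this prints `cpu` despite an RTX 3060 Ti being present, reinstalling PyTorch from the CUDA-enabled wheel index (matching the CUDA 11.8 runtime shown in the log) is a reasonable next step.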

(screenshot attached)

ariorostami commented 1 year ago

Seconded

tonykhoapro commented 1 year ago

Same as you

ciliamadani commented 10 months ago

Same issue