Open HorrorBest opened 7 months ago
What does the report of nvidia-smi give you regarding VRAM usage?
oops, you said windows :)
It seems to me that the model you've chosen may need more VRAM than your system can provide. On my Linux system the webui uses very little VRAM (my current total VRAM usage is 133 MB across all applications with the webui running but no work being done).
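For reference, a quick way to check per-GPU VRAM usage from a terminal (nvidia-smi also ships with the Windows NVIDIA driver; the exact install path varies by driver version). This is a generic sketch, not something from the webui itself:

```shell
# Print each GPU's name and VRAM usage as CSV; falls back to a message
# when no NVIDIA driver is present so the snippet runs anywhere.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
else
  echo "nvidia-smi not found on PATH"
fi
```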
Are you linking the two GPUs together? Which specific version of the model you mentioned are you using?
I'm not the developer, just someone poking around myself.
Hi there and thanks for testing. If you want dynamic layer sharing between RAM and VRAM, use the ctransformers binding and set how many layers to put on the GPU. Fusing two GPUs requires Triton, which I think is not supported on Windows. (I'm not sure.)
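A rough sketch of what that layer split looks like with the ctransformers binding. The `gpu_layers` option is ctransformers' knob for how many layers live in VRAM; the model repo, the 32-layer count, the per-layer size, and the `fit_gpu_layers` helper below are illustrative assumptions, not taken from this issue:

```python
def fit_gpu_layers(total_layers: int, vram_bytes: int, bytes_per_layer: int) -> int:
    """Rough count of layers that fit in a VRAM budget (hypothetical helper)."""
    return max(0, min(total_layers, vram_bytes // bytes_per_layer))

# A 7B model has 32 transformer layers; assume ~110 MiB per 4-bit layer
# and budget ~3 GiB of the 4 GiB card, leaving headroom for cache/overhead.
n_gpu = fit_gpu_layers(32, int(3.0 * 2**30), 110 * 2**20)

try:
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/neural-chat-7B-v3-2-GGUF",  # GGUF build, assumed available
        model_type="mistral",
        gpu_layers=n_gpu,  # the remaining layers stay in system RAM
    )
    print(llm("Hello"))
except ImportError:
    print(f"ctransformers not installed; would request gpu_layers={n_gpu}")
```

Lowering `gpu_layers` trades speed for a smaller VRAM footprint, which is the usual fix when a model almost fits.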
Expected Behavior
Model being loaded
Current Behavior
Model Not loading
Steps to Reproduce
I don't know exactly. I installed LOLLMs with the Windows installer bat file and chose the CUDA option. I downloaded the HuggingFace Zoo binding and tried to manually place a model, which didn't work (no need to worry about that now). So I downloaded one (https://huggingface.co/TheBloke/neural-chat-7B-v3-2-GPTQ) through the UI; it didn't load and I got the following error in the terminal
Possible Solution
Reduce memory usage from the UI or config file? I'm new, so I don't know where everything should be.
Context
I have 2 Nvidia Quadro M2000M GPUs with 4 GB VRAM each, an integrated Intel GPU, and an Intel i7 (6th gen, I think).
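For context on whether a 7B GPTQ model can fit on that card, here is a back-of-envelope estimate (illustrative only): a 7B-parameter model quantized to 4 bits needs roughly this much memory for the weights alone, before the KV cache and CUDA runtime overhead are added on top.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a quantized model (sketch)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# ~3.26 GiB of weights on a 4 GiB card leaves very little headroom.
print(round(weight_memory_gib(7.0, 4.0), 2))
```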