Closed kpratik41 closed 1 month ago
I saw your response to other questions and figured that Mistral has 32 layers (set gpu_layers=32). Let me first investigate what other parameters need to be changed to make it run on the GPU. If I still have questions, I will get back to you. Thanks
I set the GPU layers to 32 (gpu_layers=32) and I get CUDA error 700: an illegal memory access was encountered. It never loads the prompt.
Update: Alright, so I played with the layer count a bit. The illegal memory access comes and goes; I need to look into what exactly is going on there. I did find that I couldn't support 32 layers: I was maxing out VRAM on the initial startup. The most I got working was 24 layers, but it wasn't consistent, and I have had illegal memory access terminate my sessions at 12 layers.
Update 2: I couldn't get it stable even with 4 layers, so I set it back to zero. Is the memory ever getting released?
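For what it's worth, a rough back-of-envelope estimate can suggest how many layers might fit before VRAM maxes out. This is just a sketch, not anything from the project; the model size, layer count, and overhead figures are loose assumptions for a Q4-quantized Mistral-7B GGUF:

```python
def max_gpu_layers(vram_gb, model_size_gb=4.1, n_layers=32, overhead_gb=1.5):
    """Rough estimate of how many transformer layers fit in VRAM.

    Assumptions (not measured values):
      - model_size_gb: on-disk size of a Q4 Mistral-7B GGUF, roughly 4 GB
      - n_layers: Mistral-7B has 32 layers
      - overhead_gb: a loose guess for KV cache, CUDA context, and the
        desktop compositor eating into VRAM
    """
    per_layer_gb = model_size_gb / n_layers
    usable = max(vram_gb - overhead_gb, 0)
    return min(n_layers, int(usable / per_layer_gb))

# e.g. max_gpu_layers(8) suggests a Q4 7B model can fully offload on 8 GB,
# while a card with heavy desktop usage may need fewer layers
```

The numbers are only a starting point; actual headroom depends on the context length and whatever else is using the GPU.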
Setting gpu_layers=32 (or anything other than 0) causes errors for me too. I cannot get the GPU to be used.
Is the memory ever getting released?
If it is running stably, the model stays in memory. You can click the clear-cache button to release the model, and it usually gets released if the code crashes. You can easily verify this by checking nvidia-smi on Linux or the resource monitor on Windows. How much VRAM do you have available?
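To check programmatically rather than eyeballing nvidia-smi, a small helper (hypothetical, not part of the project) can query per-GPU memory. It assumes nvidia-smi is on the PATH and uses its CSV query format:

```python
import subprocess

def parse_vram(csv_text):
    """Parse 'used, total' CSV lines from nvidia-smi into (used, total) MiB tuples."""
    return [tuple(int(v) for v in line.split(","))
            for line in csv_text.strip().splitlines()]

def vram_usage():
    """Shell out to nvidia-smi (must be on PATH) and return per-GPU (used, total) MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True)
    return parse_vram(out)
```

Calling vram_usage() before and after clearing the cache should show whether the model's memory was actually released.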
Setting gpu_layers=32 (or anything other than 0) causes errors for me too. I cannot get the GPU to be used.
Maybe installing ctransformers with CUDA support (pip install ctransformers[cuda]) might help. Which errors are you encountering?
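For reference, a minimal loading sketch with ctransformers might look like the following; the wrapper function and the model path in the usage comment are hypothetical, and it assumes a local GGUF file and that ctransformers[cuda] is installed:

```python
def load_mistral(model_path, gpu_layers=32, context_length=2048):
    """Load a GGUF Mistral model with ctransformers, offloading layers to the GPU.

    gpu_layers=32 offloads all of Mistral-7B's layers; lower it if VRAM runs out.
    """
    # imported lazily so the optional CUDA dependency is only needed on use
    from ctransformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        model_type="mistral",
        gpu_layers=gpu_layers,
        context_length=context_length,
    )

# usage (hypothetical path):
# llm = load_mistral("models/mistral-7b-instruct.Q4_K_M.gguf", gpu_layers=24)
# print(llm("Hello"))
```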
Unfortunately I only have 16GB, but I can get GPU offload with the same models in PrivateGPT and LM Studio, so while I would like to have more VRAM, I think there is a way to manage it. I just don't know enough yet to make it happen.
Well I have 8GB and it works on my end.
No doubt in my mind it is something on my end. I just installed ctransformers[cuda] as you suggested, and I didn't have nvidia-cublas-cu12, so that may have played a part in it. I will follow up after I have had a chance to determine whether that fixed the errors I was seeing. Thank you for the suggestion.
Seems to be working a lot better. I started with 16 layers and it has made it through what was tanking me before. I'll work my way up and report back if I come across any of the same issues. Thank you, Leon.
You're welcome. Which OS are you actually on?
Windows at the moment, but I will eventually clone/rebuild this on a Debian system. I really appreciate you putting all of this together and taking the time to produce the YT video in a code-along structure. This is the densest yet most digestible project I have come across, and I have learned a lot in a very short amount of time.
I also have struggles on Windows but have not been able to pin down exactly where the problem is.
I really appreciate you putting all of this together and taking the time to produce the YT video in a code-along structure. This is the densest yet most digestible project I have come across, and I have learned a lot in a very short amount of time.
Thank you, glad I could help.
After doing pip install ctransformers[cuda], it worked in Windows cmd for a tiny PDF. It still didn't work in WSL, though, and it is still surprisingly slow for a tiny PDF, only utilizing around 30% of the GPU with gpu_layers set to 50. I have 16GB VRAM and 128GB RAM.
Also wanted to note: I'm currently trying to load a short book in a PDF, but it's been loading for over 30 minutes with no success. It's only 8MB.
But the tiny PDF does work! Thanks!
I ran 'pip install ctransformers[cuda]' and also set gpu_layers to 24. I am running it on my dedicated GPU, an RTX 4050, and I am still getting a slow response time.
@DhruvDhabalia are you on Windows? For some reason I also have a very slow response time on Windows; more investigation is necessary to find the problem area. No problem on Linux, though.
Yes, I am on Windows. For some reason it seems like the layers and the dGPU have no effect on the model's speed; possibly something in the code is restricting everything on Windows. Update: I am trying to run profiling tools to find the hiccup.
I ran the profiler. Most of the time is consumed by main() in app.py and run() in llm_chains.py. I don't know if this will help; either way, I am attaching a screenshot of the output.
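If anyone else wants to reproduce the profiling, a minimal cProfile run might look like this. The main() here is only a stand-in for the real entry point in app.py:

```python
import cProfile
import io
import pstats
import time

def main():
    # stand-in for app.py's main(); swap in the real entry point
    time.sleep(0.1)

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # top 10 entries by cumulative time
print(stream.getvalue())
```

Sorting by cumulative time makes it easy to see whether the time is spent inside the LLM call or somewhere in the surrounding pipeline.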
I'm also facing a similar issue. Did this problem get resolved? Loading PDF books is slow and does not utilize the full GPU.
Same issue I am facing too! Loading the PDF is really slow. I am working on Windows. Any updates or solutions, please?
This seems to be a Windows-related problem. I would suggest working on Linux-based systems when trying to run AI, and, in my personal opinion, when coding in general.
Hello,
Is this code written to run only on the CPU? I don't think the GPU is being used, and the response time is very slow.
If it is written to run on the CPU for now, can you suggest the changes (device, gpu_layers) that I would need to make to run it on the GPU?