Open Lathanao opened 2 months ago
Try smaller version of TinyLlama, Q8 instead of F32: TinyLlama-1.1B-Chat-v1.0.Q8_0.llamafile
Can you try llamafile-0.8.1 which was just released and tell me if it works?
Works perfectly, and it is far faster than before! Thank you.
Mea culpa: above, I got a model with a lower quantization format working. But now I am not able to run the file again without errors.
So I downloaded several models:

- Meta-Llama-3-8B-Instruct.F16.llamafile -> doesn't load
- Meta-Llama-3-8B-Instruct.Q2_K.llamafile -> SIGSEGV
- Model/Meta-Llama-3-8B-Instruct.Q8_0.llamafile -> doesn't load
- Model/Phi-3-mini-4k-instruct.Q8_0.llamafile -> doesn't load
- Model/TinyLlama-1.1B-Chat-v1.0.F16.llamafile -> SIGSEGV
- Model/TinyLlama-1.1B-Chat-v1.0.F32.llamafile -> doesn't load
- Model/TinyLlama-1.1B-Chat-v1.0.Q8_0.llamafile -> SIGSEGV
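For reference, this is roughly how I check each file (a sketch, assuming the paths above and that each llamafile accepts `--version`; the `classify_exit` helper is mine, not part of llamafile). A shell reports a process killed by signal N with exit status 128+N, so 139 means SIGSEGV:

```shell
# Hypothetical helper: map an exit status to a human-readable outcome.
# 139 = 128 + 11 (SIGSEGV); other nonzero statuses usually mean the
# model simply failed to load.
classify_exit() {
  case "$1" in
    0)   echo "ok" ;;
    139) echo "SIGSEGV" ;;
    *)   echo "failed (status $1)" ;;
  esac
}

# Run every downloaded llamafile headless and print how it exits.
for f in Model/*.llamafile; do
  status=0
  sh "$f" --version >/dev/null 2>&1 || status=$?
  printf '%s -> %s\n' "$f" "$(classify_exit "$status")"
done
```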
I rebooted my machine and tested again. The model that was working for me this morning (Model/TinyLlama-1.1B-Chat-v1.0.F16.llamafile) now hits SIGSEGV every time. No way to get it working again.
The SIGSEGV issue has been reported in #378.
I am trying to get TinyLlama working on the GPU with:
But it seems impossible to allocate 66.50 MB of memory on my card, even right after booting the machine without any prior GPU use.
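One way to sanity-check that the VRAM really is free before launching (a sketch; it assumes `nvidia-smi` is on the PATH, and the `free_mib` helper name is mine):

```shell
# Query free VRAM in MiB for the first GPU, or "unknown" if nvidia-smi
# is not available. --query-gpu=memory.free is a standard nvidia-smi option.
free_mib() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n1
  else
    echo "unknown"
  fi
}

free_mib
```

If this reports hundreds of free MiB and the 66.50 MB allocation still fails, the problem is likely in the CUDA setup rather than actual memory pressure.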
Here is the error:
I have this CUDA version:
Here are the specs of my machine:
Is there a way to solve that?