Closed erikpro008 closed 1 week ago
It seems that if you have enough hard-drive space (for swap) it will work, but always requiring this much memory makes it harder to use.
@erikpro008 it loads the model onto the CPU first, and then applies ISQ directly onto the GPU (or CPU). It seems like you are using swap space now, which should work. Please let me know if you have any further questions!
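For context on why the full-precision load exceeds 64 GB, here is a rough back-of-the-envelope sketch. The ~41.9B total parameter count (from the model's published 16x3.8B mixture-of-experts configuration) and ~4.5 bits per parameter for Q4K are assumptions; real usage also includes activations, KV cache, and framework overhead.

```python
# Rough estimate of Phi-3.5-MoE weight memory at different precisions.
# The parameter count and Q4K bit width below are assumptions, not
# values taken from mistral.rs itself.

TOTAL_PARAMS = 41.9e9  # assumed total parameters (16 experts x ~3.8B)

def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(TOTAL_PARAMS, 16)   # full-precision load before ISQ
q4k_gb = weight_gb(TOTAL_PARAMS, 4.5)   # ~4.5 bits/param assumed for Q4K

print(f"fp16 weights: ~{fp16_gb:.0f} GB")  # ~84 GB, over the 64 GB of RAM
print(f"Q4K weights:  ~{q4k_gb:.0f} GB")   # ~24 GB after quantization
```

This is why the initial CPU load can crash a 64 GB machine even though the quantized model itself would fit comfortably, and why spilling into swap during the load can let it complete.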
Describe the bug
Running this in the terminal: `./mistralrs-server --isq Q4K -i plain -m microsoft/Phi-3.5-MoE-instruct -a phi3.5moe`
I am unable to run this because it runs out of memory: usage exceeds 64 GB of RAM, which makes it crash while processing. I thought the ISQ feature would avoid needing so much, but that seems misleading. Why does it require so much RAM?
Is there a way to fix this and make it work? Otherwise I might need to wait for llama.cpp support.
I am using a MacBook Pro M1 Max with 64 GB of RAM.
Latest commit or version
Latest version that is currently available.