Closed — xISSAx closed this issue 1 year ago
Thanks @xISSAx. You're right, LlamaChat currently always sets the mlock parameter to false, since disabling it was touted as a big performance improvement over previous versions (which for large models I think is true).
I need to do some more investigation into this, but I was definitely thinking of adding a switch for it. Perhaps you're right that it should be enabled by default for a good FTUE, but configurable if people need it.
Added in v1.2.0
Greetings! Love the application and UX!
I noticed that llama.cpp running on my M1 was flushing memory during and after each generation, causing slower-than-expected output. This can be fixed by passing the `--mlock` argument, which massively boosts Mac M1 performance by locking the model into memory.
Currently, LlamaChat has the same issue, and I believe it can be fixed by passing the same `--mlock` argument. In fact, I suggest leaving it ON by default for a seamless beginner's experience on M1s.
Moreover, please also consider an advanced settings feature that lets users change these parameters themselves.