ahmedashraf443 opened 2 days ago
I second this too. Right now I am an hour into generating an app with the phi3.5 model on my M1 Mac with 8 GB of RAM. This is the only model that responded quickly and could actually execute and preview code; the others only behaved like ChatGPT, offering me solutions I could have implemented myself. But an hour in, it still hasn't finished generating the package.json file. Is there a way to smooth this out?
Describe the bug
When using language models that fit within my GPU (e.g., through Aider or OpenWebUI), they run smoothly, using only the GPU and delivering full performance. However, when I use the same models through bolt.new, I run into significant performance issues:

As soon as I send anything through bolt.new, the Ollama server spikes in both RAM and CPU usage. It behaves as if I were running a much larger model, not the same small model I load everywhere else.
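A quick way to confirm what is happening is to ask Ollama itself how much of the loaded model is resident in VRAM. Below is a minimal TypeScript sketch against the standard Ollama REST API (GET /api/ps on the default port 11434); the host, the 100% threshold, and the output wording are my own assumptions, not anything from bolt.new.

```ts
// Check whether the model Ollama has loaded is fully resident on the GPU.
// GET /api/ps is a documented Ollama endpoint; run this while bolt.new is
// mid-generation and compare with a run where only Aider/OpenWebUI is active.

interface PsModel {
  name: string;
  size: number;      // total bytes the running model occupies
  size_vram: number; // portion of those bytes resident in GPU memory
}

async function checkGpuResidency(host = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${host}/api/ps`);
  if (!res.ok) throw new Error(`Ollama /api/ps failed: ${res.status}`);
  const { models } = (await res.json()) as { models: PsModel[] };

  for (const m of models) {
    const gpuPct = m.size > 0 ? (100 * m.size_vram) / m.size : 0;
    console.log(`${m.name}: ${gpuPct.toFixed(0)}% in VRAM`);
    if (gpuPct < 100) {
      // Anything below 100% means layers were offloaded to CPU/system RAM,
      // which would explain the RAM and CPU spikes described above.
      console.log("  -> partially offloaded to CPU");
    }
  }
}

checkGpuResidency().catch(console.error);
```

The CLI equivalent is `ollama ps`, whose PROCESSOR column shows the same CPU/GPU split. If the model reports 100% GPU under Aider but a CPU share under bolt.new, the requests themselves differ, not the model.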
Link to the Bolt URL that caused the error
Not applicable: I ran it locally with `pnpm run dev` to test it out before deploying, so there is no hosted Bolt URL.
Steps to reproduce
1. Load a language model that fits within the GPU and use it through bolt.new.
2. Observe CPU and RAM usage in the system monitor.
3. Note the token generation speed compared to other platforms (e.g., Aider or OpenWebUI). A reproduction sketch that isolates the suspected cause follows below.
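If the cause is the requested context window rather than the model weights (bolt.new sends a very large system prompt, and a coding agent needs a big context to hold project files), the spill can be reproduced without bolt.new at all. This is a hedged sketch using Ollama's standard POST /api/generate endpoint; the model name and both num_ctx values are placeholders I chose for illustration, not values taken from bolt.new's source.

```ts
// Load the same model twice with different context windows, then inspect
// `ollama ps` after each load. A larger num_ctx forces a larger KV cache,
// which can push part of the model out of VRAM and onto the CPU.

async function loadWithCtx(model: string, numCtx: number): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: "Say hi.",
      stream: false,
      options: { num_ctx: numCtx }, // KV cache size scales with the context
    }),
  });
  if (!res.ok) throw new Error(`/api/generate failed: ${res.status}`);
  await res.json();
  console.log(`${model} loaded with num_ctx=${numCtx}; check \`ollama ps\` now`);
}

async function main(): Promise<void> {
  // Placeholder model: substitute whatever small model fits your GPU.
  await loadWithCtx("phi3.5", 2048);  // baseline: should stay fully on GPU
  await loadWithCtx("phi3.5", 32768); // agent-sized context: may spill to CPU
}

main().catch(console.error);
```

If the second call reproduces the RAM/CPU spike while the first does not, the fix is to cap the context the app requests rather than to change models; some forks expose a DEFAULT_NUM_CTX setting for exactly this, so it is worth checking your build's .env.example.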
Expected behavior
The model should run smoothly, leveraging the GPU for computation, similar to its performance in other environments like Aider or OpenWebUI.
Screen Recording / Screenshot
No response
Platform
Additional context
No response