ahmedashraf443 opened 2 days ago
I second this too. Right now I am an hour into generating an app with the phi3.5 model on my M1 Mac with 8 GB of RAM. This is the only model that responded quickly and could actually execute and preview code; the others only behaved like ChatGPT, offering me solutions I could have implemented myself. But an hour in, it still hasn't finished generating the package.json file. Is there a way to smooth this out?
Describe the bug
When using language models that fit within my GPU (e.g., through Aider or OpenWebUI), they run smoothly, using only the GPU and delivering full performance. However, when I use the same models through bolt.new, I run into significant performance issues:

As soon as I send anything through bolt.new, the Ollama server spikes in both RAM and CPU usage. It behaves as if I were running a much larger model, not the same small model I load everywhere else.
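A quick way to confirm what is happening is to ask Ollama itself how much of the loaded model is resident in VRAM. Below is a minimal TypeScript sketch against the standard Ollama REST API (GET /api/ps on the default port 11434); the host, the 100% threshold, and the output wording are my own assumptions, not anything from bolt.new.

```ts
// Check whether the model Ollama has loaded is fully resident on the GPU.
// GET /api/ps is a documented Ollama endpoint; run this while bolt.new is
// mid-generation and compare with a run where only Aider/OpenWebUI is active.

interface PsModel {
  name: string;
  size: number;      // total bytes the running model occupies
  size_vram: number; // portion of those bytes resident in GPU memory
}

async function checkGpuResidency(host = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${host}/api/ps`);
  if (!res.ok) throw new Error(`Ollama /api/ps failed: ${res.status}`);
  const { models } = (await res.json()) as { models: PsModel[] };

  for (const m of models) {
    const gpuPct = m.size > 0 ? (100 * m.size_vram) / m.size : 0;
    console.log(`${m.name}: ${gpuPct.toFixed(0)}% in VRAM`);
    if (gpuPct < 100) {
      // Anything below 100% means layers were offloaded to CPU/system RAM,
      // which would explain the RAM and CPU spikes described above.
      console.log("  -> partially offloaded to CPU");
    }
  }
}

checkGpuResidency().catch(console.error);
```

The CLI equivalent is `ollama ps`, whose PROCESSOR column shows the same CPU/GPU split. If the model reports 100% GPU under Aider but a CPU share under bolt.new, the requests themselves differ, not the model.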
Link to the Bolt URL that caused the error
Not applicable: I ran it locally with `pnpm run dev` to test it out before deploying, so there is no hosted Bolt URL.
Steps to reproduce
1. Load a language model that fits within the GPU and use it through bolt.new.
2. Observe CPU and RAM usage in the system monitor.
3. Note the token generation speed compared to other platforms (e.g., Aider or OpenWebUI). A reproduction sketch that isolates the suspected cause follows below.
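If the cause is the requested context window rather than the model weights (bolt.new sends a very large system prompt, and a coding agent needs a big context to hold project files), the spill can be reproduced without bolt.new at all. This is a hedged sketch using Ollama's standard POST /api/generate endpoint; the model name and both num_ctx values are placeholders I chose for illustration, not values taken from bolt.new's source.

```ts
// Load the same model twice with different context windows, then inspect
// `ollama ps` after each load. A larger num_ctx forces a larger KV cache,
// which can push part of the model out of VRAM and onto the CPU.

async function loadWithCtx(model: string, numCtx: number): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt: "Say hi.",
      stream: false,
      options: { num_ctx: numCtx }, // KV cache size scales with the context
    }),
  });
  if (!res.ok) throw new Error(`/api/generate failed: ${res.status}`);
  await res.json();
  console.log(`${model} loaded with num_ctx=${numCtx}; check \`ollama ps\` now`);
}

async function main(): Promise<void> {
  // Placeholder model: substitute whatever small model fits your GPU.
  await loadWithCtx("phi3.5", 2048);  // baseline: should stay fully on GPU
  await loadWithCtx("phi3.5", 32768); // agent-sized context: may spill to CPU
}

main().catch(console.error);
```

If the second call reproduces the RAM/CPU spike while the first does not, the fix is to cap the context the app requests rather than to change models; some forks expose a DEFAULT_NUM_CTX setting for exactly this, so it is worth checking your build's .env.example.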
Expected behavior
The model should run smoothly, leveraging the GPU for computation, similar to its performance in other environments like Aider or OpenWebUI.
Screen Recording / Screenshot
No response
Platform
Additional context
No response