OpenRouterTeam / openrouter-runner

Inference engine powering open source models on OpenRouter
https://openrouter.ai
MIT License

perf: bump vllm container cpu memory from 128M to 1024M #53

Closed: sambarnes closed this 9 months ago

sambarnes commented 9 months ago

Details

Previously, the FastAPI completion endpoint's memory was bumped to 1024 MiB. However, there is a container boundary that the web endpoint crosses when actually doing the generation, so this bumps the resources on the other side of that boundary (the vLLM container) to measure the impact.
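For context, here is a minimal sketch of what a bump like this looks like in a Modal container definition. This is not the actual runner code: the app name, image, GPU type, and class below are illustrative placeholders, and only the `memory` argument (specified in MiB) is the point of interest.

```python
# Illustrative sketch only; names and layout do not mirror openrouter-runner.
import modal

stub = modal.Stub("vllm-runner-example")  # hypothetical app name

# Hypothetical image carrying the inference dependencies.
vllm_image = modal.Image.debian_slim().pip_install("vllm")


@stub.cls(
    image=vllm_image,
    gpu="A100",    # generation runs on the GPU; the change here is host (CPU) memory
    memory=1024,   # CPU memory in MiB, raised from 128 to 1024 per this PR's title
)
class Engine:
    @modal.method()
    def generate(self, prompt: str) -> str:
        # vLLM generation would happen here; omitted in this sketch.
        return prompt
```

The web (FastAPI) endpoint and the engine run in separate containers, so raising memory on the endpoint alone does not affect the container that actually executes generation; this change targets that second container.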
