-
### Your current environment
```text
k8s 1.31 using vllm-openai:latest
```
### How would you like to use vllm
I am currently running the QWEN model on 1 GPU with the manifest below
`…
-
For example, caching the latest storage of popular contracts (like ERC20 ETH, USDC, etc.) should give a decent boost to the call/estimateFee/simulate requests that the node gets
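To make the idea concrete, here is a minimal sketch of such a cache, not the node's actual design. It assumes storage reads are keyed by `(contract address, storage slot)`, that a hypothetical `fetch` callback performs the real state lookup, and that the node calls a hypothetical `on_new_block` hook whenever the chain head changes. All names are illustrative.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# (contract_address, storage_slot); both are hex strings in this sketch
StorageKey = Tuple[str, str]


class LatestStorageCache:
    """Caches latest-block storage reads for an allowlist of hot contracts."""

    def __init__(self, hot_contracts: Set[str], fetch: Callable[[str, str], str]) -> None:
        # Normalise addresses so lookups are case-insensitive.
        self._hot = {addr.lower() for addr in hot_contracts}
        self._fetch = fetch  # hypothetical: reads a slot from the node's state
        self._block: Optional[int] = None
        self._values: Dict[StorageKey, str] = {}
        self.hits = 0
        self.misses = 0

    def on_new_block(self, block_number: int) -> None:
        # Cached values are only valid for the current head, so drop them
        # whenever the head moves. A finer-grained design could instead
        # invalidate only the slots touched by the new block's state diff.
        if block_number != self._block:
            self._block = block_number
            self._values.clear()

    def get(self, address: str, slot: str) -> str:
        address = address.lower()
        if address not in self._hot:
            # Contracts outside the allowlist bypass the cache entirely.
            return self._fetch(address, slot)
        key = (address, slot)
        if key in self._values:
            self.hits += 1
            return self._values[key]
        self.misses += 1
        value = self._fetch(address, slot)
        self._values[key] = value
        return value
```

Clearing the whole cache on every new block is the simplest invalidation rule that stays correct; whether it gives a measurable boost depends on how many requests arrive between blocks.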
-
Hi, cool project. Since the underlying Moka project describes itself as:
> Thread-safe, highly concurrent in-memory cache implementations:
>
> * Synchronous caches that can be shared across OS…
-
**Is your feature request related to a problem? Please describe.**
There's been some appetite to perform the memory cache check ahead of the interceptor chain. This is a relatively big behaviour chan…
-
Hey there! 👋
I hope you’re doing well! I wanted to reach out because I'm facing a bit of a challenge after installing the `friendsofsymfony/jsrouting-bundle` in my Symfony 6.4 project. I've tested …
-
### Motivation.
KV cache compaction (i.e., token dropping) can significantly reduce memory footprint in llm serving (especially for long generation and large batch size workloads). The plan is to sup…
-
Currently, cached request/response pairs are written to memory (https://github.com/cashubtc/cdk/pull/361). It would be great if the mint could use a dedicated Redis caching service for this (as well o…
-
### Describe the issue:
Hi, I have a 2 hr recording of 384 channels (1 probe, ~200GB file size). I'm trying to run `run_kilosort` with this file (using `DEFAULT_SETTINGS`). During the "Computing dr…
-
Run and close a worker in a loop and you'll see this happening.
![image](https://github.com/denoland/deno_core/assets/1609021/717df77c-8db9-44e8-a3b0-8fe97454dcc8)
![image](https://github.com/de…
-
Recently, our production web environments began failing due to OOM (Out of Memory) errors. Upon analyzing the heap dumps, we noticed a large number of 33-character strings, all prefixed with a `t`. It t…