-
When using vLLM for offline batch prediction, I found that GPU utilization drops significantly over long runs. As shown in the graph below, the utilization rate is around 60-70% at 00:00, b…
-
**Is your feature request related to a problem? Please describe.**
I was working on issue #7868, exploring alternative ways of storing messages, one based on `ChatMessageStack`, which is simply a stack tha…
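The `ChatMessageStack` idea could be sketched as a thin wrapper over a Python list (the class name comes from the issue above; the message fields and method names here are assumptions for illustration, not the actual proposal):

```python
from dataclasses import dataclass, field

@dataclass
class ChatMessage:
    role: str      # e.g. "user" or "assistant" (assumed fields)
    content: str

@dataclass
class ChatMessageStack:
    """LIFO store for chat messages: push new messages, pop to rewind."""
    _messages: list = field(default_factory=list)

    def push(self, message: ChatMessage) -> None:
        self._messages.append(message)

    def pop(self) -> ChatMessage:
        # Removes and returns the most recent message
        return self._messages.pop()

    def peek(self) -> ChatMessage:
        # Most recent message without removing it
        return self._messages[-1]

    def __len__(self) -> int:
        return len(self._messages)

stack = ChatMessageStack()
stack.push(ChatMessage("user", "Hello"))
stack.push(ChatMessage("assistant", "Hi!"))
print(stack.pop().content)  # → Hi!
print(len(stack))           # → 1
```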
-
# Bitcoin-Price-Predictor
## Short description of package/script
- The notebook demonstrates predicting the Bitcoin price with a neural network model.
- We are using long short-term mem…
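As a rough illustration of the data-preparation step such a model needs, the price series is typically split into fixed-length windows, each paired with the next price as the target (a minimal sketch; the window size and the use of raw closing prices are assumptions, and the notebook's actual preprocessing may differ):

```python
def make_windows(prices, window=3):
    """Split a price series into (input window, next-price target) pairs,
    the shape an LSTM expects for one-step-ahead forecasting."""
    xs, ys = [], []
    for i in range(len(prices) - window):
        xs.append(prices[i:i + window])   # model input: last `window` prices
        ys.append(prices[i + window])     # target: the price right after
    return xs, ys

prices = [100.0, 102.0, 101.0, 105.0, 107.0]
xs, ys = make_windows(prices, window=3)
print(xs)  # → [[100.0, 102.0, 101.0], [102.0, 101.0, 105.0]]
print(ys)  # → [105.0, 107.0]
```

Each `xs[i]` would then be fed to the recurrent layer as one timestep sequence, with `ys[i]` as its regression target.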
-
While our [draft charter](https://www.w3.org/2023/03/proposed-webmachinelearning-charter.html) says that the group:
> priority on building blocks required by well-known model architectures such as re…
-
We can get performance improvements (in both memory and time) by moving to asyncio and aiohttp. We can make our internal APIs async, and then use something like [this](https://gist.github.com/erle…
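The linked gist is truncated here, but the general shape is to run many awaitables concurrently with `asyncio.gather`. A minimal sketch using only the standard library (in the real change, an aiohttp `session.get` would replace the simulated fetch):

```python
import asyncio

async def fetch(url: str) -> str:
    # Placeholder for an aiohttp request, e.g.:
    #   async with session.get(url) as resp: return await resp.text()
    await asyncio.sleep(0.01)  # simulate network latency
    return f"body of {url}"

async def fetch_all(urls):
    # All requests run concurrently, so total wall time is roughly the
    # slowest request rather than the sum of all of them.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(fetch_all(["https://a.example", "https://b.example"]))
print(results)  # → ['body of https://a.example', 'body of https://b.example']
```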
-
### Problem
If you use the Neovim terminal and zoom out too far, then running a command with large horizontal output triggers a SIGSEGV (out-of-bounds memory access).
I'm …
-
Check out the [newly built-in Superbooga extension](https://github.com/oobabooga/text-generation-webui/blob/main/docs/Extensions.md#built-in-extensions) and its parent.
I'm pretty sure the API does…
-
### **Advantages of JSON-Based Tenant Loading**
1. **Simplicity**:
- Easier to implement for small-scale applications or during the development phase.
- No need to set up and manage a databas…
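To illustrate the simplicity point, JSON-based tenant loading can be as little as one file read at startup (the file layout and field names below are assumptions, not a prescribed schema):

```python
import json
from pathlib import Path

def load_tenants(path: str) -> dict:
    """Load tenants from a JSON file, keyed by tenant id.
    No database required: edit the file and restart (or re-read) to update."""
    raw = json.loads(Path(path).read_text())
    return {tenant["id"]: tenant for tenant in raw["tenants"]}

# Example tenants.json content (hypothetical schema):
# {"tenants": [{"id": "acme", "name": "Acme Corp", "plan": "free"}]}
```

A lookup is then just `load_tenants("tenants.json")["acme"]`, which is hard to beat during development, though it trades away concurrent writes and querying.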
-
Hi, thank you for your great work!
I am using the `inference.py` script to process multiple videos sequentially. Currently, I have to reload the model to reset the long-term memory for each new…
-
Hey,
thanks for sharing your exciting work!
I have a question regarding a minor thing in the memory pruning logic.
As far as I understand, the weights for finding relevant features are masked t…