-
## 🚀 Feature
Please add Medusa decoding to mlc-llm in C++; we urgently need it to speed up LLM decoding on mobile devices.
Reference: https://github.com/FasterDecoding/Medusa/tree/main
Medusa adds …
-
### Feature request
TGI provides some valuable metrics on model performance and load today. However, there are still a number of missing metrics, the absence of which poses a challenge for orchestr…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the [LangGraph](https://langchain-ai.github.io/langgraph/)/LangChain documentation with the integrat…
-
```
{
"messages": [
{
"role": "system",
"content": "You are painter, funny.\n\nYour decisions must always be made independently without seeking user assistance. Play to your str…
-
javascript weekly news
-
Feel free to close this issue if you are not interested, but we just implemented the QOI image format for VNC to deliver lossless remote desktops using Rust WASM on the client side here:
https://githu…
-
### Description
TL;DR: When I run a t5x script on an A100-8 GPU machine, it is much slower than running the same script on a single A100 machine.
There are many available configurations…
-
Hello TensorRT-LLM experts!
I have a question about the odd behavior of the XQA kernel function supported in NVIDIA's official MLPerf 4.0 version of TensorRT-LLM.
First of all, I want to te…
-
- [ ] [Everything-of-Thoughts-XoT/README.md at main · microsoft/Everything-of-Thoughts-XoT](https://github.com/microsoft/Everything-of-Thoughts-XoT/blob/main/README.md?plain=1)
# Everything of Thou…