-
I've followed a mixture of the tutorial for building Falcon [here](https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.5.0/examples/falcon) and the one for spinning up the Triton Inference Server [here](…
-
Hi, it doesn't seem like "stop_words" is respected by the generate endpoint.
I'm getting the same output with and without this field:
```
curl -X POST localhost:8000/v2/models/ensemble/generate …
-
# Deploy a Speech-to-Text model
This work focuses on Whisper.
## What's Whisper?
Whisper is a Transformer-based model developed by OpenAI, specializing in Speech-to-Text (STT) tasks, also kno…
-
Hello, can the trace also give an example of a TensorRT deployment?
-
```
Making all in plugins
DOC Introspecting gobjects
(gst-plugin-scanner:9617): GStreamer-WARNING **: 16:27:49.673: Failed to load plugin '../../ext/r2inference/.libs/libgstinference.so': dlo…
-
# Code
import gradio as gr
from paddlenlp import Taskflow
import numpy as np
from PIL import Image
import uuid
# Initialize the document intelligence task model
docprompt = Taskflow("document_intelligence")
# Define the model inference function
def m…
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue]…
-
https://github.com/triton-inference-server/
- [x] Build Triton Docker image with support for FasterTransformer backend for Fusion etc.
- [x] convert h2oGPT models to format that Triton understands h…
-
Thanks for all the work on ReScript!
I'm running into an issue with the VS Code extension freezing when trying to view the type of `fold` in the following code (simplified from an actual language A…
-
Many of the current issues concern inference (#87, #86, #84, #85, ...).
At the risk of delaying those fixes, I wanted to start a discussion about rewriting inference with the current gempyor object st…