-
### What happened + What you expected to happen
Context:
**How severe**: High
**Case**: RayCluster + RayData + RayJob to create a distributed inference task
**Depends**: Python 3.10.13, Ray 2.34.0
…
-
I've converted a model to TensorRT format, and at inference time I need to initialize the model's state with a custom value. I implemented this with PyCUDA in the Python inference path, but it was no…
-
**Description**
I was unable to build the onnxruntime_backend with OpenVINO for Triton Inference Server r22.03 using compatible ONNX Runtime and TensorRT versions (from the Triton Inference Server compati…
-
I'd love to see Sentence Transformers added to Hoarder for enhanced semantic search capabilities. It could make finding bookmarks much more efficient and user-friendly.
For reference, yo…
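For context, semantic search of this kind typically embeds each bookmark once and then ranks bookmarks by cosine similarity to the query embedding. A minimal sketch of that ranking step, using made-up toy vectors in place of real Sentence Transformers output (which would come from something like `model.encode(text)`):

```python
import math

# Hypothetical toy embeddings; in the real feature these would be produced
# by a Sentence Transformers model, one vector per bookmark.
bookmarks = {
    "pasta recipe": [0.9, 0.1, 0.0],
    "gpu benchmarks": [0.1, 0.9, 0.2],
    "italian cooking blog": [0.8, 0.2, 0.1],
}

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, top_k=2):
    # rank all bookmarks by similarity to the query embedding
    ranked = sorted(bookmarks, key=lambda k: cosine(query_vec, bookmarks[k]),
                    reverse=True)
    return ranked[:top_k]

# made-up embedding for a query like "how to cook spaghetti"
query = [0.85, 0.15, 0.05]
print(search(query))  # → ['pasta recipe', 'italian cooking blog']
```

In practice the embeddings would be computed once at bookmark-save time and stored, so queries only pay for one `encode` call plus the similarity scan.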
-
### System Info
- CPU: amd64
- OS: Debian 12
- GPU: NVIDIA RTX 4000 Ada
- GPU driver: 535.161
- TensorRT-LLM version: 0.8
- tensorrtllm_backend version: 0.8
### Who can help?
@kaiyux
…
-
I use LightLDA to run inference on new documents. I converted the new/unseen documents to libsvm format using the old vocabulary dictionary and generated a data block; then I read the model files server_0_table_0 and server_0…
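The conversion step described above can be sketched as follows. This is a hypothetical illustration (the vocabulary and `to_libsvm` helper are made up, not LightLDA's tooling): words in a new document are mapped to ids from the old vocabulary, and words absent from that vocabulary are dropped, since the trained model has no parameters for them.

```python
from collections import Counter

# Made-up old vocabulary dictionary: word -> id
vocab = {"topic": 0, "model": 1, "inference": 2}

def to_libsvm(doc, label=0):
    # count only words known to the old vocabulary; unseen words are dropped
    counts = Counter(w for w in doc.split() if w in vocab)
    # libsvm line: "label id:count id:count ..." with ids in ascending order
    feats = sorted((vocab[w], c) for w, c in counts.items())
    return str(label) + "".join(f" {i}:{c}" for i, c in feats)

print(to_libsvm("model inference model unknownword"))  # → "0 1:2 2:1"
```

If many document words fall outside the old vocabulary, the resulting libsvm rows become very sparse, which is worth checking before blaming the inference step itself.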
-
Hey there! 🙏
I am currently working on a project that sends requests to the model through a Flask API. When users send requests concurrently, the model is not able to handle them. Is …
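Without seeing the rest of the question, one common cause is that a single in-process model object is not thread-safe under Flask's threaded request handling. A minimal sketch of one workaround, serializing inference with a lock (the `fake_model_predict` function here is a placeholder, not the reporter's actual model):

```python
import threading

# One lock guards the shared model so concurrent requests
# never call into it at the same time.
_model_lock = threading.Lock()

def fake_model_predict(x):
    # stand-in for a real, possibly non-thread-safe model.predict() call
    return x * 2

def handle_request(x):
    # each Flask view function would call this; the lock ensures
    # only one inference runs at a time
    with _model_lock:
        return fake_model_predict(x)

# simulate four concurrent requests
results = []
threads = [threading.Thread(target=lambda v=v: results.append(handle_request(v)))
           for v in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # → [0, 2, 4, 6]
```

Serializing requests trades throughput for correctness; for real concurrency, a pool of model replicas or a dedicated inference server in front of the model is the usual next step.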
-
https://github.com/triton-inference-server/tensorrtllm_backend/blob/49def341ca37e0db3dc8c80c99da824107a7a938/all_models/inflight_batcher_llm/preprocessing/config.pbtxt#L127
tokenizer_type parameter…
-
I am having some issues with the DeepMReye demo using the example data from the first two participants of the sample dataset, as instructed in the notebook "deepmreye_example_usage_pretrained_model_w…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/Mozilla-Ocho/llamafile/blob/master/README.…