-
/kind bug
**What steps did you take and what happened:**
- deploy a `serving.kserve.io/v1beta1` `InferenceService` with a custom container predictor
- send a gRPC message with the command below:
[inputs_samp…
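For reference, a custom-container predictor of the kind described is typically declared like the following sketch (the name, image, and port are placeholders, not taken from the report):

```
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-predictor          # placeholder name
spec:
  predictor:
    containers:
      - name: kserve-container
        image: myrepo/custom-model:latest   # placeholder image
        ports:
          - containerPort: 9000   # gRPC port exposed by the custom container (placeholder)
            protocol: TCP
```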
-
Hello, I have a suggestion for a notebook -- an **example of a cuML-trained model being exported so it can be served by TensorRT.**
More information on TensorRT:
- https://docs.nvidia.com/deeplear…
-
Hi, I am able to reproduce building and running the model locally via TensorRT-LLM.
I build it using:
```
python3 build.py --model_dir /finetune-gpt-neox/models--meta-llama--Llama-2-7b-hf/snapsho…
```
-
[The PR](https://github.com/twilio-samples/speech-assistant-openai-realtime-api-python/pull/13)
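For context, the `initial_conversation_item` sent in the snippet below typically follows the Realtime API's `conversation.item.create` event shape; a minimal sketch (the greeting text is a placeholder, and the exact item contents in the PR may differ):

```python
import json

# Hypothetical initial conversation item using the OpenAI Realtime API's
# "conversation.item.create" event shape; the text below is a placeholder.
initial_conversation_item = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Greet the caller and offer help."}
        ],
    },
}

# This serialized string is what would be passed to `openai_ws.send(...)`.
payload = json.dumps(initial_conversation_item)
```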
```
await openai_ws.send(json.dumps(initial_conversation_item))
await openai_ws.send(json.d…
```
-
### Initial Checks
- [ ] I have searched GitHub for a duplicate issue and I'm sure this is something new
- [ ] I have read and followed [the docs & demos](https://github.com/modelscope/modelscope-age…
-
### What happened + What you expected to happen
Context:
**How severe**: High
**Case**: a RayCluster + Ray Data + RayJob to create a distributed inference task
**Depends**: Python 3.10.13, Ray 2.34.0
…
-
**Kibana version:**
v8.13.2
**Elasticsearch version:**
v8.13.2
**Server OS version:**
cloud
**Browser version:**
Version 124.0.6367.60 (Official Build) (arm64)
**Browser OS version:**
14.4.1…
-
I've been having a tough time figuring this out. On coderealtime.com I am seeing an issue where the AI chatbot connection appears to stop responding after a period of time.
Curious if anyone is see…
-
**Issue Description:**
During a graceful shutdown of Triton Server, we've observed the following behavior:
- Triton Server is hosting both Model A and Model B.
- Model B can make calls to Model…
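Given the dependency described (Model B makes calls to Model A), a graceful shutdown should unload callers before the models they call. A minimal stdlib sketch of computing such an unload order (model names are hypothetical, matching the issue's A/B setup; this is a concept sketch, not Triton's actual shutdown logic):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each model lists the models it calls.
# Mirrors the issue's setup, where Model B makes calls to Model A.
calls = {
    "model_a": [],           # leaf model, called by others
    "model_b": ["model_a"],  # model_b depends on model_a at inference time
}

def unload_order(calls):
    """Return an unload order where every model is unloaded before
    the models it depends on (callers first, callees last)."""
    # TopologicalSorter emits a node after all its predecessors, so
    # mapping each model to its *callers* makes callers come out first.
    callers = {m: [] for m in calls}
    for model, deps in calls.items():
        for dep in deps:
            callers[dep].append(model)
    return list(TopologicalSorter(callers).static_order())

print(unload_order(calls))  # model_b is unloaded before model_a
```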
-
Hi,
I am trying to use MMPose with the NVIDIA Triton server, but Triton does not support native PyTorch models; it supports TorchScript, ONNX, and a few other formats. So, I have converted the MMPose MobileNetV2 model to…
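For an ONNX model, Triton also needs a model configuration in the model repository. A hedged `config.pbtxt` sketch (the model name, batch size, tensor names, and shapes below are placeholders and must match the actual exported model, e.g. as reported by `polygraphy` or Netron):

```
name: "mmpose_mobilenetv2"        # placeholder model name
platform: "onnxruntime_onnx"
max_batch_size: 8                 # placeholder
input [
  {
    name: "input"                 # must match the ONNX graph's input name
    data_type: TYPE_FP32
    dims: [ 3, 256, 192 ]         # placeholder CHW shape
  }
]
output [
  {
    name: "output"                # must match the ONNX graph's output name
    data_type: TYPE_FP32
    dims: [ 17, 64, 48 ]          # placeholder heatmap shape
  }
]
```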