-
**Environment:**
- Istio Version: 1.3.6
- Knative Version: 0.15.0
- KFServing Version: 0.4
I followed the [metrics installation](https://github.com/kubeflow/kfserving/tree/v0.4.0/docs/samples/…
-
Hey, first of all, thanks for creating this amazing library!
I'm following your T5 implementation with TensorRT,
https://github.com/ELS-RD/transformer-deploy/blob/b52850dce004212225edcaa7b80fccc311398…
Ki6an updated
2 years ago
-
Support for Velocity proxy.
-
I've followed the instructions at
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/baichuan.md
to run Baichuan2-7b-Chat.
But for exactly the same engine, the outputs are …
-
### Describe the issue
In its current state, abseil's CMakeLists.txt unconditionally bypasses CMake's target longevity rules and rewrites the file `options.h` every time CMake considers the …
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
-
I'm currently using `sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py` with all the underlying C++ code for modified_beam_search (`RnntConformerModel`, `StreamingModifiedBeamSearc…
-
### Bug description
The unit test of the 2-stage recommender system pipeline is flaky for multiple reasons:
- the user_id sent to Triton Inference Server does not exist in Feast storage
- FAISS cann…
-
It would be great to be able to install pytriton on Macs for ease of development. Even without CUDA support on Macs, being able to develop using only the CPU would be a real time saver.
A…
-
I have a Conformer CTC model built with the NeMo framework (https://github.com/NVIDIA/NeMo), which converts and deploys normally with Riva 2.11.0. However, if I convert the same NeMo file to …