-
### ClearML serving design document v2.0
**Goal: Create a simple interface to serve multiple models with scalable serving engines on top of Kubernetes**
Design Diagram (edit [here](https://excalid…
-
## Goal
- Jan should be able to seamlessly move from Nitro to cortex.cpp
- What is the scope of change?
- Different inference extensions? (e.g. `nitro-extension`, and `cortex-extension`?)
-…
-
**Env:**
- Container: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
- TensorRT-LLM release: 0.7.1
- TRT-LLM backend repo tag: v0.7.1
- Model: Llama-2-70b
- tritonserver deployed on 2 A10…
-
## Description
Build engines for SDXL.
Then init pipeline. And do several runs. At the first run I get good picture, but the second run gives all grey image.
I've added controlnet and ip-adapte…
-
## Goal
- Jan has a mobile client that runs local models
-
When converting a `tensorflow.keras.layers.LayerNormalization` layer to ONNX, `tf2onnx` currently decomposes layer normalizations into rather complex subgraphs with batch norms and more basic building…
-
### Problem Statement
Nowadays remote model servers like AWS SageMaker, BedRock, or OpenAI, Cohere, etc all support batch predict APIs, which allow users to send large amount of synchronous request…
-
**Description**
According to the Framework matrix (https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html#framework-matrix-2024), 24.05 is supposed to support TensorRT 10.0.6.1. Th…
-
### 问题描述 Issue Description
在飞腾2000+,昆仑芯R200,麒麟V10环境下编译paddlepaddle报错,报错信息如下:
/usr/bin/ld: /usr/lib64/libcrypto.a(sha1-armv8.o): relocation R_AARCH64_PREL64 against symbol `OPENSSL_armcap_P' which ma…
-
### What do you want to change?
We would like to improve the type inference of parameters compared to constants only.
Currently, it is absolutely inferred that `interface{}`.
https://play.sqlc.…