-
# Dataset
1. Refactor the self-cognition dataset to support multilingual QAs.
# Megatron PreTrain
1. Support more Megatron models
2. Support dataset split
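The dataset-split item could work along these lines. This is a minimal sketch, not Megatron's actual API; the function name, ratio, and seed are made up for illustration:

```python
import random

def split_dataset(samples, val_ratio=0.1, seed=42):
    """Shuffle and split samples into train/validation lists.
    Hypothetical helper; the ratio and seed are illustrative defaults."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_dataset(list(range(100)))
```

Fixing the seed keeps the split reproducible across runs, which matters when resuming pretraining.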
# Fine-tuning
1. RAG LLM training …
-
I tried batch inference in XTTS, so I am padding to the max text sequence length in the batch and also adding an attention mask for this. But for shorter sequences,
I am getting some random…
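For reference, padding a batch of variable-length token sequences to the batch max and building the matching attention mask usually looks like the following. This is a minimal numpy sketch with made-up sequences, not XTTS code, and `pad_id = 0` is an assumption:

```python
import numpy as np

# Hypothetical token sequences of different lengths (not real XTTS tokens).
seqs = [[5, 3, 9, 1], [7, 2], [4, 8, 6]]
pad_id = 0  # assumed padding token id

max_len = max(len(s) for s in seqs)
batch = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
mask = np.zeros((len(seqs), max_len), dtype=np.int64)
for i, s in enumerate(seqs):
    batch[i, : len(s)] = s   # real tokens on the left, padding on the right
    mask[i, : len(s)] = 1    # 1 = real token, 0 = padding
```

If the model does not actually apply the mask everywhere (e.g. in attention or in any pooling over the time axis), the padded positions of shorter sequences leak into the output, which would produce exactly the kind of random garbage described.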
-
Similar to AI Workflows, we should enable local batch inference.
Let's discuss in this issue what the API should look like and what data format the results should be produced in.
Request from @2timesja…
-
Hi,
I was looking over the docs, and batch inference was mentioned. I looked at the code, and it is not batch inference. It is sequential inference.
I was really hoping for batch inference becau…
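The distinction being drawn, a loop over single inputs versus one call over a stacked batch, can be sketched as follows. The linear "model" is a numpy stand-in, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))   # toy "model": a single linear layer
xs = rng.standard_normal((8, 4))  # 8 inputs

# Sequential inference: one forward pass per input, even if inputs arrive together.
ys_seq = np.stack([x @ W for x in xs])

# True batch inference: one forward pass over the whole stacked batch.
ys_batch = xs @ W

assert np.allclose(ys_seq, ys_batch)  # same results, very different cost profile
```

The outputs match; the difference is that the batched call amortizes per-call overhead and lets the hardware parallelize across the batch dimension.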
-
### Describe the issue
|batch_size|batch_cost_time|frame_cost_time|
|----|----|----|
|1|207|203|
|2|600|406|
|3|914|627|
|4|1234|855|
|5|1570|1106|
|6|1868|1267|
|7…
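From the rows shown above, dividing batch cost by batch size gives the per-sample cost, which here grows rather than shrinks as the batch gets larger. A quick check:

```python
# batch_size -> batch_cost_time, taken from the table above
rows = {1: 207, 2: 600, 3: 914, 4: 1234, 5: 1570, 6: 1868}
per_sample = {b: t / b for b, t in rows.items()}
# per_sample[1] = 207.0, per_sample[2] = 300.0, per_sample[6] ~ 311.3
```

If batching were working as intended, per-sample cost should fall (or at least stay flat) with batch size; these numbers suggest the batch is effectively being processed sequentially.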
-
OpenCV = 4.9
Operating System / Platform = Windows 64-bit
Compiler = Visual Studio 2022
CUDA = 11.6
cuDNN = 8.6.0
Driver Version = 536.45
GPU = RTX 4050 6 GB
Detailed description:
I used the C++ ver…
-
### Question
I used `transformers==4.44.2` and loaded the model `llava-hf/llama3-llava-next-8b-hf` with the following code:
```
from transformers import LlavaNextProcessor, LlavaNextForConditionalGe…
-
### System info
GPU: A100
tensorrt 9.3.0.post12.dev1
tensorrt-llm 0.9.0
torch 2.2.2
### Reproduction
```
export MODEL_NAME="llava-1.5-7b-hf"
git clone https://huggingface.co/llava-hf/${MODEL…
-
Sorry, I am new to this.
Following the code in [inference_wizardcoder.py](https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/inference_wizardcoder.py), I have created a service and perf…
-
Hi,
If I run something like:
```
def inference_fn(model, x):
    y = model(x)  # was `model(input)`: `input` is the Python builtin, not the argument
    return y

ys = nnx.vmap(inference_fn, in_axes=(None, 0))(model, xs)
```
the random key that is used across th…
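If the concern is that one key is shared across all vmapped examples: with a single key, every batch element sees identical randomness, and the usual JAX remedy is to split the key per example and map over the split keys with `in_axes=0`. A numpy analogue of the two behaviors (jax itself omitted so the sketch stays self-contained):

```python
import numpy as np

xs = np.arange(3)

# Shared "key": every element draws from an identically-seeded generator.
shared = [np.random.default_rng(0).normal() for _ in xs]

# Split "keys": one independent seed per element,
# analogous to mapping over jax.random.split(key, len(xs)).
split = [np.random.default_rng(seed).normal() for seed in range(len(xs))]
```

With the shared key all three draws are identical; with split keys each element gets its own randomness, which is almost always what per-example stochastic layers (e.g. dropout) should see.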