-
**Description**
In an ensemble pipeline for the TensorRT-LLM backend, when we try to propagate data from the preprocessing model to the postprocessing model, we get this error: **Model 'ensemble' receives inpu…
-
## Description
Tracking the C++ inference issue in master/cpp-package
https://github.com/apache/incubator-mxnet/issues/19550#issuecomment-841583103
### Error Message
```
root@6da7899c2de8:/work/mxn…
-
### Software Environment
```Markdown
- paddlepaddle:
- paddlepaddle-gpu: latest develop
- paddlenlp: latest develop
```
### Detailed Description
```Markdown
In the llm/README.md doc, section 4.2 "Static Graph Inference":
exporting the model with --inference_model is not supported, so a model that supports dynamic batch sizes cannot be exported
```
-
I have successfully trained and tested second_early_fusion.yaml and second_intermediate_fusion.yaml. However, I hit this error when testing the second_late_fusion config.
**File "/graduation-project/Ope…
-
pretrained.py calls the following:
```python
g, lg = Graph.atom_dgl_multigraph(
    atoms,
    cutoff=float(cutoff),
    max_neighbors=max_neighbors,
)
```
This uses the default value of use_c…
-
Hello team,
We typically use `gather_all_token_logits` to collect the logit tensors for post-processing. Especially for large vocabulary sizes (128,000), this can require a lot of GPU memory. For ex…
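For a rough sense of scale, here is a back-of-the-envelope sketch; the batch size, sequence length, and dtype below are illustrative assumptions, not numbers from this issue:

```python
def gathered_logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                          bytes_per_elem: int = 2) -> int:
    """Approximate size of a gathered logits buffer (fp16 by default)."""
    return batch_size * seq_len * vocab_size * bytes_per_elem

# Assumed workload: batch 8, 2048 tokens, a 128,000-entry vocabulary, fp16.
size = gathered_logits_bytes(8, 2048, 128_000)
print(f"{size / 2**30:.1f} GiB")  # ~3.9 GiB for the logits buffer alone
```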
-
Hey all!
The video models are all supported in Transformers now and will be part of the v4.42 release. Feel free to check out the model checkpoints [here](https://huggingface.co/collections/llava-h…
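For anyone who wants to try them, here is a minimal loading sketch; the checkpoint id is an assumption based on the LLaVA-NeXT-Video release, so check the linked collection for the exact names:

```python
import torch
from transformers import LlavaNextVideoProcessor, LlavaNextVideoForConditionalGeneration

# Assumed checkpoint id; see the collection linked above for the actual list.
model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"

processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
```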
-
### Feature request
Enable the fused op `F.gemv_4bit` in the `F.gemv_4bit` backward pass
### Motivation
The forward and backward passes in 4-bit have the same calculations, so I was wondering if we could enable the fused op in bac…
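For context, a minimal sketch of the idea; this simplifies the real autograd function, and the commented-out line is the requested behavior, not an existing code path:

```python
import torch
import bitsandbytes.functional as F

class MatMul4Bit(torch.autograd.Function):
    """Simplified sketch, not bitsandbytes' actual implementation."""

    @staticmethod
    def forward(ctx, A, B_4bit, quant_state):
        ctx.B_4bit, ctx.quant_state = B_4bit, quant_state
        # Inference fast path: the fused gemv_4bit kernel.
        return F.gemv_4bit(A, B_4bit.t(), state=quant_state)

    @staticmethod
    def backward(ctx, grad_output):
        # Current behavior: dequantize to dense weights, then a plain matmul.
        W = F.dequantize_4bit(ctx.B_4bit, ctx.quant_state).to(grad_output.dtype)
        grad_A = torch.matmul(grad_output, W)
        # Requested: reuse the fused kernel here as well, e.g. something like
        # grad_A = F.gemv_4bit(grad_output, ctx.B_4bit, state=ctx.quant_state)
        return grad_A, None, None
```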
-
While running the Greedy Decoder script, I am getting an error. Can you suggest the type of GPU and the amount of memory required to run the script?
The output of the script while running in Spyder the …
-
This is for tracking "Milestone1 : Run Batch request in parallel manner via direct call to trix-engine(~Tizen M2, Aug 30th)" from https://github.com/Samsung/ONE/projects/8
## User scenario
- Use…