-
**Description**
In an ensemble pipeline for the TensorRT-LLM backend, when we try to propagate data from the preprocessing model to the postprocessing model, we get this error: **Model 'ensemble' receives inpu…
-
## Description
Tracking the C++ inference issue in master/cpp-package
https://github.com/apache/incubator-mxnet/issues/19550#issuecomment-841583103
### Error Message
```
root@6da7899c2de8:/work/mxn…
-
### Software Environment
```Markdown
- paddlepaddle:
- paddlepaddle-gpu: latest develop
- paddlenlp: latest develop
```
### Detailed Description
```Markdown
In the llm/README.md doc, section 4.2 "Static Graph Inference":
exporting the model with --inference_model is not supported, so a model that supports dynamic batch sizes cannot be exported
```
-
I have successfully trained and tested second_early_fusion.yaml and second_intermediate_fusion.yaml. However, I hit this error when testing the second_late_fusion config.
**File "/graduation-project/Ope…
-
pretrained.py calls the following:
```python
g, lg = Graph.atom_dgl_multigraph(
    atoms,
    cutoff=float(cutoff),
    max_neighbors=max_neighbors,
)
```
This uses the default value of use_c…
-
Hello team,
We typically use `gather_all_token_logits` to collect the logit tensors for post-processing. Especially for large vocabulary sizes (128,000), this can require a lot of GPU memory. For ex…
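For a rough sense of scale, here is a back-of-the-envelope sketch; the batch size, sequence length, and dtype below are illustrative assumptions, not numbers from this issue:

```python
def gathered_logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                          bytes_per_elem: int = 2) -> int:
    """Approximate size of a gathered logits buffer (fp16 by default)."""
    return batch_size * seq_len * vocab_size * bytes_per_elem

# Assumed workload: batch 8, 2048 tokens, a 128,000-entry vocabulary, fp16.
size = gathered_logits_bytes(8, 2048, 128_000)
print(f"{size / 2**30:.1f} GiB")  # ~3.9 GiB for the logits buffer alone
```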
-
Hey all!
The video models are all supported in Transformers now and will be part of the v4.42 release. Feel free to check out the model checkpoints [here](https://huggingface.co/collections/llava-h…
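For anyone who wants to try them, here is a minimal loading sketch; the checkpoint id is an assumption based on the LLaVA-NeXT-Video release, so check the linked collection for the exact names:

```python
import torch
from transformers import LlavaNextVideoProcessor, LlavaNextVideoForConditionalGeneration

# Assumed checkpoint id; see the collection linked above for the actual list.
model_id = "llava-hf/LLaVA-NeXT-Video-7B-hf"

processor = LlavaNextVideoProcessor.from_pretrained(model_id)
model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
```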
-
### Feature request
Enable the fused op `F.gemv_4bit` in the `F.gemv_4bit` backward pass
### Motivation
The forward and backward passes in 4-bit have the same calculations, so I was wondering if we could enable the fused op in bac…
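For context, a minimal sketch of the idea; this simplifies the real autograd function, and the commented-out line is the requested behavior, not an existing code path:

```python
import torch
import bitsandbytes.functional as F

class MatMul4Bit(torch.autograd.Function):
    """Simplified sketch, not bitsandbytes' actual implementation."""

    @staticmethod
    def forward(ctx, A, B_4bit, quant_state):
        ctx.B_4bit, ctx.quant_state = B_4bit, quant_state
        # Inference fast path: the fused gemv_4bit kernel.
        return F.gemv_4bit(A, B_4bit.t(), state=quant_state)

    @staticmethod
    def backward(ctx, grad_output):
        # Current behavior: dequantize to dense weights, then a plain matmul.
        W = F.dequantize_4bit(ctx.B_4bit, ctx.quant_state).to(grad_output.dtype)
        grad_A = torch.matmul(grad_output, W)
        # Requested: reuse the fused kernel here as well, e.g. something like
        # grad_A = F.gemv_4bit(grad_output, ctx.B_4bit, state=ctx.quant_state)
        return grad_A, None, None
```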
-
While running the Greedy Decoder script, I am getting an error. Can you suggest the type of GPU and the amount of memory required to run the script?
The output of the script while running in Spyder the …
-
This is for tracking "Milestone1 : Run Batch request in parallel manner via direct call to trix-engine(~Tizen M2, Aug 30th)" from https://github.com/Samsung/ONE/projects/8
## User scenario
- Use…