-
[04/10/2024-16:11:31] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/10…
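For reference, a small diagnostic sketch (the model path is a placeholder, not taken from this log): it lists the INT64 initializers in the ONNX model and checks whether their values fit in INT32, which is exactly what the parser's cast-down warning is about. If everything fits, the automatic cast is lossless and the warning can be treated as informational.

```
import numpy as np
import onnx
from onnx import numpy_helper

INT32_MIN, INT32_MAX = np.iinfo(np.int32).min, np.iinfo(np.int32).max

# "model.onnx" is a placeholder path for the exported model.
model = onnx.load("model.onnx")

for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        values = numpy_helper.to_array(init)
        # An empty tensor trivially fits; otherwise check the value range.
        fits = values.size == 0 or (
            values.min() >= INT32_MIN and values.max() <= INT32_MAX
        )
        print(f"{init.name}: shape={values.shape}, fits_in_int32={fits}")
```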
-
In this code base, direct `.mnn` weight files are provided, which are parsed by the custom C++ module and then run through the MNN inference engine, but in one of your comments you mentioned first co…
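For comparison, a minimal sketch of driving a `.mnn` file through MNN's Python session API instead of the custom C++ module; the model path and the input/output shapes below are placeholders, not values from this code base.

```
import MNN
import numpy as np

# Placeholder path and shapes: adjust to the actual .mnn model.
interpreter = MNN.Interpreter("model.mnn")
session = interpreter.createSession()

# Copy a host tensor into the session's input tensor.
input_tensor = interpreter.getSessionInput(session)
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
tmp_input = MNN.Tensor((1, 3, 224, 224), MNN.Halide_Type_Float,
                       data, MNN.Tensor_DimensionType_Caffe)
input_tensor.copyFrom(tmp_input)

interpreter.runSession(session)

# Copy the output back to a host tensor before reading it.
output_tensor = interpreter.getSessionOutput(session)
tmp_output = MNN.Tensor((1, 1000), MNN.Halide_Type_Float,
                        np.zeros((1, 1000), dtype=np.float32),
                        MNN.Tensor_DimensionType_Caffe)
output_tensor.copyToHostTensor(tmp_output)
print(tmp_output.getData()[:5])
```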
-
## Bug report
**Describe the bug**
Unable to distribute the example app to TestFlight for internal testing. Error:
Runner.app/Frameworks/arm64_libllm_inference_engine.framework does not support t…
-
Hi,
I'm working on OpenVINO with FPGA support.
I can run the example as shown in the [Run a Sample Application] section of "https://software.intel.com/en-us/articles/OpenVINO-Install-Linux-FPGA", and …
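For context, a minimal sketch of what such a sample boils down to in the Python Inference Engine API, assuming an OpenVINO 2020.x release where both `IECore.read_network` and the FPGA plugin are still available; the model paths, input dtype, and the `HETERO:FPGA,CPU` device string are placeholders for this setup.

```
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# Placeholder IR paths.
net = ie.read_network(model="model.xml", weights="model.bin")
# HETERO falls back to CPU for layers the FPGA plugin does not support.
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

input_name = next(iter(net.inputs))
output_name = next(iter(net.outputs))

# Dummy input with the network's declared shape, just to exercise inference.
dummy = np.zeros(net.inputs[input_name].shape, dtype=np.float32)
result = exec_net.infer({input_name: dummy})
print(result[output_name].shape)
```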
-
I want to know why the centerhead ONNX does not contain the decode part. If I build the centerhead decode part into the TensorRT *.engine, how would it influence the inference speed?
-
Hi,
I have fine-tuned Qwen2-VL using Llama-Factory.
I successfully quantized the fine-tuned model as shown below:
```
from transformers import Qwen2VLProcessor
from auto_gptq import BaseQuantizeC…
-
### Feature request
Allow passing `torch_dtype` in `model_kwargs`, as supported by sentence_transformers, when specifying the dtype in the infinity_emb v2 CLI and the InferenceEngine type is torch.
This would all…
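For illustration, a hedged sketch of what this request would map to on the sentence_transformers side, assuming sentence-transformers >= 2.3 where `model_kwargs` is forwarded to the underlying transformers model; the model name and dtype below are examples only, not infinity_emb defaults.

```
import torch
from sentence_transformers import SentenceTransformer

# torch_dtype inside model_kwargs is passed through to the transformers model,
# so the weights are loaded directly in the requested precision.
model = SentenceTransformer(
    "BAAI/bge-small-en-v1.5",          # example model, not a default
    model_kwargs={"torch_dtype": torch.float16},
)

embeddings = model.encode(["hello world"], convert_to_numpy=True)
print(embeddings.shape, embeddings.dtype)
```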
-
### Your current environment
Running isolated in a Docker container.
### How would you like to use Aphrodite?
I have the following question:
currently, NVLink support on new motherboards that …
-
Hi, when I use Medusa decoding on trtllm-090 with profiling, an error occurred as follows. Could you please help take a look? Thanks!
If I do not use `--run_profiling`, the inference process is nor…
-
## Description
Hi, I'm using multiple streams to improve TensorRT inference latency & throughput. Here is the inference code I modified from the TensorRT repo's example: [common_runtime.py](https://githu…
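For reference, a rough sketch of the multi-stream pattern (not the modified script itself), assuming TensorRT 8.x with `execute_async_v2` and pycuda for stream and memory management; the engine path, tensor shapes, and stream count are placeholders, and a single input/output binding is assumed. One engine is shared, while each stream gets its own execution context and device buffers so the enqueued work can overlap.

```
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

NUM_STREAMS = 2
in_shape, out_shape = (1, 3, 224, 224), (1, 1000)  # placeholder shapes

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:          # placeholder engine path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

streams, contexts = [], []
host_in, host_out, dev_in, dev_out = [], [], [], []

for _ in range(NUM_STREAMS):
    streams.append(cuda.Stream())
    contexts.append(engine.create_execution_context())
    # Pinned host buffers so async copies can actually overlap.
    h_in = cuda.pagelocked_empty(int(np.prod(in_shape)), dtype=np.float32)
    h_out = cuda.pagelocked_empty(int(np.prod(out_shape)), dtype=np.float32)
    host_in.append(h_in)
    host_out.append(h_out)
    dev_in.append(cuda.mem_alloc(h_in.nbytes))
    dev_out.append(cuda.mem_alloc(h_out.nbytes))

# Enqueue all streams first, synchronize afterwards, so H2D copies,
# kernels, and D2H copies from different streams can overlap.
for i in range(NUM_STREAMS):
    cuda.memcpy_htod_async(dev_in[i], host_in[i], streams[i])
    contexts[i].execute_async_v2(
        bindings=[int(dev_in[i]), int(dev_out[i])],
        stream_handle=streams[i].handle,
    )
    cuda.memcpy_dtoh_async(host_out[i], dev_out[i], streams[i])

for s in streams:
    s.synchronize()
```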