-
Hello, I want to know whether it accelerates inference. Recently, I have been trying to speed up inference for SiamRPN by using FP16 instead of FP32. It is said that FP16 is twice as fast as FP32.
It…
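For a concrete baseline, here is a minimal FP16-vs-FP32 timing sketch in PyTorch. The small conv stack is only a stand-in for the SiamRPN network, and all names and sizes are placeholders:

```python
# Minimal FP16 vs. FP32 timing sketch; the conv stack below is a placeholder
# for the real SiamRPN model. Substitute your actual network and input shape.
import time
import torch
import torch.nn as nn

@torch.no_grad()
def ms_per_call(model, x, iters=100):
    for _ in range(10):            # warm-up: the first calls pay one-time setup costs
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()       # flush queued CUDA work before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0

net = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 64, 3)).cuda().eval()
x = torch.randn(1, 3, 255, 255, device="cuda")

fp32 = ms_per_call(net, x)
fp16 = ms_per_call(net.half(), x.half())  # cast weights and input to FP16
print(f"FP32 {fp32:.2f} ms, FP16 {fp16:.2f} ms")
```

Whether this approaches the claimed 2x depends on the GPU: FP16 is roughly twice as fast mainly on cards with dedicated FP16/Tensor Core paths, while on older hardware it can be no faster at all.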
-
Setup
Python version 3.11
Windows Machine
pip install ragchecker
python -m spacy download en_core_web_sm
It seems like there is trouble connecting to Azure OpenAI or using it. I used the…
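For reference, here is a hypothetical sketch of how Azure OpenAI credentials are typically supplied when a tool routes its LLM calls through litellm, which I believe ragchecker does. The deployment name, endpoint, API version, and the exact RAGChecker constructor arguments are assumptions to verify against your installed version:

```python
import os

# litellm-style Azure OpenAI configuration (all values are placeholders).
os.environ["AZURE_API_KEY"] = "<your-azure-openai-key>"
os.environ["AZURE_API_BASE"] = "https://<your-resource>.openai.azure.com"
os.environ["AZURE_API_VERSION"] = "2024-02-15-preview"

from ragchecker import RAGChecker  # import path assumed from the ragchecker README

# In litellm, Azure deployments are addressed as "azure/<deployment-name>".
evaluator = RAGChecker(
    extractor_name="azure/<your-deployment>",  # hypothetical deployment name
    checker_name="azure/<your-deployment>",
)
```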
-
## Description
I have two different modules that I converted to TRT. When I run them serially, the cost of inference only is:
```
// 10 times
do_infer >> cost 400.60 msec. // warm-up
do_infer >> cost 42.22 …
```
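The ~400 ms first call is the usual one-time warm-up cost and should be excluded from per-inference numbers. Since the two engines currently run back to back, one option is to enqueue them on separate CUDA streams so their kernels can overlap. This is a hypothetical sketch against the TensorRT 8.x Python API (`execute_async_v2` has been superseded in newer versions); engine paths, buffer sizes, and binding layouts are placeholders:

```python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

def load_engine(path):
    # Deserialize a prebuilt engine file (paths below are placeholders).
    with open(path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())

engine_a = load_engine("module_a.engine")
engine_b = load_engine("module_b.engine")
ctx_a = engine_a.create_execution_context()
ctx_b = engine_b.create_execution_context()
stream_a, stream_b = cuda.Stream(), cuda.Stream()

# Device buffers; sizes must match each engine's bindings (placeholder sizes).
bind_a = [cuda.mem_alloc(1 << 20), cuda.mem_alloc(1 << 20)]
bind_b = [cuda.mem_alloc(1 << 20), cuda.mem_alloc(1 << 20)]

# Enqueue both inferences before synchronizing; with spare SMs the two
# engines can execute concurrently instead of back to back.
ctx_a.execute_async_v2([int(b) for b in bind_a], stream_a.handle)
ctx_b.execute_async_v2([int(b) for b in bind_b], stream_b.handle)
stream_a.synchronize()
stream_b.synchronize()
```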
-
I use vLLM to accelerate Qwen large models, mainly Qwen-7B and Qwen-14B. I found two issues while testing them.
1) Compared to using vLLM to accelerate Qwen-7B/Qwen-14B, the …
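For reproducing such comparisons, a minimal vLLM sketch for Qwen-7B might look like the following; the model ID and sampling settings are assumptions, and greedy decoding makes outputs comparable across runs:

```python
from vllm import LLM, SamplingParams

# Model ID is an assumption; trust_remote_code is needed for Qwen's custom code.
llm = LLM(model="Qwen/Qwen-7B-Chat", trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy, fixed length
outputs = llm.generate(["Give a one-sentence summary of PagedAttention."], params)
print(outputs[0].outputs[0].text)
```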
-
Dear Vitis AI Team,
I am writing to express my appreciation for the comprehensive suite of tools and resources that Vitis AI provides. The integration of optimized IP, tools, libraries, and models …
-
Hi there, it's unclear whether Yggdrasil supports GPU or TPU acceleration. It seems that if you do fine-tuning in JAX it may be possible once the model is converted to a JAX function? But it's not clear i…
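For what it's worth, recent ydf (the Yggdrasil Decision Forests Python API) releases appear to expose a JAX conversion. The sketch below assumes `to_jax_function` and its input layout, both of which should be checked against your installed version:

```python
import jax
import jax.numpy as jnp
import numpy as np
import pandas as pd
import ydf

# Tiny synthetic training set; column names are placeholders.
n = 200
train = pd.DataFrame({"f1": np.random.rand(n)})
train["label"] = (train["f1"] > 0.5).astype(int)

model = ydf.GradientBoostedTreesLearner(label="label").train(train)

jax_model = model.to_jax_function()   # assumed conversion entry point
predict = jax.jit(jax_model.predict)  # once it is a JAX function, jit can target GPU/TPU
preds = predict({"f1": jnp.array([0.2, 0.8])})  # input layout is an assumption
```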
-
### The bug
I was just looking at my logs because of an issue I am having with facial recognition. These errors are unrelated, as they happened during the night, but I wanted to draw some attention to …
-
### Describe the issue
Hello,
I use the float16 tool to convert FP32 models to FP16 and use ONNXRuntime-GPU 1.13.1 for inference.
I found that many models cannot obtain inference acce…
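For concreteness, the conversion path described above usually goes through onnxconverter-common's float16 tool; this sketch assumes that tool and uses placeholder file names:

```python
import onnx
import onnxruntime as ort
from onnxconverter_common import float16

model = onnx.load("model_fp32.onnx")  # placeholder path
# keep_io_types leaves graph inputs/outputs in FP32 and inserts boundary Casts.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "model_fp16.onnx")

sess = ort.InferenceSession("model_fp16.onnx", providers=["CUDAExecutionProvider"])
```

One common reason converted models fail to speed up is the Cast pairs inserted around ops that stay in FP32; on some graphs those casts cost more than the FP16 kernels save.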
-
I would appreciate it if anyone could help with the following problem when using the converted GGUF for inference.
I found that inference with llama-cpp generates a different result from inference …
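When chasing such differences, it helps to remove sampling noise first. Here is a minimal sketch with the llama-cpp-python bindings, using greedy decoding and placeholder paths and prompts:

```python
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", seed=0)  # placeholder GGUF path
out = llm("Q: What is the capital of France?\nA:", max_tokens=32, temperature=0.0)
print(out["choices"][0]["text"])
```

With temperature 0 on both sides, any remaining divergence comes from the conversion itself, e.g. quantization error or differing kernel numerics, rather than from random sampling.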
-
Feasibility:
The NNFusion project needs some flagship models to prove its usability; we chose BERT as one of the models.
Target:
1. Improve NNFusion's inference performance on Transformer/BERT;
2…