-
The RNN cannot be JIT-compiled; see the error below:
```
Detected unsupported operations when trying to compile graph __inference_one_step_on_data_993[] on XLA_GPU_JIT: CudnnRNN (No registered 'CudnnRNN' Op…
```
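For reference, a minimal sketch of the kind of setup that can raise this error, assuming a `tf.keras` LSTM trained with `jit_compile=True`; the shapes and layer sizes are placeholders, not taken from the report above:

```python
import numpy as np
from tensorflow import keras

# Hypothetical minimal model: a single LSTM layer.
# With jit_compile=True the whole train step is lowered to XLA, but the fused
# GPU implementation of the recurrent layer uses the CudnnRNN op, which XLA
# cannot compile, producing an error like the one shown above.
model = keras.Sequential([
    keras.Input(shape=(20, 8)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", jit_compile=True)

x = np.random.rand(64, 20, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model.fit(x, y, epochs=1)  # fails on GPU with the CudnnRNN error
```

If the fused cuDNN kernel is the culprit, a configuration that falls back to the generic RNN kernel (for example a non-default activation) avoids the `CudnnRNN` op, at the cost of speed.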
-
When I increase the batch size to 2, I get an error in boxlist_ops.py.
This is for batch = 2:
This is the shape of the proposals before they go into the cat_boxlist method:
len - 2
[BoxList(num_boxe…
-
I tried to run YOLO models in Caffe (tools are available online to do the conversion). However, I noticed that the inference time is much longer in Caffe compared to the Darknet framework. On my Quadro …
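A rough timing sketch for the Caffe side, assuming pycaffe and the prototxt/caffemodel produced by the converter (the file names and the `data` blob name are placeholders):

```python
import time

import numpy as np
import caffe

caffe.set_mode_gpu()
caffe.set_device(0)

# Hypothetical file names from the conversion step.
net = caffe.Net("yolo.prototxt", "yolo.caffemodel", caffe.TEST)

# Dummy input matching the network's input blob shape.
net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)

net.forward()  # warm-up, excluded from timing
start = time.time()
for _ in range(100):
    net.forward()
print("mean forward time: %.2f ms" % ((time.time() - start) / 100 * 1000))
```

Darknet's "Predicted in … seconds" output is roughly the forward pass only, so timing just `net.forward()` after a warm-up is the closest like-for-like comparison.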
-
#### Tested on CF versions `3.34.0` and `3.42.0`
----
I executed `WPI` on the [NJR](https://zenodo.org/records/6314162) benchmarks to infer annotations for the [Nullness Checker](https://checkerframewor…
-
```
import threading
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

opt_foo1 = torch.compile(foo, mode="max-autotune")
threads = []
for _ in rang…
```
-
Thanks, I wanted to try your Triton version, but I only have 8 GB of RAM.
The GPTQ CUDA version works (7B model). Your version (the ppl script) crashes with a CUDA OOM.
Is that to be expected or c…
-
Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals, d…
-
### Describe the issue
I built a class that creates the model and runs inference. In initialization, it creates random data and runs one inference.
But when I run other data, the first inference is very slow. Why?
If I wait…
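A minimal sketch of the pattern described, assuming this is ONNX Runtime with the CUDA provider; the model path, input dtype, and warm-up shape are placeholders:

```python
import numpy as np
import onnxruntime as ort

class Model:
    """Hypothetical wrapper mirroring the class described above."""

    def __init__(self, model_path="model.onnx"):
        self.session = ort.InferenceSession(
            model_path,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        inp = self.session.get_inputs()[0]
        self.input_name = inp.name
        # Replace any dynamic dimensions with 1 for the warm-up tensor (assumption).
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        # Warm-up run in __init__ with random data, as in the report above.
        self.session.run(None, {self.input_name: np.random.rand(*shape).astype(np.float32)})

    def infer(self, data):
        return self.session.run(None, {self.input_name: data})
```

One thing worth checking: if the real data's shape differs from the warm-up shape, execution providers that specialize work per shape (for example cuDNN convolution algorithm search, or a TensorRT engine build) redo that work on the first run with the new shape, which would make the first "real" inference slow even after a warm-up.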
-
I have issues with the NPU for dlstreamer in a Docker container. The build with the NPU driver installation completed without issue. But when I ran the dlstreamer pipeline, there was an error and the pipeline un…
-
Hi,
I tried AWQ quantization on codellama-13b following https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama. After testing, it was very slow: 1.5 times slower than the floa…