-
Hi, thank you for your open-source work. I have a few questions about inference with quantized models.
(1) For a model with only W8A8 quantization, where the KV cache is not quantized, whether the fo…
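To make the W8A8 setting concrete: weights and activations are each mapped to int8 with a per-tensor scale, while the KV cache stays in floating point. The sketch below is a minimal pure-Python illustration of symmetric int8 round-trip quantization; the function names are illustrative and not from any particular project.

```python
# Illustrative sketch of symmetric per-tensor int8 quantization, the
# building block of W8A8 schemes. Names here are hypothetical.

def quantize_int8(values):
    """Map floats to int8 codes with one symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Round-trip error is bounded by half the quantization step (scale / 2).
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, approx))
```

The same mapping is applied to activations at runtime; with an unquantized KV cache, the attention inputs read from the cache skip this step entirely.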
-
### What version of `drizzle-orm` are you using?
0.27.2
### What version of `drizzle-kit` are you using?
0.19.12
### Describe the Bug
My prepared statement is correctly returning the data I would…
-
RNNs cannot be JIT-compiled; see the error below:
```
Detected unsupported operations when trying to compile graph __inference_one_step_on_data_993[] on XLA_GPU_JIT: CudnnRNN (No registered 'CudnnRNN' Op…
```
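The `CudnnRNN` op has no XLA kernel, so one common workaround is to opt the model out of XLA compilation entirely. A minimal sketch (the model architecture here is a placeholder, not the reporter's model):

```python
import tensorflow as tf

# Hypothetical model; the point is the compile() flag below.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(None, 8)),
    tf.keras.layers.Dense(1),
])

# jit_compile=False keeps the train/predict steps off XLA, avoiding the
# unsupported CudnnRNN op at the cost of losing XLA fusion elsewhere.
model.compile(optimizer="adam", loss="mse", jit_compile=False)
```

Alternatively, RNN configurations that fall off the cuDNN fast path (e.g. non-default activations or `unroll=True`) use plain ops and may compile under XLA, though usually more slowly.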
-
## ❓ Questions and Help
Hi,
Please, I cannot find where the inference weights used for Mask R-CNN in demo.ipynb are stored. Also, if I do a training run on COCO, where are my weights saved and where is …
-
Now we have a TFLite model without any optimizations. Please add some optimizations and corresponding benchmarks for it.
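For reference, the simplest optimization to benchmark first is post-training dynamic-range quantization via the converter's `optimizations` flag. A sketch, assuming the model is exported as a SavedModel (the paths are placeholders):

```python
import tensorflow as tf

# Post-training dynamic-range quantization: weights stored as int8,
# activations quantized dynamically at runtime. Paths are placeholders.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_optimized.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting file can then be compared against the unoptimized model for size and latency (e.g. with TFLite's `benchmark_model` tool) to produce the requested benchmarks.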
-
### Issue confirmation / Search before asking
- [x] I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issues) and the same bug has not been reported. I have searched the [issues](https://github.com/PaddlePaddle/PaddleDetection/issue…
-
Thanks, I wanted to try your Triton version, but I only have 8 GB RAM.
The GPTQ CUDA version works (7B model); your version (the ppl script) crashes with a CUDA OOM.
Is that to be expected or c…
-
```
import threading

import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

opt_foo1 = torch.compile(foo, mode="max-autotune")

threads = []
for _ in rang…
```
-
### Describe the issue
I build a class to create the model and run inference. In initialization, I create random data and run it once.
But when I run other data, the first inference is still very slow. Why?
If I wait…
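One common cause of this pattern is that dynamic-shape backends pay a setup cost (kernel selection, memory allocation) per input shape, so a warm-up run only helps later inputs with the same shape. The pure-Python sketch below uses a hypothetical stand-in session to illustrate the behavior; it is not the runtime's actual implementation.

```python
import time

class DummySession:
    """Stand-in for a runtime that compiles a kernel per input shape,
    paying the cost on first use of each shape (hypothetical)."""
    def __init__(self):
        self._compiled_shapes = set()

    def run(self, data):
        shape = len(data)
        if shape not in self._compiled_shapes:
            time.sleep(0.05)          # simulate one-time per-shape setup
            self._compiled_shapes.add(shape)
        return [x * 2 for x in data]

session = DummySession()
session.run([0.0])                    # warm-up with shape 1

start = time.perf_counter()
session.run([1.0, 2.0, 3.0])          # new shape: pays setup again
first = time.perf_counter() - start

start = time.perf_counter()
out = session.run([4.0, 5.0, 6.0])    # same shape as previous: fast now
second = time.perf_counter() - start

assert out == [8.0, 10.0, 12.0]
assert second < first                 # setup cost already paid for shape 3
```

The practical takeaway: warm up with dummy data of the same shape(s) as the real inputs, not just any random tensor.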
-
When I increase the batch size to 2, I receive an error in boxlist_ops.py.
This is for batch = 2:
This is the shape of proposals before going into the cat_boxlist method:
len - 2
[BoxList(num_boxe…