-
### Describe the issue
Hey
We are planning to add GPU inference (using Microsoft.ML.OnnxRuntime.Gpu 1.17.0) as an option in our C# software.
However, when switching from the CPU ONNX runtime to th…
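Since the snippet is cut off, here is a minimal sketch of the provider-fallback logic this kind of switch usually needs, written in Python with the `onnxruntime` package's documented provider names (`pick_providers` is a hypothetical helper, not a library function; the C# package exposes analogous CUDA session options):

```python
# Hedged sketch: choose ONNX Runtime execution providers with a CPU fallback.
# The provider names are the ones onnxruntime documents; pick_providers itself
# is a hypothetical helper, not part of any library.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]

def pick_providers(available):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]

# Usage with the real library would look like this (not executed here):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Listing the CPU provider after the CUDA one lets the session fall back per-node when a kernel has no GPU implementation, which is often where CPU-vs-GPU behavior differences first show up.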
-
## ❓ Questions and Help
```
Traceback (most recent call last):
  File "webcam.py", line 80, in <module>
    main()
  File "webcam.py", line 71, in main
    composite = coco_demo.run_on_opencv_image(img)
  F…
```
-
https://www.modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary
Running the model above on an A770 GPU gives the following data:
- 32 in / 32 out: peak GPU mem 3.1 GB
- 2048 in / 512 out: peak GPU mem 7.4 GB
- 4096 in / 1024 out: peak GPU mem 11.6 GB
- 8192 in / 2048 o…
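The numbers above grow roughly linearly with total tokens. A hedged back-of-the-envelope check (the Qwen-1.8B hyperparameters below, 24 layers and hidden size 2048 in fp16, are assumptions from the model card, not from this thread) suggests the fp16 KV cache alone explains only a small share of that growth:

```python
# Hedged back-of-the-envelope: compare observed peak-memory growth per token
# against the fp16 KV-cache cost per token. The 24-layer / hidden-2048
# hyperparameters are assumed, not taken from this thread.
def kv_bytes_per_token(n_layers=24, hidden=2048, dtype_bytes=2):
    return 2 * n_layers * hidden * dtype_bytes  # one K and one V row per layer

# (total tokens, observed peak bytes) from the measurements above
obs = [(32 + 32, 3.1e9), (2048 + 512, 7.4e9), (4096 + 1024, 11.6e9)]
(t0, m0), (t1, m1) = obs[0], obs[-1]
growth_per_token = (m1 - m0) / (t1 - t0)

print(f"observed growth: {growth_per_token / 1e6:.2f} MB/token")
print(f"fp16 KV cache:   {kv_bytes_per_token() / 1e6:.2f} MB/token")
```

Under these assumptions the observed ~1.7 MB/token is several times the ~0.2 MB/token a plain fp16 KV cache would need, so most of the peak usage would come from other buffers (e.g. attention scratch space or logits), which may be worth investigating.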
-
[benchmarks.zip](https://github.com/CVC4/CVC4/files/6325239/benchmarks.zip)
[Statistics.csv](https://github.com/CVC4/CVC4/files/6325242/Statistics.csv)
Hi. Recently, I collected the benchmarks w…
-
Hi all, I'm new to xformers and I'm working through the `examples/llama_inference/generate.py` file.
I traced it to here:
```python
def _memory_efficient_attention_forward(
inp: Inputs, op: Optional[Type…
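# The function body is truncated above. As a hedged, pure-Python illustration
# (the names below are mine, not from xformers): memory-efficient attention
# avoids materializing the full attention matrix by scanning keys/values in
# chunks with an "online" softmax that rescales a running max, normalizer,
# and weighted sum. For a single query vector:
import math

def naive_attention(q, ks, vs):
    # Reference: full softmax over all scores, then a weighted sum of values.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in ks]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(vs[0])
    return [sum(w[j] * vs[j][d] for j in range(len(vs))) / z for d in range(dim)]

def chunked_attention(q, ks, vs, chunk=2):
    # Only `chunk` scores exist in memory at a time; the running max,
    # normalizer z, and accumulator are rescaled when a larger max appears.
    m, z, acc = float("-inf"), 0.0, [0.0] * len(vs[0])
    for start in range(0, len(ks), chunk):
        scores = [sum(qi * ki for qi, ki in zip(q, k))
                  for k in ks[start:start + chunk]]
        new_m = max(m, max(scores))
        scale = math.exp(m - new_m) if m != float("-inf") else 0.0
        z *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, vs[start:start + chunk]):
            w = math.exp(s - new_m)
            z += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        m = new_m
    return [a / z for a in acc]

# chunked_attention agrees with naive_attention up to floating-point error,
# without ever holding all len(ks) scores at once.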
-
-
@regisss would it make sense to add task-specific evaluators? For example, for `automatic-speech-recognition`: I did that manually when I ran Whisper's benchmark.
-
This PR made the `alloc.strings` benchmark in BaseBenchmarks 2000% slower in min wall time, increased memory use by 42.25%, and increased allocations by 63.53%.
It also worsened compile performance with `i…
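For anyone reproducing such a regression locally, a minimal sketch of a min-wall-time micro-benchmark using only the standard library (`alloc_strings` below is a hypothetical stand-in for the real benchmark body, not the BaseBenchmarks code):

```python
import timeit

def alloc_strings(n=1000):
    # Hypothetical stand-in for the benchmark body: allocate many small strings.
    return ["x" * (i % 32) for i in range(n)]

# Taking min() over several repeats approximates the noise-free lower bound,
# which is what "min wall time" reports compare.
times = timeit.repeat(alloc_strings, number=100, repeat=5)
print(f"min wall time: {min(times):.6f} s")
```

Comparing the min before and after a change is more robust than comparing means, since the minimum is least affected by scheduler and GC noise.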
-
My configuration is as follows:
- Arch Linux, fully up to date; NVIDIA drivers installed and configured correctly; CUDA installed and configured correctly; the works
- Podman image build using a c…
-
While testing #1129 on the ADS side on `bireli`, we found this weird behavior. Downgrading CUDA from 11.1 to 10.2 speeds up inference (almost twice as fast).
`bireli` has a [GeForce GTX TITAN X](ht…