-
One of the issues faced during SDXL support (https://github.com/openxla/iree/pull/16854) was the missing support, across all codegen backends (i.e., CPU, SPIRV, and LLVMGPU), for operations added in LinalgExt.
…
-
Please use the [caffe-users list](https://groups.google.com/forum/#!forum/caffe-users) for usage, installation, or modeling questions, or other requests for help.
_Do not post such requests to Issues…
-
Hi!
I have run the F(2x2, 3x3) you provided in this repo, and I get the correct result with the Winograd convolution algorithm.
Then I wanted to try F(4x4, 3x3), so I changed three tr…
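For comparison with the F(4x4, 3x3) attempt, here is a minimal pure-Python sketch of 2D F(2x2, 3x3) in its nested matrix form, checked against direct convolution on a single tile. It assumes the transform matrices from Lavin & Gray's *Fast Algorithms for Convolutional Neural Networks*, which may not match the repo referenced above. The function names are hypothetical.

```python
# Winograd F(2x2, 3x3): 4x4 input tile, 3x3 filter, 2x2 output tile.
# Transform matrices follow Lavin & Gray (an assumption; other variants exist).
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain-Python product of an n x k matrix and a k x m matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def winograd_f2x2_3x3(d, g):
    """Y = A^T [(G g G^T) * (B^T d B)] A for a 4x4 tile d and a 3x3 filter g."""
    U = matmul(matmul(G, g), transpose(G))        # 4x4 transformed filter
    V = matmul(matmul(B_T, d), transpose(B_T))    # 4x4 transformed input
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # 16 multiplies
    return matmul(matmul(A_T, M), transpose(A_T))  # 2x2 output


def direct_2x2(d, g):
    """Reference: 'valid' cross-correlation as computed by CNN conv layers (36 multiplies)."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)]
            for i in range(2)]


d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(winograd_f2x2_3x3(d, g))
print(direct_2x2(d, g))
```

Moving to F(4x4, 3x3) means replacing all three matrices together. The input tile grows to 6x6, so B^T becomes 6x6, G becomes 6x3, and A^T becomes 4x6. Every loop bound above that assumes a 4x4 or 2x2 tile must be resized to match.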
-
Is it possible to implement Winograd convolution with 8-bit weights and activations? The intermediate transformations cause overflows, which result in a loss of accuracy for the overall CNN. Is anyon…
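One way to see why int8 overflows is to look at the input transform `B^T d B`. It is linear in the tile `d`, so its exact range for int8 activations can be computed by pushing every input to -128 or 127 according to the sign of its coefficient. The sketch below does this for the F(2x2, 3x3) `B^T` from Lavin & Gray. That choice is an assumption, since the question does not say which tile size or matrices are used, and the helper names are hypothetical. For this `B^T`, the transformed tile spans [-512, 510], which needs 10 signed bits before any accumulation over channels.

```python
# Input transform B^T for Winograd F(2x2, 3x3) (Lavin & Gray variant; an assumption).
B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]


def input_transform_range(bt, lo=-128, hi=127):
    """Exact per-element min/max of V = B^T d B over all tiles d with entries in [lo, hi].

    V[a][b] = sum_{i,j} bt[a][i] * bt[b][j] * d[i][j] is linear in d and each
    d[i][j] appears once, so the extremes come from setting every d[i][j] to
    lo or hi according to the sign of its coefficient.
    """
    n = len(bt)
    vmin = [[0] * n for _ in range(n)]
    vmax = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for i in range(n):
                for j in range(n):
                    c = bt[a][i] * bt[b][j]
                    vmax[a][b] += c * hi if c > 0 else c * lo
                    vmin[a][b] += c * lo if c > 0 else c * hi
    return vmin, vmax


def bits_needed(lo, hi):
    """Smallest two's-complement bit width that can hold every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


vmin, vmax = input_transform_range(B_T)
lo = min(min(row) for row in vmin)
hi = max(max(row) for row in vmax)
print(f"transformed int8 input spans [{lo}, {hi}] -> {bits_needed(lo, hi)} signed bits")
```

The weight side has its own problem. `G` for F(2x2, 3x3) contains entries of 1/2, so transformed int8 weights are not integers unless `G` is rescaled. Larger tiles such as F(4x4, 3x3) use transforms with larger coefficients, which widens the range further.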
-
(Lots of red herrings here; see https://github.com/iree-org/iree/issues/17635#issuecomment-2163644331 for the latest issue description.)
---
Splitting off of https://github.com/llvm/torch-mlir/issu…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Val
…
-
Each configuration should be controlled separately. Group convolution needs to be double-checked. Then, 16/16 F(2,3) and F(3,3) need to be tested at the network level. I guess those could be enabled for norma…
-
Hi, I came across your Winograd convolution article at https://antkillerfarm.github.io/dl%20acceleration/2019/07/19/DL_acceleration.html
May I know what ß0, ß1, and ß2 are?
![winograd_antkillerf…
-
Hi @merrymercy,
I am working on Winograd convolution in CUDA.
I found that the batched MM in your Winograd implementation is slow on NVIDIA architectures. I guess this is because, when C is large, it cannot use the parallel power of …
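The batched MM comes from the sum over input channels. After the input and filter transforms, that sum splits into α² = 16 independent matrix products for F(2x2, 3x3), one per coordinate of the 4x4 transformed tile. Each product multiplies a K × C filter matrix by a C × P input matrix, where K is the number of output channels, C the number of input channels, and P the number of tiles. The pure-Python sketch below makes that structure explicit and checks it against direct convolution. It is not the CUDA code under discussion. The transform matrices (Lavin & Gray's), function names, and array layouts are assumptions.

```python
# Winograd F(2x2, 3x3) for one conv layer, written so that the channel reduction is
# an explicit batched matrix multiply (one GEMM per transformed-tile coordinate).
# Transform matrices follow Lavin & Gray; the layouts below are assumptions.
B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    """Plain-Python product of an n x k matrix and a k x m matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def batched_gemm(U, V):
    """U: [16][K][C], V: [16][C][P] -> [16][K][P]: independent K x C by C x P GEMMs."""
    return [matmul(U[x], V[x]) for x in range(len(U))]


def winograd_layer(tiles, filters):
    """tiles: [P][C][4][4] input tiles, filters: [K][C][3][3] -> output [P][K][2][2]."""
    P, C, K = len(tiles), len(tiles[0]), len(filters)
    # Filter and input transforms, each producing a 4x4 matrix.
    Ut = [[matmul(matmul(G, filters[k][c]), transpose(G)) for c in range(C)] for k in range(K)]
    Vt = [[matmul(matmul(B_T, tiles[p][c]), transpose(B_T)) for c in range(C)] for p in range(P)]
    # Scatter into 16 matrices indexed by x = 4 * row + col of the 4x4 domain.
    U = [[[Ut[k][c][x // 4][x % 4] for c in range(C)] for k in range(K)] for x in range(16)]
    V = [[[Vt[p][c][x // 4][x % 4] for p in range(P)] for c in range(C)] for x in range(16)]
    M = batched_gemm(U, V)  # the reduction over C happens here
    # Gather each (k, p) back into a 4x4 tile and apply the output transform.
    return [[matmul(matmul(A_T, [[M[4 * i + j][k][p] for j in range(4)] for i in range(4)]),
                    transpose(A_T))
             for k in range(K)]
            for p in range(P)]


def direct_layer(tiles, filters):
    """Reference: direct multi-channel cross-correlation on each 4x4 tile."""
    return [[[[sum(tiles[p][c][i + u][j + v] * filters[k][c][u][v]
                   for c in range(len(tiles[p])) for u in range(3) for v in range(3))
               for j in range(2)] for i in range(2)]
             for k in range(len(filters))]
            for p in range(len(tiles))]
```

C is the shared inner dimension of every GEMM in the batch. Increasing C therefore makes each dot product longer without adding independent outputs, since the number of independent outputs per GEMM stays K × P.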
-
Here are the transforms for Winograd F(4x4, 5x5), i.e., a 5x5 kernel with a 4x4 output tile. I imagine the code should be a relatively simple adaptation of F(6x6, 3x3), because both algorithms use 8x…
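The tile arithmetic behind this comparison follows from the standard relation α = m + r − 1 for F(m×m, r×r). Both F(4x4, 5x5) and F(6x6, 3x3) give α = 8, so `B^T` has the same 8x8 shape in both. What changes is `G` (8x5 instead of 8x3) and `A^T` (4x8 instead of 6x8). The helper name below is hypothetical, and the counts cover only the elementwise-product stage, not the transforms.

```python
def winograd_tile_stats(m, r):
    """Shapes and per-tile, per-channel multiply counts for 2D Winograd F(m x m, r x r).

    alpha = m + r - 1 is the input tile size. Multiply counts cover only the
    elementwise product stage; the transforms are not counted.
    """
    alpha = m + r - 1
    return {
        "input_tile": (alpha, alpha),
        "B_T": (alpha, alpha),
        "G": (alpha, r),
        "A_T": (m, alpha),
        "winograd_mults": alpha * alpha,
        "direct_mults": m * m * r * r,
    }


for m, r in [(2, 3), (6, 3), (4, 5)]:
    s = winograd_tile_stats(m, r)
    print(f"F({m}x{m}, {r}x{r}): {s}")
```

Per tile and per channel, both variants need 64 elementwise multiplies, against 400 (4·4·5·5) and 324 (6·6·3·3) for the corresponding direct convolutions.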