Open LeonNerd opened 9 months ago
I attempted to implement it, but it turns out to be much slower compared to what we currently have in place.
Hi, it seems that the implementation is somewhat challenging, but optimizing convolutions is indeed necessary. Could you share your project code? I would like to continue your work based on it.
See my ggml fork; I have a winograd branch. I implemented it for both CPU and CUDA.
Thank you. It looks like you've done a lot of work. What do you think is the reason for the slow speed? In theory, it should be faster.
I think it's because I'm not using SIMD instructions (hardware acceleration), just vanilla code. In CUDA, it's because I'm not distributing the computation well. The advantage of using Winograd is that it consumes 60% less memory than the current convolution implementation.
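For readers new to the technique, here is a minimal sketch of the 1D Winograd F(2,3) transform in plain Python. It is not taken from the ggml fork mentioned above; the function names `winograd_f23` and `direct_f23` are made up for illustration. It only shows why Winograd is faster in theory: two outputs of a 3-tap filter cost 4 multiplications instead of 6, and the filter-side sums can be precomputed once per filter.

```python
def direct_f23(d, g):
    """Direct 3-tap correlation: 2 outputs from 4 inputs, 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 multiplications.

    The filter-side terms (u1, u2) depend only on g, so a real
    implementation would compute them once and reuse them for every tile.
    """
    # Filter transform (precomputable per filter)
    u1 = (g[0] + g[1] + g[2]) / 2.0
    u2 = (g[0] - g[1] + g[2]) / 2.0

    # Four element-wise products between transformed input and filter
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]

    # Output transform: additions and subtractions only
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]   # one input tile of 4 samples
    g = [1.0, 0.0, -1.0]       # a 3-tap filter
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

The 2D case used for 3x3 convolutions nests this transform along rows and columns. The saving in multiplications only turns into a real speedup when the transforms and the element-wise products are vectorized or well distributed across GPU threads, which fits the explanation given above.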
Amazing. When I finish what I'm doing, I'll continue with your work.
Hi, which version of sd.cpp were you using? Is it the one in your forked project?
Hi, I'm from sd.cpp. I am very interested in the Winograd convolution algorithm you mentioned, and I'd like to know how it is progressing. Why is it no longer on the sd.cpp to-do list?