[DO NOT MERGE] feat: generate cpu kernels using KA

LuxDL / LuxLib.jl

Backend for Lux.jl

MIT License

Currently, this is a performance disaster. Locally, I see slowdowns of at least 5-10x. Let's see the numbers on the dedicated benchmarks.

The main pro of this approach is that the maintenance burden goes down significantly. Now, how can we fix the slowdown? (This is probably better off as a KA issue.)

Finer control of the CPU backend from KA:

- Allow us to control the number of threads (if any): https://github.com/JuliaGPU/KernelAbstractions.jl/issues/507
- `@simd` and `@simd ivdep` loop info, either by default or supplied via the backend object: https://github.com/JuliaGPU/KernelAbstractions.jl/pull/436
- Alternate threading: KA is being used inside "core" operations, so these kernels are unlikely (if ever) to call other operations that make use of threading. Hence, having the option to use "cheaper threads" (Polyester.jl) would be a great addition.
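
For context, here is a rough sketch of the kind of comparison behind the points above: the same elementwise operation written once as a KA kernel launched on the CPU backend and once as a plain `@simd` loop. The function names (`scale_shift_kernel!`, `scale_shift_ka!`, `scale_shift_loop!`) are hypothetical and not taken from LuxLib; the actual kernels in this PR are the normalization ones, and the slowdown will depend on array sizes and thread count.

```julia
using KernelAbstractions

# Hypothetical elementwise kernel, for illustration only (not from LuxLib).
@kernel function scale_shift_kernel!(y, @Const(x), scale, bias)
    i = @index(Global)
    @inbounds y[i] = x[i] * scale + bias
end

# KA version: the backend is inferred from the array (CPU() for a plain Array).
function scale_shift_ka!(y, x, scale, bias)
    backend = get_backend(y)
    kernel! = scale_shift_kernel!(backend)
    kernel!(y, x, scale, bias; ndrange=length(y))
    KernelAbstractions.synchronize(backend)
    return y
end

# Hand-written CPU version: single-threaded loop with explicit SIMD hints.
function scale_shift_loop!(y, x, scale, bias)
    @inbounds @simd for i in eachindex(y, x)
        y[i] = x[i] * scale + bias
    end
    return y
end

x = rand(Float32, 1024)
y_ka = scale_shift_ka!(similar(x), x, 2.0f0, 1.0f0)
y_loop = scale_shift_loop!(similar(x), x, 2.0f0, 1.0f0)
```

Both versions should produce the same result; timing each one (for example with BenchmarkTools.jl) on small arrays is where the threading overhead and missing `@simd` info of the KA CPU path would be expected to show up.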

Codecov Report

Attention: Patch coverage is 8.33333% with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.91%. Comparing base (c185f04) to head (f056a2d).

| Files | Patch % | Lines |
|------|------|------|
| src/impl/batchnorm.jl | 0.00% | 6 Missing :warning: |
| src/impl/groupnorm.jl | 0.00% | 4 Missing :warning: |
| src/impl/normalization.jl | 50.00% | 1 Missing :warning: |

:exclamation: There is a different number of reports uploaded between BASE (c185f04) and HEAD (f056a2d).

HEAD has 19 uploads less than BASE

| Flag | BASE (c185f04) | HEAD (f056a2d) |
|------|------|------|
| | 37 | 18 |

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main     #136       +/-   ##
===========================================
- Coverage   83.93%   68.91%   -15.02%
===========================================
  Files          37       36        -1
  Lines        1867     1586      -281
===========================================
- Hits         1567     1093      -474
- Misses        300      493      +193
```

