Significantly speeds up isdiag and related checks, especially on nearly-triangular inputs:
julia> x = collect(UpperTriangular(rand(2048,2048)));
julia> y = cu(x);
julia> @benchmark istriu(x)
BenchmarkTools.Trial: 5467 samples with 1 evaluation.
Range (min … max): 897.042 μs … 1.307 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 912.342 μs ┊ GC (median): 0.00%
Time (mean ± σ): 912.978 μs ± 7.722 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▆▆▄▄▄▅▅█▇▅▃▃▅▅▃▃▂▁▁ ▂
▃▁▁▃▁▃▁▃▃▄▆▅▅▄▇▇████████████████████████▇▇█▇▇▇▇▇▆▅▆▆▄▇▅▅▅▆▆ █
897 μs Histogram: log(frequency) by time 933 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark istriu(y)
BenchmarkTools.Trial: 52 samples with 1 evaluation.
Range (min … max): 68.163 ms … 167.157 ms ┊ GC (min … max): 0.00% … 9.36%
Time (median): 91.300 ms ┊ GC (median): 0.00%
Time (mean ± σ): 96.385 ms ± 22.002 ms ┊ GC (mean ± σ): 1.73% ± 3.23%
█ ▃█▃ ▃▃█ █ ▃▃ ▃ █ ▃▃ ▃
▇▁▇█▇███▇███▁█▇██▇▇█▁█▇▇██▇▁▇▁█▁▁▁▁▁▁▁▁▁▇▁▁▇▇▁▁▁▁▁▁▁▁▁▁▇▁▁▁▇ ▁
68.2 ms Histogram: frequency by time 158 ms <
Memory estimate: 5.76 MiB, allocs estimate: 130855.
julia> @benchmark istriu(y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 35.099 μs … 1.205 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 53.339 μs ┊ GC (median): 0.00%
Time (mean ± σ): 53.877 μs ± 15.619 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆▄▄▇▃▆▆▅▅▅█▅▃▆▃▄▄▃▂▁ ▁ ▂ ▁
▂▅▅▆▄▄▅▄▅▅▅▅▄▆▄▅▅▄▇██████████████████████████▇█▇██▇▇▇▆▅▅▇▄▃ ▅
35.1 μs Histogram: frequency by time 69.9 μs <
Memory estimate: 4.17 KiB, allocs estimate: 80.
For completely non-triangular (i.e. random) inputs this will perform worse, because the generic implementation checks row by row, but I guess that's an edge case.
Significantly speeds up
isdiag
and related checks, especially on nearly-triangular inputs:For completely non-triangular (i.e. random) inputs this will perform worse, because the generic implementation checks row by row, but I guess that's an edge case.