cuihantao closed this issue 1 year ago
Thanks for your attention! But I don't think this project has anything to do with ndarray. Vectorization in the compiler targets general numerical computation and does not need scenario-specific functionality such as ndarray.
As for the project status, there is some previous discussion on the Rust Internals forum: https://internals.rust-lang.org/t/mir-optimization-pass-that-implements-auto-vectorization/16360
In general, the community thinks that automatic vectorization should be handled by LLVM rather than by rustc.
Thank you for letting me know!
I later used `Vec` from the stdlib instead of `ndarray` for element-wise multiplication. To my surprise, the compiler vectorizes the code very well. Code as simple as the following compiles to SIMD instructions.
```rust
pub fn g_update(&mut self) -> &Self {
    for (dest, p1, p2, p3, p4, p5) in izip!(
        &mut self.dest,
        &self.p1,
        &self.p2,
        &self.p3,
        &self.p4,
        &self.p5
    ) {
        *dest = p1 * p2 * p3 * p4 * p5;
    }
    self
}
```
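For readers who want to try this without the itertools crate, here is a minimal self-contained sketch of the same loop. The `Products` struct name is hypothetical and `f64` elements are an assumption (the original post shows only the method), and nested `zip` calls from the standard library stand in for `izip!`. Whether a given build actually vectorizes it still depends on the optimization level and target.

```rust
/// Hypothetical container; the field names mirror the method above.
pub struct Products {
    pub dest: Vec<f64>,
    pub p1: Vec<f64>,
    pub p2: Vec<f64>,
    pub p3: Vec<f64>,
    pub p4: Vec<f64>,
    pub p5: Vec<f64>,
}

impl Products {
    /// Writes the element-wise product of the five inputs into `dest`.
    pub fn g_update(&mut self) -> &Self {
        // Nested std `zip` replaces itertools' `izip!`; iteration stops
        // at the shortest of the six vectors.
        let inputs = self
            .p1
            .iter()
            .zip(&self.p2)
            .zip(&self.p3)
            .zip(&self.p4)
            .zip(&self.p5);
        for (dest, ((((p1, p2), p3), p4), p5)) in self.dest.iter_mut().zip(inputs) {
            *dest = p1 * p2 * p3 * p4 * p5;
        }
        self
    }
}
```

The nested tuple pattern is the price of avoiding the macro; `izip!` flattens it into the six-element tuple used in the original.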
The generated assembly reads as follows:
```asm
.LBB20_9: // major loop for packs of 4
    movupd xmm0, xmmword ptr [r8 + 8*rbx]
    movupd xmm1, xmmword ptr [r8 + 8*rbx + 16]
    movupd xmm2, xmmword ptr [r9 + 8*rbx]
    mulpd xmm2, xmm0
    movupd xmm0, xmmword ptr [r9 + 8*rbx + 16]
    mulpd xmm0, xmm1
    movupd xmm1, xmmword ptr [r10 + 8*rbx]
    mulpd xmm1, xmm2
    movupd xmm2, xmmword ptr [r10 + 8*rbx + 16]
    mulpd xmm2, xmm0
    movupd xmm0, xmmword ptr [rdi + 8*rbx]
    mulpd xmm0, xmm1
    movupd xmm1, xmmword ptr [rdi + 8*rbx + 16]
    mulpd xmm1, xmm2
    movupd xmm2, xmmword ptr [rsi + 8*rbx]
    mulpd xmm2, xmm0
    movupd xmm0, xmmword ptr [rsi + 8*rbx + 16]
    mulpd xmm0, xmm1
    movupd xmmword ptr [r14 + 8*rbx], xmm2
    movupd xmmword ptr [r14 + 8*rbx + 16], xmm0
    add rbx, 4
    cmp r11, rbx
    jne .LBB20_9
    cmp r15, r11
    je .LBB20_16
.LBB20_11: // remaining entries
    mov rcx, r11
    or rcx, 1
    test r15b, 1
    je .LBB20_13
    movsd xmm0, qword ptr [r8 + 8*r11]
    mulsd xmm0, qword ptr [r9 + 8*r11]
    mulsd xmm0, qword ptr [r10 + 8*r11]
    mulsd xmm0, qword ptr [rdi + 8*r11]
    mulsd xmm0, qword ptr [rsi + 8*r11]
    movsd qword ptr [r14 + 8*r11], xmm0
    mov r11, rcx
```
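The listing has two parts: a main loop that handles four `f64` values per iteration (two 128-bit registers holding two doubles each) and a scalar path for the leftover entries. As a rough illustration of that shape, and not a description of what LLVM does internally, the same structure can be written by hand in Rust. The function name `mul_chunked` and the two-input signature are hypothetical simplifications.

```rust
/// Hypothetical helper: multiplies `a` and `b` element-wise into `dest`,
/// four elements per iteration, then the leftovers one at a time.
pub fn mul_chunked(dest: &mut [f64], a: &[f64], b: &[f64]) {
    // Trim to a common length so the three slices chunk identically.
    let n = dest.len().min(a.len()).min(b.len());
    let (dest, a, b) = (&mut dest[..n], &a[..n], &b[..n]);

    let mut d_chunks = dest.chunks_exact_mut(4);
    let mut a_chunks = a.chunks_exact(4);
    let mut b_chunks = b.chunks_exact(4);

    // Main loop: packs of four, analogous to the `.LBB20_9` loop.
    for ((d, x), y) in d_chunks
        .by_ref()
        .zip(a_chunks.by_ref())
        .zip(b_chunks.by_ref())
    {
        for i in 0..4 {
            d[i] = x[i] * y[i];
        }
    }

    // Tail: the fewer-than-four remaining entries, handled individually.
    let tail = d_chunks.into_remainder();
    for ((d, x), y) in tail
        .iter_mut()
        .zip(a_chunks.remainder())
        .zip(b_chunks.remainder())
    {
        *d = x * y;
    }
}
```

In practice the plain iterator loop is usually preferable, since the compiler produces this split on its own, as the assembly shows.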
I haven't gotten `ndarray` to vectorize other than by using `Zip` or `azip!`, which is limited to six producers. The standard `Vec` works fine for my purpose. Posting this for reference, though more than likely you are already aware of it.
Hello!
I'm new to Rust and found this repository from here: https://github.com/rust-ndarray/ndarray/issues/46. Are the changes in this repository going to make it into the compiler? That would save a significant amount of effort on vectorization.
Thanks