Open richox opened 4 days ago
I would have expected the compiler to be doing this for us; I wonder if the capacity checks in extend_from_slice are throwing this off. Given we've already determined the buffer capacity, perhaps we could use unchecked variants?
i'm not sure how we can use it in stable rust.
I think this is probably a hard blocker, we really don't want to be depending on unstable features if we can avoid it
prefetch_read_data is an exposed LLVM intrinsic and is not intended to be used by users, but prefetch_read_data(_, 3) could be implemented with arch-specific intrinsics.
pub fn prefetch_read_data_locality_3(data: *const ()) {
    #[cfg(target_arch = "x86")]
    unsafe {
        core::arch::x86::_mm_prefetch(data.cast(), core::arch::x86::_MM_HINT_T0);
    }
    #[cfg(target_arch = "x86_64")]
    unsafe {
        core::arch::x86_64::_mm_prefetch(data.cast(), core::arch::x86_64::_MM_HINT_T0);
    }
    // `core::arch::aarch64::_prefetch` is unstable,
    // tracked in https://github.com/rust-lang/rust/issues/117217.
    // Inline assembly may be used here.
}
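For the aarch64 case mentioned in the comment above, one possible shape of the inline-assembly route is sketched below. The function name prefetch_read_aarch64_sketch is hypothetical, and the choice of the PRFM PLDL1KEEP hint (prefetch for load, L1, temporal) is an assumption about what best matches locality 3; it has not been benchmarked here. On other architectures the sketch compiles to a no-op.

```rust
/// Hypothetical helper (not part of any library): issues a read-prefetch hint
/// for `data` on aarch64 using stable inline assembly.
#[inline(always)]
pub fn prefetch_read_aarch64_sketch(data: *const ()) {
    #[cfg(target_arch = "aarch64")]
    unsafe {
        // PRFM is only a hint: it does not fault on an invalid address and
        // does not modify memory or flags.
        core::arch::asm!(
            "prfm pldl1keep, [{ptr}]",
            ptr = in(reg) data,
            options(nostack, preserves_flags, readonly),
        );
    }
    #[cfg(not(target_arch = "aarch64"))]
    {
        // No-op fallback so the sketch builds on every target.
        let _ = data;
    }
}
```

Since the instruction is a pure hint, calling it has no observable effect on program state; any benefit would have to be confirmed by benchmarking on the target hardware.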
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Memory prefetching is widely used when accessing array items in random order, which makes it a good fit for some cases of the take/interleave kernels.
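To illustrate the idea, here is a minimal sketch of a gather loop that prefetches a few iterations ahead. It is not the arrow-rs interleave kernel: the names prefetch_hint and interleave_with_prefetch_sketch, the (array index, row index) input format, and the PREFETCH_DISTANCE of 8 are all assumptions made for the example, and the distance would need tuning on real workloads.

```rust
/// Hypothetical prefetch hint: uses the stable x86_64 intrinsic where
/// available and is a no-op on other architectures.
#[inline(always)]
fn prefetch_hint<T>(ptr: *const T) {
    #[cfg(target_arch = "x86_64")]
    unsafe {
        use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
        _mm_prefetch::<_MM_HINT_T0>(ptr as *const i8);
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        let _ = ptr;
    }
}

/// Assumed look-ahead distance in elements; purely illustrative.
const PREFETCH_DISTANCE: usize = 8;

/// Gathers `arrays[a][row]` for every `(a, row)` in `indices`, hinting the
/// cache about the element that will be needed `PREFETCH_DISTANCE` steps later.
pub fn interleave_with_prefetch_sketch<T: Copy>(
    arrays: &[&[T]],
    indices: &[(usize, usize)],
) -> Vec<T> {
    let mut out = Vec::with_capacity(indices.len());
    for (i, &(a, row)) in indices.iter().enumerate() {
        if let Some(&(next_a, next_row)) = indices.get(i + PREFETCH_DISTANCE) {
            if let Some(next_array) = arrays.get(next_a) {
                // wrapping_add avoids UB for out-of-range rows; the prefetch
                // itself never dereferences the pointer.
                prefetch_hint(next_array.as_ptr().wrapping_add(next_row));
            }
        }
        out.push(arrays[a][row]);
    }
    out
}
```

The output is identical to a plain gather; the prefetch only changes when cache lines are requested, so any speedup depends on the access pattern and hardware.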
I ran a quick benchmark: for completely random input, interleave with prefetching reaches about 2x the performance of the current interleave implementation.
benchmark code:
Describe the solution you'd like
interleave_with_memory_prefetching so we don't break the current implementation.
prefetch_read_data is still unstable; I'm not sure how we can use it in stable Rust.
Describe alternatives you've considered
Additional context