Closed (farnoy closed this issue 4 years ago)
Hi!
The compiler's insistence on calling ::new() may be because ::new is not marked as #[inline] there. Adding the missing #[inline] may fix this.
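To illustrate the suggestion, here is a minimal sketch of a constructor marked #[inline]. The F32x4 type and the sum_first_lanes function are hypothetical stand-ins written for this example; they are not simba's actual definitions.

```rust
/// Hypothetical 4-lane wrapper used only for illustration (not simba's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// `#[inline]` makes the body of this function available to other
    /// crates, so the optimizer is allowed to inline it at the call site
    /// instead of emitting a call. It is a hint, not a guarantee.
    #[inline]
    pub fn new(a: f32, b: f32, c: f32, d: f32) -> Self {
        F32x4([a, b, c, d])
    }
}

/// Hypothetical caller: builds one vector per 4-element chunk and sums
/// the first lane of each. With `new` inlined, no call should remain
/// in the loop body.
pub fn sum_first_lanes(data: &[f32]) -> f32 {
    data.chunks_exact(4)
        .map(|c| F32x4::new(c[0], c[1], c[2], c[3]).0[0])
        .sum()
}
```

Whether the call actually disappears still depends on the optimization level, so checking the generated assembly remains the way to confirm it.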
Also, is there a better way to initialize from packed memory with safe rust?
We could add a .from_slice_unaligned() to use instead of ::new. And perhaps it would make sense to add some kind of .from_iterator(iter, default) which fills the lanes of the f32x4 with the iterator content, or default when the iterator does not yield enough values to fill all the lanes.
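One possible shape for the proposed from_iterator(iter, default) is sketched below. The F32x4 wrapper is a hypothetical stand-in, and the signature is an assumption made for this example; the thread does not fix the actual API.

```rust
/// Hypothetical 4-lane wrapper used only for illustration (not simba's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// Assumed signature: takes at most four values from `iter` and
    /// leaves any remaining lanes set to `default`.
    #[inline]
    pub fn from_iterator<I>(iter: I, default: f32) -> Self
    where
        I: IntoIterator<Item = f32>,
    {
        // Start with every lane holding the default value.
        let mut lanes = [default; 4];
        // `zip` stops as soon as either side runs out, so a short
        // iterator leaves the trailing lanes at `default` and a long
        // one is simply not read past the fourth value.
        for (lane, value) in lanes.iter_mut().zip(iter) {
            *lane = value;
        }
        F32x4(lanes)
    }
}
```

This variant never panics, since a short iterator is padded rather than rejected.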
Thanks for the quick response!
I'll try the #[inline] idea soon. I'm new to these optimizations. Do they only make sense for libraries where you want inlining across crates? Does the annotation also affect calls between different modules of the same crate?
What would be the contract for from_slice_unaligned()? That sounds like reading from &[u8], whereas I'm interested in &[Self::Element], which would have to be aligned?
I'll try the #[inline] idea soon. I'm new to these optimizations, do they only make sense for libraries where you want inlining across crates? Does the annotation also affect same-crate but different module function calls?
#[inline] is mostly for inlining across crates, yes. I think it can also affect inlining within the same crate when the compiler uses multiple codegen units.
What would be the contract for from_slice_unaligned()? That sounds like reading from &[u8] where I'm interested in &[Self::Element], which would have to be aligned?
That would be reading from &[Self::Element]. It would basically just call the corresponding from_slice_unaligned function from packed_simd. It is unaligned because we have no guarantee that &slice[0] is aligned to, e.g., a 16-byte boundary (i.e. align_of::<f32x4>()).
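The alignment point can be sketched in plain Rust as follows. The F32x4 type (given 16-byte alignment with repr(align(16))) and the is_vector_aligned helper are hypothetical, written only for this example. This is not the packed_simd implementation, just one way an unaligned load from &[f32] could look.

```rust
use std::mem::align_of;
use std::ptr;

/// Hypothetical 16-byte-aligned vector type, standing in for f32x4.
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// Reads four lanes from the start of `slice`, which only needs the
    /// 4-byte alignment of `f32`. Panics if fewer than 4 elements exist.
    #[inline]
    pub fn from_slice_unaligned(slice: &[f32]) -> Self {
        assert!(slice.len() >= 4, "slice must hold at least 4 lanes");
        // SAFETY: the length check guarantees 16 readable bytes, and
        // `read_unaligned` places no alignment requirement on the pointer.
        unsafe { ptr::read_unaligned(slice.as_ptr() as *const F32x4) }
    }
}

/// Hypothetical helper: true if the slice happens to start on a
/// boundary suitable for an aligned vector load.
pub fn is_vector_aligned(slice: &[f32]) -> bool {
    slice.as_ptr() as usize % align_of::<F32x4>() == 0
}
```

For any f32 buffer, data[0..] and data[1..] start 4 bytes apart, so at most one of them can sit on a 16-byte boundary. That is why a load from an arbitrary subslice has to be treated as unaligned.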
Inlining helps, nice catch!
That would be reading from &[Self::Element]. It would basically just call the corresponding from_slice_unaligned function from packed_simd. It is unaligned because we have no guarantee that &slice[0] is aligned to, e.g., a 16-byte boundary (i.e. align_of::<f32x4>()).
Got it, I forgot that SIMD loads have stricter alignment requirements. In my experiments, rustc usually outputted the aligned versions of instructions, but I guess it's seen how the allocation is done to begin with.
All of this sounds good to me, although I don't love the fact that these functions can panic. I will send a PR shortly.
I'm trying to initialize a Simd<f32x4> value from memory, but the safe way of doing it does a function call.
On simba 0.2.0, using the following rust code:
Analyzing the assembly, I'm seeing the initialization for f32x4::new() work like this:
So the compiler seems to really want to call this Simd::new method, despite not using the results (xmm0-2 are not used as source operands later).
With the 2nd variant of the code, as far as I can tell the first loop just translates to:
Which is so much cleaner.
Is there anything you could think of that is inhibiting this for ::new()? Also, is there a better way to initialize from packed memory with safe rust?