Lokathor / bytemuck

A crate for mucking around with piles of bytes
https://docs.rs/bytemuck
Apache License 2.0
725 stars 79 forks source link

Suboptimal assembly for Contiguous derive #175

Closed e00E closed 1 year ago

e00E commented 1 year ago

The derived implementation of Contiguous::into_integer prevents some optimizations from happening. See the following example in which I have commented the resulting assembly as retrieved through cargo-show-asm on my stable-x86_64-unknown-linux-gnu machine.

use bytemuck::Contiguous;

#[derive(Copy, Clone, Contiguous)]
#[repr(u8)]
pub enum E {
    E0,
    E1,
    E2,
}

impl E {
    // movzx eax, byte ptr [rdi]
    pub fn ref_as(&self) -> Option<Self> {
        Self::from_integer(*self as u8)
    }

    // movzx eax, byte ptr [rdi]
    pub fn ref_into(&self) -> Option<Self> {
        Self::from_integer(self.into_integer())
    }

    // mov eax, edi
    pub fn val_as(self) -> Option<Self> {
        Self::from_integer(self as u8)
    }

    // cmp dil, 3
    // mov eax, 3
    // cmovb eax, edi
    pub fn val_into(self) -> Option<Self> {
        Self::from_integer(self.into_integer())
    }
}

The assembly for E::val_into should be the same as for E::val_as.

This could be an issue with the default forbidden-to-override implementation of Contiguous::into_integer or it could be an issue with rustc. As a workaround the derive macro could emit as Int since it knows what the repr type is. If you feel this is a rustc issue then I am happy to report it there.

Lokathor commented 1 year ago

I'd certainly be happy to accept a PR which improves the generated assembly of any of the proc-macros.

Jarcho commented 1 year ago

The issue is cause by the transmute in into_integer. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.

scottmcm commented 1 year ago

Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.

e00E commented 1 year ago

Thanks. Will try to remember to test this again when 1.71 releases.

e00E commented 1 year ago

Hopefully fixed for Rust 1.71 -- try it out on nightly and let me know if it helped.

Was not fixed by Rust 1.71. Tried with the stable release today.

scottmcm commented 1 year ago

Oh, I see the problem. When I read

The issue is cause by the transmute in into_integer. Casting to an integer rather than using transmute does allow the optimizer to preserve the range from the enum.

I heard mem::transmute, not transmute!.

With true transmute they now do the same thing (https://rust.godbolt.org/z/Y6PrYbqaW -- so the same that they get merged), but the transmute_copy(&ManuallyDrop dance doesn't get the extra assumes.

So I guess the way forward here is either

  1. Update the derive to override into_integer to an implementation using a cast rather than using the provided impl, as was mentioned in https://github.com/Lokathor/bytemuck/issues/175#issuecomment-1437307067, or
  2. Convince libs-api to expose a stable API wrapper for https://doc.rust-lang.org/nightly/std/intrinsics/fn.transmute_unchecked.html
e00E commented 1 year ago

Thanks for your analysis. I might look into the derive change.

Jarcho commented 1 year ago

That's my fault since I only reported mem::transmute and not mem::transmute_copy. I'll open an issue for that since it should also get the same treatment.

Lokathor commented 1 year ago

aside: now that ptr reading is const fn i hope transmute_copy is const fn soon too