Lokathor / bytemuck

A crate for mucking around with piles of bytes
https://docs.rs/bytemuck
Apache License 2.0

`ASSERT_SIZE_MULTIPLE_OF` when casting `&[u8]` to `&[[u8; 16]]` #213

Closed SK83RJOSH closed 9 months ago

SK83RJOSH commented 10 months ago

I'm writing a tool to swizzle texture data offline. For convenience, I rely on must_cast_slice to do most of the work, but I seem to have run into an edge case.

I expect the following to satisfy all requirements for a "safe" cast:

let src: Vec<u8> = texture.data.clone();
let src_blocks: &[[u8; 16]] = bytemuck::must_cast_slice(&src);

The alignments of u8 and [u8; 16] are both 1, and the resulting cast should handle slop correctly:

let src = [0_u8; 1024]; // size = 1024 (0x400), align = 0x1
let src_block: [u8; 16]; // size = 16 (0x10), align = 0x1

Is there a practical reason for disallowing this cast, or is it mostly to simplify the cast implementation?

zachs18 commented 10 months ago

A [u8] may have a length that is not a multiple of 16. In that case, bytemuck::try_cast_slice::<u8, [u8; 16]> returns Err(PodCastError::OutputSliceWouldHaveSlop), and bytemuck::cast_slice panics. bytemuck::must_cast_slice refuses to compile any cast that could fail at runtime, so bytemuck::must_cast_slice::<u8, [u8; 16]> does not compile.

> the resulting cast should handle slop correctly

bytemuck normally "handles" slop by not allowing it at all, so this error is consistent with that.
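
To make the compile-time rule concrete, here is a minimal std-only sketch of the kind of layout check that decides whether a slice cast can ever fail. The helper name slice_cast_is_infallible is hypothetical. It paraphrases the conditions discussed in this thread rather than reproducing bytemuck's source, and it simply rejects zero-sized types.

```rust
use core::mem::{align_of, size_of};

/// Hypothetical helper (not part of bytemuck): returns true when casting
/// `&[A]` to `&[B]` could never fail for any slice length or address.
/// Assumption: zero-sized types are simply rejected here.
const fn slice_cast_is_infallible<A, B>() -> bool {
    size_of::<A>() != 0
        && size_of::<B>() != 0
        // The input must be at least as aligned as the output.
        && align_of::<A>() >= align_of::<B>()
        // Every input element must split into whole output elements,
        // so no slice length can leave slop behind.
        && size_of::<A>() % size_of::<B>() == 0
}
```

Under these assumptions the check is false for u8 to [u8; 16], because 1 is not a multiple of 16, and true for the reverse direction, [u8; 16] to u8.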


If you want a version that ignores slop and cannot fail at runtime, you can slice the source slice to have a length that is a multiple of 16 before passing it to bytemuck::cast_slice. (The optimizer will likely be able to see that the cast cannot fail and remove the panic branch from bytemuck::cast_slice.)

let src = texture.data.clone();
let no_slop_len = (src.len() / 16) * 16; // round down to multiple of 16
let src_blocks: &[[u8; 16]] = bytemuck::cast_slice(&src[..no_slop_len]);
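
Note that the round-down above silently discards any trailing bytes. If the caller needs to know about them, the same arithmetic can hand back the leftover bytes as well. This is a minimal std-only sketch. split_slop is a hypothetical helper name, and the prefix it returns is what would then be passed to bytemuck::cast_slice.

```rust
/// Hypothetical helper (not part of bytemuck): split `data` into the longest
/// prefix whose length is a multiple of `N`, plus the leftover "slop" bytes.
fn split_slop<const N: usize>(data: &[u8]) -> (&[u8], &[u8]) {
    assert!(N != 0, "block size must be nonzero");
    let no_slop_len = (data.len() / N) * N; // round down to a multiple of N
    data.split_at(no_slop_len)
}
```

A texture tool could treat a non-empty second slice as a malformed input instead of dropping it quietly.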

On my machine, a similar function compiles down to almost nothing in release mode (just register shuffling and a right shift of the slice length):


pub fn chunks_16(texture_data: &[u8]) -> &[[u8; 16]] {
    let src = texture_data;
    let no_slop_len = (src.len() / 16) * 16; // round down to multiple of 16
    let src_blocks: &[[u8; 16]] = bytemuck::cast_slice(&src[..no_slop_len]);
    src_blocks
}

(using cargo-show-asm)

bytemuck_cast_slice_runtime_infallible::chunks_16:
    .cfi_startproc
    mov rdx, rsi
    mov rax, rdi
    shr rdx, 4
    ret
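
As a reading aid for the listing, here is a small std-only model of what those instructions compute. It assumes the x86-64 calling pattern seen here, where the input slice arrives as a pointer in rdi and a length in rsi, and the result is returned in rax and rdx. The function name chunks_16_shape is hypothetical.

```rust
/// Hypothetical model of the listing: returns the (pointer, length) pair
/// that the resulting `&[[u8; 16]]` would carry.
fn chunks_16_shape(data: &[u8]) -> (*const u8, usize) {
    let ptr = data.as_ptr(); // mov rax, rdi: the pointer passes through unchanged
    let len = data.len() >> 4; // shr rdx, 4: byte length divided by 16, rounded down
    (ptr, len)
}
```

The shift alone is enough because rounding the byte length down to a multiple of 16 and then dividing by 16 gives the same result as a single division by 16.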