This is optimized for simplicity of implementation, not speed.
If you wanted to optimize the hex printer for speed, you would first expand the 16 bytes into 32 nibbles, then widen those to 32 bytes. Currently, LLVM does not know how to do `@as(@Vector(32, u8), @as(@Vector(32, u4), @bitCast(x)))` efficiently (https://github.com/llvm/llvm-project/issues/79094). Here is the workaround. Next, you would use `vpshufb` on x86 or `tbl` on ARM to look up each nibble in a table of `{ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' }`. Finally, you could use a vector permute or vector expansion to make room for the dashes and merge them in. Unfortunately, my cross-platform implementation that handles vector shuffles got deleted by accident, so I won't provide code for that here today. Let me know if you would prefer a version optimized for speed.
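For reference, here is a scalar sketch of the same three steps (nibble expansion, table lookup, dash merge). It is not the vectorized version and not the workaround mentioned above; it only models the data flow that a SIMD implementation would follow. The name `formatUuidScalar` is hypothetical, and the multi-object `for` syntax assumes Zig 0.11 or later.

```zig
const std = @import("std");

// Lookup table: the nibble value 0..15 indexes its lowercase hex digit.
const hex_digits = "0123456789abcdef";

/// Hypothetical scalar model of the pipeline described above.
fn formatUuidScalar(bytes: [16]u8) [36]u8 {
    // Step 1: expand 16 bytes into 32 nibbles, one nibble per output byte,
    // high nibble first.
    var nibbles: [32]u8 = undefined;
    for (bytes, 0..) |b, i| {
        nibbles[2 * i] = b >> 4;
        nibbles[2 * i + 1] = b & 0x0f;
    }

    // Step 2: look each nibble up in the table. This is the part that
    // vpshufb / tbl would do for a whole vector at once.
    var digits: [32]u8 = undefined;
    for (nibbles, 0..) |n, i| {
        digits[i] = hex_digits[n];
    }

    // Step 3: spread the 32 digits over 36 bytes and merge in the dashes
    // at the 8-4-4-4-12 group boundaries.
    var out: [36]u8 = undefined;
    var src: usize = 0;
    for (&out, 0..) |*c, i| {
        if (i == 8 or i == 13 or i == 18 or i == 23) {
            c.* = '-';
        } else {
            c.* = digits[src];
            src += 1;
        }
    }
    return out;
}

pub fn main() void {
    const id = [16]u8{
        0x01, 0x8f, 0x3a, 0x2b, 0x4c, 0x5d, 0x7e, 0x6f,
        0x80, 0x91, 0xa2, 0xb3, 0xc4, 0xd5, 0xe6, 0xf7,
    };
    const text = formatUuidScalar(id);
    std.debug.print("{s}\n", .{&text});
}
```

Each loop corresponds to one vector operation in the fast version, so this could also serve as a reference to test a SIMD implementation against.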
Fixes https://github.com/coolaj86/zig-uuidv7/issues/3