Speed up rkyv serialization

This tries to make use of the rkyv CopyOptimize attribute, which can guarantee that memcpy-able structs are indeed serialized with a plain memcpy rather than by iterating over each element in the vector.

This ran into one oddity, however: the archived layout of QbvhNode is smaller than the regular layout (124 vs 128 bytes), because QbvhNode contains quite a few padding bytes.

QbvhNodeFlags is u8
NodeIndex has a u8 member

Both of those cause a (normally) 121-byte struct to get blown up to 128 bytes.
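
To make the padding effect concrete, here is a minimal sketch using a hypothetical stand-in struct (`DemoNode` and its fields are made up for illustration and do not match the real QbvhNode layout). It only shows how a single trailing `u8` forces the compiler to round the struct size up to the struct's alignment:

```rust
// Hypothetical stand-in, NOT the real QbvhNode layout.
// repr(C) keeps the field order fixed so the arithmetic is predictable.
#[repr(C)]
pub struct DemoNode {
    pub bounds: [f32; 6],   // 24 bytes, 4-byte aligned
    pub children: [u32; 4], // 16 bytes, 4-byte aligned
    pub flags: u8,          // 1 byte, followed by implicit padding
}

/// Bytes actually occupied by the fields.
pub fn demo_payload_bytes() -> usize {
    std::mem::size_of::<[f32; 6]>()
        + std::mem::size_of::<[u32; 4]>()
        + std::mem::size_of::<u8>()
}

/// Size of the whole struct, including padding.
pub fn demo_node_size() -> usize {
    std::mem::size_of::<DemoNode>()
}

/// Padding bytes the compiler inserted to satisfy alignment.
pub fn demo_padding_bytes() -> usize {
    demo_node_size() - demo_payload_bytes()
}
```

In this sketch the fields add up to 41 bytes, but the struct occupies 44 because its size must be a multiple of its 4-byte alignment. The same mechanism, with different numbers, is what I believe is happening in QbvhNode.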

In this PR I've worked around the issue by making QbvhNodeFlags a u64, which obviously isn't what anybody would want. However, since I'm not familiar with this project, I'd like to open the discussion about what to do with this.

It seems likely that QbvhNode needs to be somehow cache-line aligned (which it is at the moment), but it might be nice to mark the padding bytes in NodeIndex and QbvhNode explicitly so it's clear where in the node one can stuff some additional info.
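
One possible shape for this, sketched on a hypothetical struct (`ExplicitNode`, its fields, and the 64-byte cache-line size are assumptions for illustration, not the actual parry types): keep the cache-line alignment, but spell the spare bytes out as a named field so there is no implicit padding left.

```rust
// Hypothetical stand-in, NOT the real QbvhNode layout.
// align(64) assumes a 64-byte cache line.
#[repr(C, align(64))]
pub struct ExplicitNode {
    pub bounds: [f32; 6],    // 24 bytes
    pub children: [u32; 4],  // 16 bytes
    pub flags: u8,           // 1 byte
    pub _reserved: [u8; 23], // explicit spare bytes, free for future use
}

// Compile-time check that pins the total size to one cache line.
const _: () = assert!(std::mem::size_of::<ExplicitNode>() == 64);

/// Sum of all field sizes, including the explicit reserved bytes.
pub fn explicit_field_bytes() -> usize {
    std::mem::size_of::<[f32; 6]>()
        + std::mem::size_of::<[u32; 4]>()
        + std::mem::size_of::<u8>()
        + std::mem::size_of::<[u8; 23]>()
}

/// Size of the whole struct as laid out by the compiler.
pub fn explicit_node_size() -> usize {
    std::mem::size_of::<ExplicitNode>()
}

/// Alignment of the struct.
pub fn explicit_node_align() -> usize {
    std::mem::align_of::<ExplicitNode>()
}
```

Because the field sizes sum to exactly the struct size, every byte of the node is accounted for by a named field, which is the property a memcpy-style optimization wants. Whether this is the right trade-off for QbvhNode is exactly the discussion I'd like to have.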

Removing CopyOptimize is not the end of the world, but in this case having it speeds up rkyv serialization by about 2x.

dimforge / parry

Speed up rkyv serialization #123