To improve rx performance, we need to reduce the number of copies that are made. Defragmentation used to make 3 copies; this PR reduces that to 1. It could have been reduced to 0, but that would have made the PR much heavier and would have slowed down non-fragmented data, which is still the main use case.
Even with 1 copy left, our throughput test shows 25% less time spent recopying data on a 150 kB payload.
Fragment payloads are now aliased, then copied once into the defragmentation tx_buffer, which is then cheaply converted to an rx_buffer before being decoded (previously: copy, copy and copy). The user can take ownership of this buffer.
Added a state to allocate the buffer only when needed and to mark when it has overflowed.
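
For reviewers, here is a minimal sketch of the idea, not the actual code in this PR. Every name in it (defrag_t, reader_t, defrag_push, defrag_finish, the DEFRAG_* states) is hypothetical, and it assumes the "cheap conversion" amounts to handing the pointer over to a reader struct. It shows the three things described above: the buffer is allocated on the first fragment only, each fragment is copied once from an aliased view, and an overflow is remembered until the message ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; illustration only. */
typedef enum { DEFRAG_EMPTY, DEFRAG_ACTIVE, DEFRAG_OVERFLOWN } defrag_state_t;

typedef struct {
    defrag_state_t state;
    uint8_t *data; /* NULL until the first fragment arrives */
    size_t len;    /* bytes reassembled so far */
    size_t cap;    /* maximum size of a reassembled message */
} defrag_t;

typedef struct {
    uint8_t *data; /* owned by whoever holds the reader */
    size_t len;
    size_t pos;
} reader_t;

static void defrag_init(defrag_t *d, size_t max_size) {
    d->state = DEFRAG_EMPTY;
    d->data = NULL;
    d->len = 0;
    d->cap = max_size;
}

/* Append one fragment. `frag` aliases the receive buffer, so the memcpy
 * below is the only copy made on the defragmentation path. */
static bool defrag_push(defrag_t *d, const uint8_t *frag, size_t n) {
    assert(d != NULL && frag != NULL);
    if (d->state == DEFRAG_OVERFLOWN) {
        return false; /* keep dropping until the message ends */
    }
    if (d->state == DEFRAG_EMPTY) {
        d->data = malloc(d->cap); /* allocate only when needed */
        if (d->data == NULL) {
            return false;
        }
        d->len = 0;
        d->state = DEFRAG_ACTIVE;
    }
    if (n > d->cap - d->len) {
        /* Too large: release the memory and remember the overflow. */
        free(d->data);
        d->data = NULL;
        d->len = 0;
        d->state = DEFRAG_OVERFLOWN;
        return false;
    }
    memcpy(d->data + d->len, frag, n);
    d->len += n;
    return true;
}

/* Called on the final fragment. On success the buffer is handed to `out`
 * without copying, and the caller must free(out->data). */
static bool defrag_finish(defrag_t *d, reader_t *out) {
    bool ok = (d->state == DEFRAG_ACTIVE);
    if (ok) {
        out->data = d->data;
        out->len = d->len;
        out->pos = 0;
    } else {
        free(d->data); /* free(NULL) is a no-op */
    }
    d->data = NULL;
    d->len = 0;
    d->state = DEFRAG_EMPTY;
    return ok;
}
```

In this sketch the caller would call defrag_push for each fragment, then defrag_finish on the last one and decode from the returned reader. Since the reader owns the memory after the handover, the user can keep it for as long as needed.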
Fixed some UBs and memory leaks.