Found this while investigating the long tail of slow transfers. Adding {active,N} batch buffering didn't help much, and a packet capture still showed a full TCP window. The process memory never increased (I assume because the large refc binaries go into progressively larger single block carrier memory pools) and the rate of reductions dropped, so I suspected the cost was in allocating single block carriers.
So the alternative buffering mechanism keeps all received binaries in a list along with a running total of their sizes, and only merges them into one big binary to deserialize once the required size is reached. This lets us use the process's heap memory, which lives in a multiblock carrier and which I figure is also much faster to allocate into. The same file transfer now takes a fraction of the time it used to; the slowest receive, which was at 31ms, is now at 0.645ms.
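As a rough illustration of the list-based buffering described above, here is a minimal sketch. It is not the code in this change: the module name `recv_buffer_sketch`, the `buf` record, and the `new/1` and `add/2` functions are all hypothetical names chosen for the example, and it assumes the required message size is known up front.

```erlang
-module(recv_buffer_sketch).
-export([new/1, add/2]).

%% Hypothetical buffer state (names are illustrative only):
%% chunks are kept newest-first, with a running byte total and
%% the number of bytes we are waiting for.
-record(buf, {chunks = [], size = 0, needed}).

%% Start an empty buffer that waits for Needed bytes.
new(Needed) when is_integer(Needed), Needed > 0 ->
    #buf{needed = Needed}.

%% Add one received chunk. Returns {more, Buf} while fewer than
%% Needed bytes have arrived. Once enough bytes are buffered, the
%% chunks are merged exactly once and split into the complete
%% payload and any leftover bytes: {ok, Payload, Rest}.
add(Chunk, #buf{chunks = Chunks, size = Size, needed = Needed} = Buf)
  when is_binary(Chunk) ->
    NewSize = Size + byte_size(Chunk),
    NewChunks = [Chunk | Chunks],
    case NewSize >= Needed of
        false ->
            %% Cheap path: prepend to the list, no binary is rebuilt.
            {more, Buf#buf{chunks = NewChunks, size = NewSize}};
        true ->
            %% Single concatenation per message.
            All = iolist_to_binary(lists:reverse(NewChunks)),
            <<Payload:Needed/binary, Rest/binary>> = All,
            {ok, Payload, Rest}
    end.
```

The point of the design is that each incoming packet only costs a list cons and an integer addition; the large binary is built once per message instead of being regrown on every receive.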
old stats
new stats
We can see the new run taking even more packets, but that doesn't matter because each one is processed faster.
old pattern:
new pattern: