Hollow is a Java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high-performance read-only access.
I was thinking about rehashing cost in ByteArrayOrdinalMap when running a cycle, and I got curious about whether the hash itself was a bottleneck. So I threw together a quick scratch benchmark (https://github.com/DanielThomas/hollow/tree/murmur3) to see what's what.
The good news is that Yonik Seeley's port used by Hollow HashCodes (hash_hollow), which is also used in Solr, is the fastest I could find. However, it appears that the ByteData abstraction, i.e. its per-byte reads, significantly hurts throughput. The only difference between these benchmarks is that one reads bytes using com.netflix.hollow.core.memory.ArrayByteData:
It'd of course be worse again for SegmentedByteArray, where you add a logical shift and a bitwise AND for every com.netflix.hollow.core.memory.ByteData#get.
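To make the per-read overhead concrete, here is a minimal sketch of the two access patterns. The names `ByteSource`, `FlatBytes`, and `SegmentedBytes` are hypothetical stand-ins I made up for illustration, not Hollow's actual classes; the point is only that the segmented read pays a shift and a mask on top of the interface call for every single byte.

```java
/** Hypothetical per-byte read contract for illustration; not Hollow's ByteData. */
interface ByteSource {
    byte get(long index);
}

/** Flat backing array: each read is one array load behind an interface call. */
final class FlatBytes implements ByteSource {
    private final byte[] data;

    FlatBytes(byte[] data) {
        this.data = data;
    }

    @Override
    public byte get(long index) {
        return data[(int) index];
    }
}

/** Segmented backing arrays: each read adds a shift (segment) and a mask (offset). */
final class SegmentedBytes implements ByteSource {
    private final int log2SegmentSize;
    private final long mask;
    private final byte[][] segments;

    SegmentedBytes(byte[] source, int log2SegmentSize) {
        this.log2SegmentSize = log2SegmentSize;
        int segmentSize = 1 << log2SegmentSize;
        this.mask = segmentSize - 1;
        // Round up so a partially filled last segment is still allocated.
        int count = (source.length + segmentSize - 1) >>> log2SegmentSize;
        this.segments = new byte[count][segmentSize];
        for (int i = 0; i < source.length; i++) {
            segments[i >>> log2SegmentSize][i & (segmentSize - 1)] = source[i];
        }
    }

    @Override
    public byte get(long index) {
        // Logical shift selects the segment, bitwise AND selects the offset within it.
        return segments[(int) (index >>> log2SegmentSize)][(int) (index & mask)];
    }
}
```

A hash loop that calls `get` once per input byte repeats that arithmetic for every byte, whereas a loop over a raw `byte[]` does not.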
That got me thinking that there are field read cases like com.netflix.hollow.core.read.engine.object.HollowObjectTypeReadStateShard#readString(com.netflix.hollow.core.memory.ByteData, long, int) that'd likely be impacted too.
So it feels like the ByteData abstraction needs to be leakier: it should start with a contract that assumes segmented data and provides ways to expose the underlying arrays directly, or at least a byte[] contract that handles copying across the array segments a read may span.
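As a rough sketch of what the byte[] variant of that contract could look like: the class name `SegmentedCopySketch` and the method `copyTo` below are hypothetical, chosen only for illustration, and the segment layout (equal-sized, power-of-two segments) is an assumption. The idea is to do the segment arithmetic once per chunk rather than once per byte, so the caller can then hash or decode from a plain array.

```java
/** Hypothetical bulk-read sketch over equal-sized, power-of-two segments. */
final class SegmentedCopySketch {
    private final int log2SegmentSize;
    private final int segmentSize;
    private final byte[][] segments;

    SegmentedCopySketch(byte[][] segments, int log2SegmentSize) {
        this.segments = segments;
        this.log2SegmentSize = log2SegmentSize;
        this.segmentSize = 1 << log2SegmentSize;
    }

    /** Copies length bytes starting at srcPos into dest, crossing segment boundaries as needed. */
    void copyTo(long srcPos, byte[] dest, int destPos, int length) {
        int mask = segmentSize - 1;
        while (length > 0) {
            int segment = (int) (srcPos >>> log2SegmentSize);
            int offset = (int) (srcPos & mask);
            // Copy up to the end of the current segment, or less if the request ends sooner.
            int chunk = Math.min(length, segmentSize - offset);
            System.arraycopy(segments[segment], offset, dest, destPos, chunk);
            srcPos += chunk;
            destPos += chunk;
            length -= chunk;
        }
    }
}
```

The trade-off is an extra copy per read; exposing the underlying segment directly would avoid that, at the cost of callers having to handle values that straddle two segments.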
Full Benchmark Results