As engineers, we already know that simple tasks become harder at scale. One category of problems that becomes slow and resource-intensive when we deal with "big data" is counting things. In this talk we will discuss counting-related problems such as distinct counting (how many different elements are in a set/stream) and top-k (which elements appear most often in a set/stream).
I will explain how we can solve these problems with probabilistic data structures and algorithms such as Bloom/Cuckoo filters (set membership), HyperLogLog and KMV (cardinality estimation), and Count-Min Sketch (top-k most frequent values).
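To make the set-membership idea concrete, here is a minimal Bloom-filter sketch in Go. The sizes (1024 bits, 3 hash functions) and the FNV-based double-hashing scheme are illustrative assumptions for this sketch, not the implementation presented in the talk.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny Bloom filter: a fixed bit set plus k hash functions.
type bloom struct {
	bits []uint64
	k    int
}

func newBloom(mBits, k int) *bloom {
	return &bloom{bits: make([]uint64, (mBits+63)/64), k: k}
}

// indexes derives k bit positions via double hashing: h1 + i*h2 (mod m).
func (b *bloom) indexes(item string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(item))
	sum := h.Sum64()
	h1, h2 := sum>>32, sum&0xffffffff
	m := uint64(len(b.bits) * 64)
	out := make([]uint64, b.k)
	for i := 0; i < b.k; i++ {
		out[i] = (h1 + uint64(i)*h2) % m
	}
	return out
}

func (b *bloom) Add(item string) {
	for _, idx := range b.indexes(item) {
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

// MightContain can return false positives, but never false negatives.
func (b *bloom) MightContain(item string) bool {
	for _, idx := range b.indexes(item) {
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 3)
	f.Add("alice")
	f.Add("bob")
	fmt.Println(f.MightContain("alice")) // true
	fmt.Println(f.MightContain("carol")) // almost certainly false (false positives possible)
}
```

The payoff is constant memory: the filter never stores the items themselves, only bits, which is why it answers "definitely not present" or "probably present" rather than an exact yes/no.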
I will also give some intuition about how these algorithms work under the hood and where we can take advantage of them (databases, stream processing engines, microchips, analytic engines...).
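As a taste of that intuition, a Count-Min Sketch can be sketched in Go as a small grid of counters: each item hashes to one counter per row, and the frequency estimate is the minimum over its counters, so it may over-count but never under-counts. The depth/width values and per-row hash salting below are illustrative assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cms is a tiny Count-Min Sketch: depth rows of width counters each.
// depth=4, width=256 are illustrative, not tuned for any error bound.
type cms struct {
	rows  [][]uint64
	width uint64
}

func newCMS(depth, width int) *cms {
	rows := make([][]uint64, depth)
	for i := range rows {
		rows[i] = make([]uint64, width)
	}
	return &cms{rows: rows, width: uint64(width)}
}

// hash maps an item to a column in the given row, salting per row
// so the rows behave like independent hash functions.
func (c *cms) hash(item string, row int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, item)
	return h.Sum64() % c.width
}

func (c *cms) Add(item string) {
	for i := range c.rows {
		c.rows[i][c.hash(item, i)]++
	}
}

// Estimate returns the minimum counter for the item across all rows;
// collisions can only inflate counters, so this is an upper bound
// that is at least the true count.
func (c *cms) Estimate(item string) uint64 {
	var est uint64
	for i := range c.rows {
		v := c.rows[i][c.hash(item, i)]
		if i == 0 || v < est {
			est = v
		}
	}
	return est
}

func main() {
	c := newCMS(4, 256)
	for i := 0; i < 5; i++ {
		c.Add("get")
	}
	c.Add("put")
	fmt.Println(c.Estimate("get")) // >= 5: never below the true count
}
```

The design choice is the same trade as the Bloom filter's: fixed memory regardless of how many distinct items arrive, in exchange for a bounded, one-sided error.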
type: standard
level: medium
language: english
twitter: @positiveblue2
email: jordi.montes.sanabria@gmail.com
[x] I have read and agree to the Talk standards
Learning to count at scale in Golang