citusdata / postgresql-hll

PostgreSQL extension adding HyperLogLog data structures as a native data type
http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/
Apache License 2.0
1.14k stars 116 forks source link

Q: Evaluation cost of hll_empty() #157

Open samkillin opened 10 months ago

samkillin commented 10 months ago

I am looking to "upsert" hll's in my database, using a traditional two stage query:

# Add an empty HLL if one doesn't exist
INSERT INTO counts(event_type, reference, count) 
VALUES(:event_type, :reference, hll_empty()) 
ON CONFLICT DO NOTHING; 

# Add to the HLL
UPDATE counts 
SET count = hll_add(count, hll_hash_text(:event_id)) 
WHERE event_type = :event_type AND reference = :reference;

Given hll_empty() will be invoked on every query, even in the update case (i.e. before a conflict is identified), how cheap/expensive is that function to execute? The README mentions use of a sentinel value – is this sentinel value recalculated every call?

Should I be concerned with the evaluation of that function being a potential bottleneck compared to hll_add and hll_hash_text?

emelsimsek commented 10 months ago

Looking at the code, hll_empty() in fact creates an empty set (internally represented as a byte array) at every call. So it does not use a precalculated sentinel.