citusdata / postgresql-hll

PostgreSQL extension adding HyperLogLog data structures as a native data type
http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/
Apache License 2.0
1.12k stars 116 forks source link

"could not identify an equality operator for type hll" in GROUP BY #80

Open onlined opened 5 years ago

onlined commented 5 years ago

When I use a column with hll type as a field in GROUP BY clause, I get the error:

ERROR:  could not identify an equality operator for type hll

Here's a small repro:

CREATE TABLE tmp (int num, set hll);
INSERT INTO tmp VALUES (1, hll_empty());
INSERT INTO tmp VALUES (2, hll_empty());
INSERT INTO tmp VALUES (3, hll_empty());
INSERT INTO tmp VALUES (4, hll_empty());
SELECT SUM(num), set FROM tmp GROUP BY set;

While I expect it to result in a row consisting of 10 and empty hll value, I get the error above.

postgresql-hll Version: master (77aa0fe) PostgreSQL Version: 11.5 OS Version: Debian 10

metdos commented 5 years ago

@onlined is there an equality operator for type hll, this error message would be expected?

onlined commented 5 years ago

Yes, there is. When I run \do, I get the following output:

                                List of operators
 Schema | Name | Left arg type | Right arg type |   Result type    | Description
--------+------+---------------+----------------+------------------+-------------
 public | #    |               | hll            | double precision |
 public | <>   | hll           | hll            | boolean          |
 public | <>   | hll_hashval   | hll_hashval    | boolean          |
 public | =    | hll           | hll            | boolean          |
 public | =    | hll_hashval   | hll_hashval    | boolean          |
 public | ||   | hll           | hll            | hll              |
 public | ||   | hll           | hll_hashval    | hll              |
 public | ||   | hll_hashval   | hll            | hll              |
(8 rows)
davidalejandroaguilar commented 3 years ago

We're having the same problem when trying to UNION two tables with hll columns. e.g.

CREATE TABLE test1 ( id integer, hashed hll);
CREATE TABLE test2 ( id integer, hashed hll);
SELECT hashed FROM test1 UNION SELECT hashed FROM test2; 

postgresql-hll Version: v2.15.1 PostgreSQL Version: 12 OS Version: macOS Catalina 10.15.7

Sieboldianus commented 3 years ago
SELECT hll_union_agg(hashed) FROM test1 UNION SELECT hll_union_agg(hashed) FROM test2; 

hashed in both queries is a list of records (even though only one may exist). Therefore, it is necessary to apply an aggregate operation first (hll_union_agg()).

This works for me.