apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0
2.57k stars 616 forks

Duplicated inserts when clustering key is type of frozen map #1549

Open vysu0216 opened 3 years ago

vysu0216 commented 3 years ago


What version of Cassandra are you using?

[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]

What version of Gocql are you using?

v0.0.0-20200131111108-92af2e088537

What version of Go are you using?

1.15

What did you do?

Table schema:

    timeseries (
        name text,
        labels frozen<map<text, text>>,
        PRIMARY KEY (name, labels)
    ) WITH CLUSTERING ORDER BY (labels ASC);

labels is a clustering key. When labels has the type frozen<map<text, text>>, inserting the same row more than once sometimes produces duplicates. Not every row is duplicated, and it does not happen every time. For example:

    insert into dup_test.nc_pm_timeseries(name,labels) VALUES('node_memory_HugePages_Surp', {'instance': '10.109.26.7:9100', 'job': 'node-exporter'});

and

    insert into dup_test.nc_pm_timeseries(name,labels) VALUES('node_disk_read_bytes_total', {'device': 'vda', 'instance': '10.109.26.7:9100', 'job': 'node-exporter'});

The field type in Go is map[string]string. When labels has the type text, no duplicates are created.

What did you expect to see?

Inserts should behave as upserts for this primary key, so inserting the same row multiple times should not create duplicates.

What did you see instead?

Inserting the same row multiple times produces duplicate rows.

martin-sucha commented 3 years ago

This is probably because a map does not have a specified iteration order (neither in Go nor in the CQL protocol), so the same map can end up with different binary representations.
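To illustrate the point about binary representations, here is a minimal sketch that writes the same map in two key orders. The helper encodeMap is hypothetical (it is not part of gocql); it only mimics the general layout of a CQL map value in native protocol v3 and later, which is a 4-byte entry count followed by length-prefixed keys and values. The key order is passed in explicitly so the result is reproducible; in real code the order would come from Go's randomized map iteration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeMap is a hypothetical helper that mimics the layout of a CQL
// map<text, text> value: a 4-byte entry count, then for each entry a
// 4-byte length plus bytes for the key, and the same for the value.
// keys determines the order in which the entries are written.
func encodeMap(m map[string]string, keys []string) []byte {
	var buf bytes.Buffer
	writeInt := func(n int) {
		var b [4]byte
		binary.BigEndian.PutUint32(b[:], uint32(n))
		buf.Write(b[:])
	}
	writeInt(len(keys))
	for _, k := range keys {
		writeInt(len(k))
		buf.WriteString(k)
		v := m[k]
		writeInt(len(v))
		buf.WriteString(v)
	}
	return buf.Bytes()
}

func main() {
	labels := map[string]string{
		"instance": "10.109.26.7:9100",
		"job":      "node-exporter",
	}
	// Same logical map, two possible iteration orders.
	a := encodeMap(labels, []string{"instance", "job"})
	b := encodeMap(labels, []string{"job", "instance"})
	// Same length, different bytes.
	fmt.Println(len(a) == len(b), bytes.Equal(a, b)) // prints: true false
}
```

If the server compares frozen map values by their serialized bytes, as the explanation above suggests, these two encodings would be treated as two different clustering keys.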

We don't sort map keys, as that would incur a performance penalty. Other drivers (e.g. the DataStax Java and Python drivers) don't sort map keys either.

I would expect the same to happen when you use e.g. cqlsh. I tried to find out from the Cassandra docs whether using a frozen map in a partition key is supported, but couldn't find anything.

If you want to use a map in a partition key, the safest option seems to be to serialize it manually: you can implement gocql.Marshaler so that you always use a canonical representation (i.e. sorted keys). I would even use a blob or text type instead of a frozen map in the database, so that it is always clear that the labels need to be serialized manually in sorted order.
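A minimal sketch of such a canonical representation, assuming the labels column is changed to text. The function name canonicalLabels and the output format (quoted key=value pairs joined by commas) are illustrative choices, not something gocql prescribes. The code uses only the standard library; a wrapper type could return []byte(canonicalLabels(m)) from its MarshalCQL method to plug this into gocql.Marshaler.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// canonicalLabels is a hypothetical helper that serializes labels with
// keys in sorted order, so equal maps always produce the same string.
// Keys and values are quoted so that ',' or '=' inside them stay unambiguous.
func canonicalLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	for i, k := range keys {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(strconv.Quote(k))
		sb.WriteByte('=')
		sb.WriteString(strconv.Quote(labels[k]))
	}
	return sb.String()
}

func main() {
	labels := map[string]string{
		"job":      "node-exporter",
		"instance": "10.109.26.7:9100",
	}
	fmt.Println(canonicalLabels(labels))
	// prints: "instance"="10.109.26.7:9100","job"="node-exporter"
}
```

Sorting costs O(n log n) per write, which is the performance penalty mentioned above, but for a handful of labels per row it is likely to be small compared to the network round trip.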