yunfan123 closed this issue 4 years ago
This is not correct. Distinct counting sketches will count a value only once. You added it and then removed it, so the value will not be present after the AnotB operation.
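To make that concrete, here is a minimal sketch that uses a plain Python `set` as an exact stand-in for a distinct counting sketch. It is not the DataSketches API, and the names `sketch_a`, `sketch_b`, and `a_not_b` are made up for illustration. It only mirrors the set semantics described above.

```python
# Exact stand-in for a distinct counting sketch: a plain Python set.
# This is NOT the DataSketches API; it only mirrors the set semantics.

def a_not_b(a, b):
    """Return the items of a that are not in b (set difference)."""
    return a - b

sketch_a = set()
for value in [1, 1, 2, 3]:  # the value 1 is offered twice
    sketch_a.add(value)     # the second add of 1 changes nothing

sketch_b = {1}              # the value 1 is "removed" once

result = a_not_b(sketch_a, sketch_b)

print(len(sketch_a))   # number of distinct values in A
print(sorted(result))  # values left after A-not-B
```

A set only records whether a value was seen, so the second add of 1 leaves no trace and a single removal takes it out completely.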
Without additional context, these seem to be theta sketches. Those count each item once, which is why they're called distinct counting sketches or cardinality estimation sketches: they tell you how many different elements exist, but nothing about how often any specific element occurs. There is something of an overview here: http://datasketches.apache.org/docs/Architecture/MajorSketchFamilies.html
There is a Tuple sketch that can hold additional data associated with each element, but there's no default behavior for how those combine. You'd need to define your own summaries to try to create the behavior you want. Keep in mind that these are probabilistic structures and that you can't expect exact results -- not even for individual elements -- as the data volume grows.
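As a rough illustration of what a count-carrying summary could look like, here is an exact model that uses Python's `collections.Counter` as a stand-in for a per-key summary. This is not the Tuple sketch API. The names and the subtraction rule (subtract counts, drop keys that reach zero) are assumptions chosen for the example. A real Tuple sketch also samples keys, so it would only estimate these numbers.

```python
from collections import Counter

# Hypothetical per-key summary: how many times each value was added.
# Counter is an exact stand-in here, not a probabilistic sketch.
added = Counter()
for value in [1, 1, 2]:   # the value 1 is added twice
    added[value] += 1

removed = Counter([1])    # the value 1 is removed once

# Assumed combine rule: subtract counts and keep only positive ones.
# Counter's "-" operator does exactly that.
remaining = added - removed

print(remaining[1])       # count still held for the value 1
print(len(remaining))     # number of distinct values still present
```

With this rule a value added twice and removed once stays in the result, which is different from what a distinct counting sketch does.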
None of the update sketches handle duplicate data. So in a situation like this:
In this case, because the number 1 is added twice and removed only once, it should still be counted in the final result. But it actually is not. This causes a very large error rate on small data sets.