Typically, the IncidenceBitmaps compression scheme would be used when there are very few distinct input values: beyond 32 distinct values there is no longer any storage-space benefit (although if one pre-filters and sends just a subset of the values, the scheme could still be beneficial in other contexts). But even approaching that limit the scheme is not very efficient, and one can expect it to be used mostly in the 2-, 3-, 4- and 5-value cases.
The case of 2 values is special, since a single bitmap is enough, so let's put it aside. But in the other cases of a small number of bitmaps, the dictionary is also very small. So small, in fact, that instead of looking it up in a shared-memory copy, one might be better served by a switch statement over these values, or an if-else chain; that might incur predication, but could still be faster than going to shared memory.
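For concreteness, here is a minimal CUDA sketch of that idea. It is not the library's actual kernel: the kernel and parameter names are hypothetical, the dictionary is assumed to hold 4 uint32_t entries passed as kernel parameters, and the per-element 2-bit indices are assumed to have already been extracted from the incidence bitmaps.

```cuda
#include <cstdint>

// Hypothetical sketch: decode 2-bit dictionary indices with a switch
// statement instead of a shared-memory lookup table.
__global__ void decode_by_switch(
    uint32_t*       __restrict__  decompressed,
    const uint8_t*  __restrict__  indices,  // one 2-bit index per element, pre-extracted
    uint32_t                      dict0,    // the 4 dictionary entries are passed as
    uint32_t                      dict1,    // kernel parameters, so they are read from
    uint32_t                      dict2,    // the parameter/constant space rather than
    uint32_t                      dict3,    // from shared memory
    size_t                        length)
{
    size_t pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos >= length) { return; }

    uint32_t value;
    switch (indices[pos] & 0x3) {
    case 0:  value = dict0; break;
    case 1:  value = dict1; break;
    case 2:  value = dict2; break;
    default: value = dict3; break;
    }
    decompressed[pos] = value;
}
```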
In these cases, and with the uncompressed data being not too large, it might be possible to fit the entire dictionary into each single thread's registers.
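A minimal sketch of that variant follows, under the same assumptions as above (hypothetical names, pre-extracted per-element indices, uint32_t dictionary entries). The selection loop uses unrolled comparisons rather than dynamic indexing, so the compiler can keep the per-thread copy in registers instead of spilling it to local memory.

```cuda
#include <cstdint>

// Hypothetical sketch: each thread copies the (tiny) dictionary into a
// fixed-size local array; fully-unrolled, compile-time-indexed access
// lets the compiler keep that array in registers.
template <unsigned NumDictionaryEntries>
__global__ void decode_with_register_dictionary(
    uint32_t*       __restrict__  decompressed,
    const uint8_t*  __restrict__  indices,     // one small index per element, pre-extracted
    const uint32_t* __restrict__  dictionary,  // NumDictionaryEntries values in global memory
    size_t                        length)
{
    size_t pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos >= length) { return; }

    // Per-thread copy; with NumDictionaryEntries around 2..5 this should
    // end up in registers rather than local memory.
    uint32_t dict[NumDictionaryEntries];
    #pragma unroll
    for (unsigned i = 0; i < NumDictionaryEntries; i++) {
        dict[i] = dictionary[i];
    }

    uint8_t  index = indices[pos];
    uint32_t value = dict[0];
    // Select by comparison rather than by dynamic indexing, so the array
    // is never indexed with a runtime value.
    #pragma unroll
    for (unsigned i = 1; i < NumDictionaryEntries; i++) {
        if (index == i) { value = dict[i]; }
    }
    decompressed[pos] = value;
}
```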