Profiling showed that a major hotspot was checking whether a tuple is already in our DB and adding a tuple to the DB, because both operations go through Java's `ConcurrentSkipListSet` (which we use to represent indices). This PR adds a hash set that acts as a filter in front of the skip lists, so duplicate tuples are detected before any skip list is accessed. This leads to substantial speedups, especially in cases where many duplicate tuples are derived and tested against the DB.
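A minimal sketch of the filtering idea, assuming a store of fixed-arity integer tuples with one `ConcurrentSkipListSet` per index ordering. `FilteredTupleStore`, `byColumns`, and the `List<Integer>` tuple representation are hypothetical illustrations, not the classes changed in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical tuple store: ordered skip-list indices fronted by a hash-set filter. */
final class FilteredTupleStore {
    // Hash-based filter: expected O(1) membership checks, no comparator walks.
    private final Set<List<Integer>> all = ConcurrentHashMap.newKeySet();
    // One ordered index per column ordering (e.g. for range scans during joins).
    private final List<ConcurrentSkipListSet<List<Integer>>> indices = new ArrayList<>();

    FilteredTupleStore(List<Comparator<List<Integer>>> orderings) {
        for (Comparator<List<Integer>> order : orderings) {
            indices.add(new ConcurrentSkipListSet<>(order));
        }
    }

    /** Lexicographic order over the given column permutation (should cover all columns). */
    static Comparator<List<Integer>> byColumns(int... order) {
        return (x, y) -> {
            for (int c : order) {
                int cmp = Integer.compare(x.get(c), y.get(c));
                if (cmp != 0) return cmp;
            }
            return 0;
        };
    }

    /** Membership is answered by the hash set alone; no skip list is touched. */
    boolean contains(List<Integer> tuple) {
        return all.contains(tuple);
    }

    /** Returns true if the tuple was new. Duplicates are rejected before any skip list. */
    boolean add(List<Integer> tuple) {
        List<Integer> copy = List.copyOf(tuple); // immutable key for the hash set
        if (!all.add(copy)) return false;        // fast path for duplicates
        for (ConcurrentSkipListSet<List<Integer>> index : indices) {
            index.add(copy);                     // only new tuples pay the skip-list cost
        }
        return true;
    }

    NavigableSet<List<Integer>> index(int i) {
        return indices.get(i);
    }
}
```

The trade-off in this sketch is extra memory (each tuple is also referenced from the hash set) in exchange for skipping O(log n) comparator-based searches on every duplicate. Note that `add` here is not atomic across the filter and the indices, so a concurrent reader could briefly see a tuple in `contains` before it appears in an index scan.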