When building the bloom filter, we used to pass the rows returned by sqlalchemy to it directly.
It turns out that these rows are of type sqlalchemy.engine.result.RowProxy, which truncates long values when converted to a string with str (which is exactly what pybloom does before hashing).
As a result, the keys stored in the bloom filter were wrong, so rows that already existed were inserted again.
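
The failure mode can be illustrated with a minimal, self-contained sketch. It does not use sqlalchemy or pybloom: TruncatingRow is a hypothetical stand-in that shortens long values when stringified, and bloom_key is a hypothetical helper name. It only shows why converting each row to a plain tuple before it is stringified keeps long values intact, and it is not necessarily the exact change made here.

```python
class TruncatingRow:
    """Hypothetical stand-in for a row object whose repr shortens long values."""

    def __init__(self, values, max_chars=20):
        self._values = tuple(values)
        self._max_chars = max_chars

    def __iter__(self):
        return iter(self._values)

    def __repr__(self):
        shown = []
        for value in self._values:
            text = repr(value)
            if len(text) > self._max_chars:
                # Long values are cut off, so their tails are lost.
                text = text[: self._max_chars] + "..."
            shown.append(text)
        return "(" + ", ".join(shown) + ")"


def bloom_key(row):
    """Return a plain tuple so that str() sees the full column values."""
    return tuple(row)


# Two rows that differ only at the end of a long value.
long_a = "x" * 50 + "A"
long_b = "x" * 50 + "B"
row_a = TruncatingRow([1, long_a])
row_b = TruncatingRow([1, long_b])

# Stringifying the row objects directly loses the distinguishing suffix.
collide = str(row_a) == str(row_b)

# Stringifying the plain tuples keeps the two rows distinct.
distinct = str(bloom_key(row_a)) != str(bloom_key(row_b))
```

With a key like this, whatever is added to the filter and whatever is later looked up are both derived from the full values, so the two sides agree.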