axiak / pybloomfiltermmap

Fast Python Bloom Filter using Mmap
http://axiak.github.com/pybloomfiltermmap/
MIT License

Sending serialized bloom filter object over the network and recreating it affects its contents #84

Closed ealione closed 5 years ago

ealione commented 6 years ago

I need to serialize my Bloom filter and send it over the network. This is how I do it:

self.bf = pybloomfilter.BloomFilter(_BF_CAPACITY, _BF_ERROR, _BF_FILENAME)
self.bf.add('test')
sbf = msgpack.packb(self.bf.to_base64())
print('test' in self.bf) # True

I then send the sbf bytes over a TCP connection, and on the other end I try to deserialize them like this:

msg = msgpack.unpackb(msg)
os.unlink(_BF_FILENAME_TMP)
new_bf = self.bf.copy_template(_BF_FILENAME_TMP) # this bf object is similar to the one sent over the network
new_bf.from_base64(_BF_FILENAME_TMP, msg, perm=0o775)
print('test' in new_bf) # False

I would expect the received Bloom filter to contain the same items as the one that was sent.
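
One thing worth checking, assuming the library documentation is accurate in describing `from_base64` as a classmethod that writes the decoded data to the given file and returns a new `BloomFilter`: the receiving code calls it on an instance and ignores the return value. The sketch below reproduces that pattern using only the standard library. `ToyFilter` is a hypothetical stand-in backed by a plain set, not the real library, so it only illustrates the calling convention and does not prove that this is the cause here.

```python
import base64
import json
import zlib


class ToyFilter:
    """Hypothetical stand-in for a Bloom filter; stores items in a plain set."""

    def __init__(self, items=()):
        self.items = set(items)

    def __contains__(self, item):
        return item in self.items

    def add(self, item):
        self.items.add(item)

    def to_base64(self):
        # Serialize the items, compress them, then base64-encode the result.
        raw = json.dumps(sorted(self.items)).encode("utf-8")
        return base64.b64encode(zlib.compress(raw))

    @classmethod
    def from_base64(cls, data):
        # Builds and RETURNS a new object; no existing instance is modified.
        raw = zlib.decompress(base64.b64decode(data))
        return cls(json.loads(raw.decode("utf-8")))


sender = ToyFilter()
sender.add("test")
wire = sender.to_base64()

# Pattern from the report: call the classmethod on an instance, ignore the result.
receiver = ToyFilter()
receiver.from_base64(wire)
print("test" in receiver)  # False: receiver itself was never modified

# Keeping the returned object gives the expected contents.
restored = ToyFilter.from_base64(wire)
print("test" in restored)  # True
```

If the real library behaves the same way, assigning the result of `from_base64` to `new_bf` would be the natural thing to try, and the preceding `copy_template` call would probably be unnecessary.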

g1itch commented 5 years ago

It seems you don't need base64 and zlib if you already use msgpack.

import msgpack
import pybloomfilter

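def compress(bf):
    # Serialize the filter by packing the raw bytes of its backing file.
    bf._assert_open()
    with open(bf.name, 'rb') as ff:
        bobj = ff.read()
    return msgpack.dumps(bobj)

def decompress(bp, filepath):
    # Write the unpacked bytes to filepath and open that file as a filter.
    with open(filepath, 'wb') as ff:
        ff.write(msgpack.loads(bp))
    return pybloomfilter.BloomFilter.open(filepath)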

test1 = pybloomfilter.BloomFilter(1000, 0.01, 'test1.bloom')
print('test1: %s' % test1)
test1.update(('foo', 'bar', 'baz'))
print('"foo" in test1: %s' % ('foo' in test1))

test1_wire = compress(test1)
print('test1_wire: %s' % test1_wire)

test2 = decompress(test1_wire, 'test2.bloom')
print('test2: %s' % test2)
print('"foo" in test2: %s' % ('foo' in test2))
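
TCP is a byte stream, so a single `recv()` on the receiving end is not guaranteed to return the whole serialized filter, and a truncated payload would also yield a filter with different contents. The thread does not show the networking code, so the following is only a minimal sketch of length-prefixed framing under the assumption that the payload travels over a plain stream socket. `send_frame`, `recv_exact` and `recv_frame` are hypothetical helper names, and a local socket pair stands in for the TCP connection.

```python
import socket
import struct


def send_frame(sock, payload):
    # Prefix the payload with its length as a 4-byte big-endian unsigned integer.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    # recv() may return fewer bytes than requested, so loop until n bytes arrive.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("socket closed before the full frame arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    # Read the 4-byte length header first, then exactly that many payload bytes.
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# Demo: a connected socket pair stands in for the TCP connection.
a, b = socket.socketpair()
payload = bytes(range(256)) * 4  # placeholder for the serialized filter bytes
send_frame(a, payload)
received = recv_frame(b)
a.close()
b.close()
print(received == payload)
```

With framing like this, the bytes produced by `compress` would be passed to `send_frame`, and the result of `recv_frame` handed to `decompress` on the other side.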