Open anhldbk opened 6 years ago
I guess it's a feature request more than anything else... but I concur; it'd be super-convenient if it was part of the API
I used some dummy structs and manual memcpy to do so in a project a while back. It's gross and bad form but works. Just make sure the layout's the same. (Until it's actually supported, of course.)
Would you mind sharing the code ?
I can pull that out of a private repo. I’ll look it over tomorrow and see if a pull request is easier.
If you’re doing sequence analysis, which I’m guessing based on your name, I could release the kmer database code around it if it meets your needs.
Thanks!
No, this time it's not for sequence analysis :)
👍 to this! Serializing/deserializing seems more efficient than rebuilding the filter. @dnbaker hope it is possible to share the code.
@dnbaker, @efficient
this is viable, yes? if so, something to point me to?
assuming a surgically placed memcpy, but not feeling confident whereabout
@dnbaker resurrecting an old thread :)
I'm also interested in serialization. Any chance you can dig up that old code, or do you have recommendations for another cuckoo filter library?
Here's the code I had. It's only valid for POD (std::is_trivial and std::is_standard_layout). Sorry I took a while to respond - I had to go find and extract it.
First, there's a couple of dummy structs which are defined, and namespacing:
namespace {
typedef struct {
size_t index;
uint32_t tag;
bool used;
} SerialVictim;
struct dummytable {
void *p;
size_t n;
};
struct dummyfilter {
void *p;
size_t n;
SerialVictim vc;
};
}
using namespace cuckoofilter;
Writing to disk:
size_t write_cuckoo_filter(const CuckooFilter<uint64_t, bits_per_item, SingleTable> *filter,
const char *path) {
dummyfilter dfilter;
dummytable tbl;
static const size_t tags_per_bucket(4);
const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
static_assert(sizeof(dfilter) == sizeof(*filter), "Dummy table is the wrong size!\n");
static_assert(sizeof(tbl) == sizeof(SingleTable<bits_per_item>), "Dummy table is the wrong size!\n");
// Copy info to dummy structs.
memcpy(&dfilter, filter, sizeof(*filter));
memcpy(&tbl, dfilter.p, sizeof(tbl));
FILE *fp(fopen(path, "wb"));
const int used(dfilter.vc.used);
size_t ret(fwrite(&dfilter.n, 1, sizeof(dfilter.n), fp));
ret += fwrite(&dfilter.vc.index, 1, sizeof(dfilter.vc.index), fp);
ret += fwrite(&dfilter.vc.tag, 1, sizeof(dfilter.vc.tag), fp);
ret += fwrite(&used, 1, sizeof(used), fp);
ret += fwrite(&tbl.n, 1, sizeof(tbl.n), fp);
ret += fwrite(tbl.p, 1, bktsz * tbl.n, fp);
fclose(fp);
return ret;
}
Reading:
template <size_t bits_per_item>
CuckooFilter<uint64_t, bits_per_item, SingleTable> *
load_cuckoo_filter(const char *path) {
FILE *fp(fopen(path, "rb"));
const size_t tags_per_bucket(4uL);
const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
fseek(fp, 0, SEEK_END);
const size_t fsize = ftell(fp);
fseek(fp, 0, SEEK_SET);
size_t num_bktz, num_items, index;
uint32_t tag;
int used;
fscanf(fp, "%llu", &num_items);
fscanf(fp, "%llu", &index);
fscanf(fp, "%u", &tag);
fscanf(fp, "%i", &used);
fscanf(fp, "%llu", &num_bktz);
size_t num_bktz_found((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) / bktsz);
if((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) % bktsz) {
fprintf(stderr, "Found %" PRIu64 " buckets. Expected %" PRIu64 ". fsize: %" PRIu64 ". bktsz: %" PRIu64 ". They be stealin' my bkt :'(....\n",
num_bktz_found, num_bktz, fsize, bktsz);
exit(EXIT_FAILURE);
}
void *buf = (uint8_t *)malloc(fsize - 1);
size_t nread;
if((nread = fread(buf, 1, num_bktz * bktsz, fp)) != num_bktz * bktsz) {
fprintf(stderr, "Couldn't read buffer from file '%s' Bytes read: %" PRIu64 ". Expected: %" PRIu64 ".\n", path, nread, num_bktz * bktsz);
exit(EXIT_FAILURE);
}
dummytable *table((dummytable *)malloc(sizeof(SingleTable<bits_per_item>)));
*table = {buf, num_bktz};
static_assert(sizeof(*table) == sizeof(uint64_t[2]), "Wrong size!\n");
dummyfilter *ret((dummyfilter *)malloc(sizeof(dummyfilter)));
*ret = {table, num_items, {index, tag, !!used}};
fclose(fp);
return (CuckooFilter<uint64_t, bits_per_item, SingleTable> *)(ret);
}
At destruction, you'll need to manually call your destructor (since it's a pointer), and std::free(ret->table)
.
It's pretty clunky, but it at least worked for my purposes. Let me know if you have any problems or need some help!
Thanks for this nice implementation of Cuckoo filter. Would you please let me know how to serialize/deserialize the filter's state?