[x] Assess xor filters. Notes: The paper is quite confusing. It takes much longer to create a filter.
[ ] Write tutorial on bloomfilter
[ ] Proper indexing
Both issues #211 & #200 seem related to this enhancement. I have a similar problem, in that it would be nice to be able to effectively have index columns that can spread across multiple chunks. In my case I have some large datasets where 'sub-shards' would be useful, as my groups are too big to practically fit in a single chunk. I also have coordinates for which it would be nice to be able to quickly check which chunks the values I'm trying to look up are in, as well as exploiting fst's random access feature to read out just the sections of interest based on their indices.
Originally posted by @RichardJActon in https://github.com/xiaodaigh/disk.frame/issues/102#issuecomment-567026839
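
To make the "quick check of which chunks the values are in" idea concrete, here is a minimal sketch in Python rather than R. It is not disk.frame's or fst's API: `ChunkRange`, `build_chunk_index`, and `candidate_chunks` are hypothetical names, and the chunks are plain lists standing in for the key column of each on-disk chunk. It assumes a per-chunk min/max index, which is only one possible design; a bloom filter per chunk would be another.

```python
from dataclasses import dataclass


@dataclass
class ChunkRange:
    """Min and max of the key column in one chunk (hypothetical structure)."""
    chunk_id: int
    lo: int
    hi: int


def build_chunk_index(chunks):
    """Record the min and max key of every non-empty chunk."""
    return [
        ChunkRange(i, min(keys), max(keys))
        for i, keys in enumerate(chunks)
        if keys
    ]


def candidate_chunks(index, value):
    """Return ids of chunks whose [lo, hi] range could contain `value`.

    A chunk can appear here without actually holding the value (a false
    positive), but a chunk that holds the value is never missed.
    """
    return [r.chunk_id for r in index if r.lo <= value <= r.hi]


# Toy data: one list of key values per chunk; ranges may overlap.
chunks = [[1, 5, 9], [10, 12, 40], [35, 50, 70]]
index = build_chunk_index(chunks)

print(candidate_chunks(index, 38))   # [1, 2]
print(candidate_chunks(index, 7))    # [0]
print(candidate_chunks(index, 100))  # []
```

Only the candidate chunks would then need to be opened, and within each one a random-access read could pull out just the rows of interest.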