Open variux opened 4 years ago
Integrating with Redis or another external storage layer is definitely possible. However, I would consider the I/O cost of external storage: the original sets and the posting lists (the data structure used in this library) can be much bigger than MinHash signatures and LSH tables, so a Python compute layer on top of a Redis/Cassandra storage layer may be inefficient due to the large number of I/Os. A more efficient implementation needs to take these costs into account, which adds a lot of complexity. I do have an algorithm that solves this problem (JOSIE, VLDB 2019, GitHub), but I haven't had time to write a production-ready library for it.
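To make the I/O concern concrete, here is a minimal sketch of a posting-list lookup against an external store. It is not this library's implementation: `CountingStore`, `build_index`, and `overlap_search` are hypothetical names, and an in-memory dict stands in for Redis/Cassandra so that round trips and transferred entries can be counted.

```python
from collections import defaultdict


class CountingStore:
    """Hypothetical stand-in for an external key-value store (e.g. Redis).

    It counts round trips (reads) and posting-list entries returned.
    """

    def __init__(self):
        self._data = defaultdict(list)
        self.reads = 0
        self.items_read = 0

    def append(self, key, value):
        self._data[key].append(value)

    def get(self, key):
        self.reads += 1
        values = self._data.get(key, [])
        self.items_read += len(values)
        return values


def build_index(sets, store):
    """Build an inverted index: token -> posting list of set IDs."""
    for set_id, tokens in enumerate(sets):
        for token in tokens:
            store.append(token, set_id)


def overlap_search(query, store):
    """Count the overlap with every indexed set that shares a token."""
    counts = defaultdict(int)
    for token in query:
        # One round trip per query token; each returns a whole posting list.
        for set_id in store.get(token):
            counts[set_id] += 1
    return dict(counts)


# Toy data (assumed for illustration): 1000 sets of 50 consecutive integers.
sets = [set(range(i, i + 50)) for i in range(1000)]
store = CountingStore()
build_index(sets, store)

query = set(range(500, 550))
overlaps = overlap_search(query, store)
print(store.reads, store.items_read, len(overlaps))
```

In this toy setup a single query costs one read per token, and the number of entries transferred grows with posting-list length. A MinHash signature, by contrast, has a fixed size regardless of how large the set is, which is why the same storage pattern is cheaper for MinHash LSH.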
Is there any possibility of integrating with Redis or Cassandra as a storage layer, as MinHash LSH already does?