Closed vsraptor closed 2 years ago
This crate provides an FST with an arbitrary byte sequence as a key, and an unsigned 64-bit integer as a value. The data you've given me doesn't appear to fit that format, so I don't know how to answer your question.
If you're asking about posting lists for building an IR index, then yes, they can be done with FSTs and additional structures.
the list of number represent the indexes of sparse binaries say 2000 bits with sparsity of 2%.. so ~40 elements.
then i want to store 1000 to 1mln lists
AND finally given sparse vector i.e. list of indecies , find the top-N closest by OVERLAP i.e. count(A & B)
I still don't understand your question, sorry. You're welcome to try and explain more, but I don't have much time to offer you. So you'll really need to be much clearer.
I think you're best bet is to review the docs for this crate and see whether your data fits.
Sorry :) let me try again..
I represent a binary 100101001 using the indecies of the 1s, so it becomes [1,4,6,9]
Now I want to save millions of list of numbers like the above.
Finally I receive a new binary X. I want to find the closest match to X using OVERLAP as a measure.
FST can do that using Levenstein distance. I simply want to use Overlap instead.
overlap('hello', 'hell') = 4
thanks
If you can express your function as an operation on finite state machines, then sure. That's how Levenshtein works.
let say i store list of list of numbers f.e
1,5,785,890 4,5,22,89 11,5,88,124
searching for 1,22,89,124
will find 4,5,22,89
CAN FST be made to search by overlap, hamming dist, jaccard any of them