Closed Alisaincd closed 7 years ago
Thanks for raising the issue. I just fixed it in 1.2.1.
As for your question: MinHash takes bytes as input, so as long as you can convert each element (e.g., an integer, string, float, or list) into bytes, it works with MinHash. For example:
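import struct
from datasketch import MinHash
minhash = MinHash()
# For a set of floats, e.g. {1.3, 123.4, 32.9, 3.1415926, ...},
# call update() once per element:
minhash.update(struct.pack("f", 3.1415926))
# EVERY ELEMENT in your input set is a LIST of floats,
# e.g. {[1.34, 1.3, 343.0, 123.9], [2.3, 23.2, 86.8], ...}
# The count in the format string must match the list length:
minhash.update(struct.pack("4f", *[1.34, 1.3, 343.0, 123.9]))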
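The conversion step can be wrapped in a small helper so that lists of any length are handled. This is only a sketch: `element_to_bytes` is a hypothetical name, not part of datasketch, and the choice of little-endian 64-bit encodings ("<q" for integers, "<d" for floats) is an assumption made here so the bytes are the same on every platform and floats keep full double precision (the "f" format above stores 32-bit floats).

```python
import struct


def element_to_bytes(element):
    """Encode one set element as bytes (hypothetical helper, not a datasketch API)."""
    if isinstance(element, bytes):
        return element
    if isinstance(element, str):
        return element.encode("utf-8")
    if isinstance(element, bool):
        # bool is a subclass of int; reject it to avoid silent surprises.
        raise TypeError("bool elements are not supported in this sketch")
    if isinstance(element, int):
        # Signed 64-bit integer, little-endian.
        return struct.pack("<q", element)
    if isinstance(element, float):
        # 64-bit float, little-endian.
        return struct.pack("<d", element)
    if isinstance(element, (list, tuple)):
        # Build the format from the actual length, e.g. "<3d" for 3 values.
        values = [float(x) for x in element]
        return struct.pack(f"<{len(values)}d", *values)
    raise TypeError(f"unsupported element type: {type(element).__name__}")


# Each element of the input set is encoded separately.
encoded = [element_to_bytes(e) for e in [[1.34, 1.3, 343.0, 123.9], [2.3, 23.2, 86.8]]]
```

Each encoded element would then be passed to `minhash.update(...)` one at a time. Note that an integer and a float with the same value (1 and 1.0) encode to different bytes under this scheme, so the element types should be kept consistent across the sets being compared.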
Hi, I want to get the top k elements with MinHashLSH, but it failed. For example, I set 'k=3', but I got ('result: ', ['21', '28', '51', '1', '82', '3', '91', '69', '86', '85']), whose length is larger than 3. My demo is below:
def query_topk(l, query_doc, k):
    forest = MinHashLSHForest(num_perm=256)
    count = 0
    for i in l:
        forest.add(str(count), i)
        count += 1
    forest.index()
    result = forest.query(query_doc, k)
    return result
Here l is a list of MinHash objects and query_doc is a MinHash. Is there anything wrong? By the way, must the input be a list of strings? What if my input is a vector? And another question: does this implementation only support text? If each of my inputs is a list of floats, e.g. [[1,2,3],[1.2,2.3,2.1]], will it work correctly? Thanks for your patience.
Sincerely,