simhash Search Results - Githubissues

422 results for simhash


yanyiwu/simhash_server #3

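A usage question

Do I compute the simhash value of each piece of content and then compare the values? My website is written in PHP and the server is already installed. How do I compute the similarity between two values? Is there PHP code for this calculation?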

hjy2588818 updated 5 years ago
2
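
The comparison asked about in the issue above is usually done by counting the bits in which two fingerprints differ (the Hamming distance). Below is a minimal sketch in Python rather than PHP, assuming the fingerprints are available as plain integers of a known width. The function names and the 64-bit default are illustrative assumptions, not part of the simhash_server API.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def simhash_similarity(a: int, b: int, bits: int = 64) -> float:
    """Fraction of matching bits, from 0.0 (all differ) to 1.0 (identical).

    `bits` is the assumed fingerprint width.
    """
    return 1.0 - hamming_distance(a, b) / bits


# Two made-up 8-bit fingerprints, for illustration only.
fp_a = 0b10110110
fp_b = 0b10010111
print(hamming_distance(fp_a, fp_b))
print(simhash_similarity(fp_a, fp_b, bits=8))
```

A small distance relative to the fingerprint width suggests near-duplicate content. The cutoff is application-specific and has to be tuned on real data.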
googleprojectzero/functionsimsearch #10

Many of the tests are brittle when adding new features to th…

This was a desired feature initially to make sure the underlying disassembly is good, but makes tweaking / improving / adding new features difficult without breaking the tests. Particular culpri…

thomasdullien updated 6 years ago
1
TextDatasetCleaner/deduplicator #1

Analysis

Before starting work, we need to research existing tools/libraries for removing fuzzy duplicate strings that someone has already written. It would also be useful to read up right away on the kinds of hashes (MinHash, S…

saippuakauppias updated 3 years ago
6
bigcode-project/bigcode-analysis #8

[Exact Substring Deduplication] Analysis

Near deduplication #7 only operates on file level. It is also possible for a file to be 1. a substring of another file, while the minhash/simhash fingerprints being wildly different 2. composed o…

ChenghaoMou updated 2 years ago
1
ooni/backend #282

Write tooling and docs on how to add blockpage fingerprints

We currently have a bunch of known blockpages that should be added to the pipeline (see: `fingerprintdb` label). We should have tooling and documentation on how to add a blockpage fingerprint to th…

hellais updated 3 years ago
3
zyymax/text-similarity #2

simhash_imp.py raises an error: ValueError: could not convert string to fl…

Traceback (most recent call last): File "src/simhash_imp.py", line 191, in feature_vec = [(int(item.split(':')[0]),float(item.split(':')[1])) for item in feature_vec] ValueError: could not conv…

cxzhp updated 5 years ago
6
go2starr/lshhdc #1

Some questions

Hi! Thank you for this code, I've been studying it thoroughly and it is a very useful and helpful companion to the theory and algorithm sketches found in the MMD book. I have a few questions about som…

cjauvin updated 10 years ago
4
shibing624/similarity #39

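When computing a paragraph's SimHash, no matter how many bits are configured, only 42 bits of the result are significant and the rest are all 0

I found this by setting a breakpoint in the middle of the similarity computation: the hash computed for each character in SimHash is at most 42 bits long, so only the first 42 bits of the vector computation are meaningful. Perhaps the hash algorithm needs to be replaced?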

ryumiyax updated 1 year ago
3
seomoz/simhash-py #52

Python 3 compatible PyPI release

Thanks for this project! Is it time for a PyPI release? The current published version isn't compatible with Python 3, but the github version is working for me (tested `compute()` running on Python 3.7…

jcushman updated 3 years ago
1
BartMassey/simhash #1

Could you please clarify how the algorithm works?

The manpage says: > The algorithm used by simhash is Manassas' "shingleprinting" algorithm (see BIBLIOGRAPHY below): take a hash of every m-byte subsequence of the file, and retain the n of these h…

chmduquesne updated 6 years ago
1
