pombredanne opened 1 month ago
We have implemented the halohash algorithm for approximate file matching at https://github.com/aboutcode-org/matchcode-toolkit/blob/main/src/matchcode_toolkit/halohash.py
This is the core hashing algorithm we use to create the `halo1` file fingerprint, the snippet hashes, and the directory matching hashes.
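To make the idea concrete, here is a minimal, hypothetical sketch of a bit-averaging similarity hash in the general spirit of halohash. It is not the code in `halohash.py`. The function name `bit_average_hash`, hashing each item with `hashlib.blake2b`, and the 128-bit default size are all assumptions for illustration. Each input item is hashed, and a fingerprint bit is set when more than half of the item hashes have that bit set. As a result, inputs that share most of their items tend to produce fingerprints that differ in only a few bits.

```python
import hashlib
from typing import Iterable


def bit_average_hash(items: Iterable[bytes], size_in_bits: int = 128) -> int:
    """Hypothetical bit-averaging similarity hash (illustration only, not halohash.py)."""
    # blake2b supports digest sizes of 1..64 bytes, so 8..512 bits in whole bytes.
    if size_in_bits % 8 or not 8 <= size_in_bits <= 512:
        raise ValueError("size_in_bits must be a multiple of 8 between 8 and 512")
    counts = [0] * size_in_bits  # how many item hashes have each bit set
    total = 0                    # number of items hashed
    for item in items:
        digest = hashlib.blake2b(item, digest_size=size_in_bits // 8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(size_in_bits):
            counts[bit] += (value >> bit) & 1
        total += 1
    # Majority vote per bit position: set the bit if more than half the items set it.
    fingerprint = 0
    for bit, count in enumerate(counts):
        if 2 * count > total:
            fingerprint |= 1 << bit
    return fingerprint
```

For example, hashing the lines of two versions of a file that differ in one line usually yields fingerprints a small Hamming distance apart, whereas a cryptographic hash of the whole file would change completely.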
The toolkit is available on PyPI at https://pypi.org/project/matchcode-toolkit/
The goal of this issue is to implement efficient search and matching of these fingerprints by Hamming distance: implement the selected design, together with a basic search ranking procedure, as efficient search and matching engine code.
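One common way to avoid comparing a query against every stored fingerprint is a pigeonhole (multi-index) scheme. Split each fingerprint into `max_distance + 1` disjoint bands. Any fingerprint within `max_distance` bits of the query must then match it exactly in at least one band, so exact band lookups yield a complete candidate set, which is then verified and ranked. The sketch below is a hypothetical illustration of that idea with a basic ranking (ascending Hamming distance, ties broken by key). It is not necessarily the selected design, and `HammingIndex`, `add`, `search`, and their parameters are made-up names.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


class HammingIndex:
    """Hypothetical pigeonhole index for fingerprints within max_distance bits."""

    def __init__(self, size_in_bits: int = 128, max_distance: int = 8) -> None:
        if not 0 <= max_distance < size_in_bits:
            raise ValueError("need 0 <= max_distance < size_in_bits")
        self.max_distance = max_distance
        num_bands = max_distance + 1
        # Split the bits into num_bands contiguous bands of near-equal width.
        base, extra = divmod(size_in_bits, num_bands)
        self.bands: List[Tuple[int, int]] = []  # (start bit, width)
        start = 0
        for i in range(num_bands):
            width = base + (1 if i < extra else 0)
            self.bands.append((start, width))
            start += width
        # One exact-match table per band: band value -> keys having that value.
        self.tables: List[Dict[int, Set[str]]] = [defaultdict(set) for _ in range(num_bands)]
        self.fingerprints: Dict[str, int] = {}

    def _band_values(self, fingerprint: int) -> Iterator[int]:
        for start, width in self.bands:
            yield (fingerprint >> start) & ((1 << width) - 1)

    def add(self, key: str, fingerprint: int) -> None:
        self.fingerprints[key] = fingerprint
        for table, value in zip(self.tables, self._band_values(fingerprint)):
            table[value].add(key)

    def search(self, query: int, limit: int = 10) -> List[Tuple[str, int]]:
        """Return up to `limit` (key, distance) pairs, closest first."""
        candidates: Set[str] = set()
        for table, value in zip(self.tables, self._band_values(query)):
            candidates |= table.get(value, set())  # .get avoids inserting empty sets
        # Verify each candidate with the exact distance, then rank.
        results = []
        for key in candidates:
            distance = hamming_distance(query, self.fingerprints[key])
            if distance <= self.max_distance:
                results.append((key, distance))
        results.sort(key=lambda r: (r[1], r[0]))
        return results[:limit]
```

For example, with `index = HammingIndex(size_in_bits=64, max_distance=4)`, adding `"a"` with fingerprint `0` and `"b"` with `0b111`, then calling `index.search(0)` returns `[("a", 0), ("b", 3)]`. The trade-off is that a larger `max_distance` means narrower bands, which match more often by chance and so produce more candidates to verify per query.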