aboutcode-org / ai-gen-code-search

A set of utilities and tools to detect and search AI-generated code

AI-GCS: Implement efficient hamming distance fingerprint matching, e.g., the core search #5

Open pombredanne opened 1 month ago

pombredanne commented 1 month ago

This issue is to implement efficient search and matching of fingerprints by Hamming distance. The work should follow the selected design and include a basic search ranking procedure, delivered as efficient search and matching engine code.
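For illustration only, here is a minimal sketch of one common way to search fingerprints within a Hamming distance threshold. It is not the selected design referred to above, which this thread does not describe. It assumes fixed-width fingerprints stored as Python integers and uses the pigeonhole principle: if two fingerprints differ in at most `max_distance` bits and are split into `max_distance + 1` blocks, at least one block must match exactly, so an exact-match lookup per block yields a small candidate set to verify. The names `HammingIndex`, `add`, and `search` are hypothetical, and ranking here is simply ascending distance.

```python
from collections import defaultdict


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two integer fingerprints differ."""
    return bin(a ^ b).count("1")


class HammingIndex:
    """Hypothetical block index for near-duplicate fingerprint lookup."""

    def __init__(self, bits: int = 64, max_distance: int = 3):
        self.max_distance = max_distance
        blocks = max_distance + 1
        # Split the fingerprint into `blocks` contiguous bit spans.
        base, extra = divmod(bits, blocks)
        self.spans = []
        start = 0
        for i in range(blocks):
            width = base + (1 if i < extra else 0)
            self.spans.append((start, width))
            start += width
        self.tables = defaultdict(list)  # (block number, block value) -> ids
        self.fingerprints = {}           # id -> full fingerprint

    def _keys(self, fingerprint: int):
        for i, (start, width) in enumerate(self.spans):
            yield i, (fingerprint >> start) & ((1 << width) - 1)

    def add(self, item_id: str, fingerprint: int) -> None:
        self.fingerprints[item_id] = fingerprint
        for key in self._keys(fingerprint):
            self.tables[key].append(item_id)

    def search(self, query: int):
        # Collect candidates sharing at least one identical block.
        candidates = set()
        for key in self._keys(query):
            candidates.update(self.tables.get(key, ()))
        # Verify each candidate with the full distance, then rank.
        hits = []
        for item_id in candidates:
            distance = hamming_distance(query, self.fingerprints[item_id])
            if distance <= self.max_distance:
                hits.append((distance, item_id))
        hits.sort()  # closest first, ties broken by id
        return [(item_id, distance) for distance, item_id in hits]
```

The index only returns matches within the `max_distance` chosen at construction time. A larger threshold means more and narrower blocks, which yields more candidates to verify, so the threshold is a trade-off between recall and speed.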

JonoYang commented 3 weeks ago

We have implemented the halohash algorithm for approximate file matching at https://github.com/aboutcode-org/matchcode-toolkit/blob/main/src/matchcode_toolkit/halohash.py

This is the core hashing algorithm that we use to create the file fingerprint halo1, the snippet hashes, and the directory matching hashes.

JonoYang commented 2 weeks ago

This is available on PyPI as the matchcode-toolkit package: https://pypi.org/project/matchcode-toolkit/