beetbox / pyacoustid

Python bindings for Chromaprint acoustic fingerprinting and the Acoustid Web service
MIT License

Expose chromaprint_hash_fingerprint() #59

Closed redapple closed 4 years ago

redapple commented 4 years ago

I've been playing with chromaprint fingerprint simhashes recently, and thought it would be nice to have them in pyacoustid.

chromaprint_get_fingerprint_hash() was added in Chromaprint version 1.3 (February 2, 2016).

I'm only exposing chromaprint_hash_fingerprint() as it's what I use myself:

/**
 * Generate a single 32-bit hash for a raw fingerprint.
 *
 * If two fingerprints are similar, their hashes generated by this function
 * will also be similar. If they are significantly different, their hashes
 * will most likely be significantly different as well, but you can't rely
 * on that.
 *
 * You compare two hashes by counting the bits in which they differ. Normally
 * that would be something like POPCNT(hash1 XOR hash2), which returns a
 * number between 0 and 32. Anything above 15 means the hashes are
 * completely different.
 *
 * @param[in] fp pointer to an array of 32-bit integers representing the raw
 *        fingerprint to be hashed
 * @param[in] size number of items in the raw fingerprint
 * @param[out] hash pointer to a 32-bit integer where the hash will be stored
 *
 * @return 0 on error, 1 on success
 */
CHROMAPRINT_API int chromaprint_hash_fingerprint(const uint32_t *fp, int size, uint32_t *hash);
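
For reference, the comparison described in the comment above (POPCNT of the XOR) could be sketched in Python roughly like this. Note that hamming_distance and maybe_similar are hypothetical helper names used only for illustration; they are not part of pyacoustid or Chromaprint. The cutoff of 15 is taken from the doc comment, and treating anything at or below it as "possibly similar" is an assumption.

```python
def hamming_distance(hash1, hash2):
    """Count the bits in which two 32-bit simhashes differ."""
    # XOR leaves a 1 in every bit position where the hashes disagree.
    # Masking to 32 bits keeps the result in the documented 0..32 range.
    return bin((hash1 ^ hash2) & 0xFFFFFFFF).count("1")


def maybe_similar(hash1, hash2, threshold=15):
    """Hypothetical helper: False when the hashes are 'completely different'.

    The doc comment says anything above 15 differing bits means the hashes
    are completely different, so this only rules pairs out; a True result
    is a hint, not a guarantee.
    """
    return hamming_distance(hash1, hash2) <= threshold
```

Two hashes returned by the new binding would be passed straight in, e.g. hamming_distance(simhash_a, simhash_b).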

Usage

>>> import chromaprint
>>> fgp = "AQADtFkoaUmz4JNmfD2M6_gxSjv-QjW6C30W5CFCRil-R9CVH1lONMoDblG-4YvQnCdK__iKZ8GpLBdyQ_yRBx-1Cc3RO0xw8WhELUflgG-QB-dVhHGOb6AyHn6P8vLQL0eaHw_E2mDOIA-O_BwcaR--7IOqI3fgNdiP34ox_siTHFq07fjRr_iDm3iW4JESBfrg41mOh1dwJw_4EVoqKUef4SiPK9EDbduRk-Ceo_E6fKqSoHmOi5mRfBbCJKrQ83gJnseHVD_0yDiPKdnjoMfVG6IZPM2FxjkL7UFOXD2uzBTeHBea_OiPkGTQkAqe2FB_QXl03DvO7gwaG18-QOx2bI2Pfi800oSPh6rw4C_KXDiT4IKKJ7jCCV3y4PDzomfwH3cSoNIipYj24LdwyciZJMCLaiP-4D2yZAyPHk_A_ygp5fAzH-V6_HiE80h-pOeFD001Fn3j4jbRZxka-Xh50PmQT0IvHR9BJzrUB82PNza68oT546nh0EEPGjp-RXAYfcEn4U5Q_Zjy4z_0p3iStUXM_OgDnz2KTziJXEkSQdaMMOhDHGs-_E-C32BSKUL4BMdzcLpxP_gz5D36XAjdTnhmnJ9Q3fA1Q8q05PgPPcV_-MSPX7Fg5jL2CCI_3Du6MIeWXPgUkvgVNES-pETzo3KeYIpo4lRy5ISf5Lgz4juHP8QZJkaTB__wYz-0JlqM6ySaY3p2PNHwI3zAPPhuhIcfXFIGUYmiB2eTGmGOC8-R51FxKYtCQbtg2gh1OvhP_Mhj6LBa5EkwJbuPfmk0lEv2oAmPPUdeODiFK8fj459G8HCoQ85EnDp8HbGzBJ8eRH3WBBcTnFEUaN-LlEd_fEcdG42iXbiPf7NRPvhYVAsVXGIQczl0BKhNmEmOH-GaBboRSxqeo1qwHmnEKIEuHUd55CdaZkeYiMuLF1ouIVdbvDuaLDIuH_2SIlUTQdNxHXmO_IdjBvIW4doRJqLx6DCDH_EciImjdMjRBj96hKlT_IEWrajMo0-io6mCm0c_DcqS5ugPRbZh9OiHNNmOynzxHEefI-Yo-IOaRke6oxZT5Ep-HE3m-Oi8HJqOjcUHXYngOMqRD6k8PBCXoFTwILx2ND9-Y26E-6h6fMH1IFegJj1yhM3R74JbQZP14Q2anIe-wzHXImcOSssiVM9x4sijC1p2hHKOl3inVJhy5YF39BF2JUfYQ00jpLFu9E9wfsiFZmsO7UScKcmNaj9caYetoFaG8BCJUPFy_Me7IG9yvNBMIcyZ43mGR89woj8esciSOMcD3TL-4Djz4GsF_cGJVOZxBlcO1M6RJmlk4T_0IkepC_mOJtkETWKP0NKJH3_w46SQpxK0hNSFHPiG1kGYTVGMQ0bIRMF7hBauhBG0yElQ9Tie47gzD6IexE6CqzRu4FcQbiKabEmgHMiRG31aNMkWfdB3VEd6wRPyCGeTQ5R3lMdxHU0LTbWPb0iX6hXGE-dvPImOLBaeHGoqIQ_-I79g5qgsGV-UHFp09AfjQVSUCP1QeQn8HB-aiYee4IyOSImZo7uO-zhzPEFTBSV1hE8gRkx0dEW8IM_R6_ATaE9QbS8sfCdmEqKm5HC6JAVz7nhOPNB3Bf4RO8fZHP-DZpUjXDCjo48RXoQYJ0Oo9MHf4ceN1A2eiVCd3AgfFj5HuMKTKEf6QTvxNDGHeE9wIeeP59Bhrkd-Bn-K84iLpq5Qsoee2XiKA5e2HL9xPegeNM9wPFHS41Fy9HJW-EqO_0G45DJKPYQboTuy5OgT6Ee-XPiX4DKq_HAV9CF-ID8-HXEa4d_R8EMmJxtKbYF2YTuqS8fxJCd0GU8uhBaTonli4jrOHvtNhPfhHxUVQicr3OALHKp-PI_ww092jB50Jkee6HgeNJdS9FGuCZXSw7vwPcfT48uQLGqCnMN34vmwP_h9VOQIytqEIsehv_gYyJ2H6rrg89goaF6OkGieB_-Q58N_6E-RR9GH5ksm1
Epk4bmT4RiP-kjTQ_NGsE_x4ykOxR2-p8IePOThSYly6EY-fTiairHRR7lwtymeHD-eHOGmQjtipSbx6Og_wTn8GF_Sg4oTlDKO_FD14g3UDbU7I1qV7Egd1EqOa0aTGVXMGBeHozaaJszxIMxl3OEFq5i84_0QikcD2SRuGXx6HGcevFDXByV3IcxD3MgTQZN0I33EFY-M53D0MCiTo2PyEK6O6kiOHBc_fHkC8kMTM6iVHTkQ08a00kcfnBqOWtLxZCRCORd0dkejYpJDXD1uaMzHFD2P-PhzNGOPPsfjLcKO30GY7IfGElcTPPiPx5gZ4soDmYf7I34ePMIz9Sh1ZEq1-Pg7VEp86FfwF82heQtORca0H-kvFN_xPcg7HNMPG3_gE1WNy3hzFVpUpkOPeNJxo9R4DdtR9XgkZseXIUw-JBHtFVdofBmcR8cTa3hmNNJzVE2OB_mP6vBIsTgP9xWYc3gOLSON_0TMo1TsIt_R49CifrhyxMw6HH5ynEWeE7-YBPpmHU2YI1Q8HMdP4WHyIE8YPIH2I0z24NxTPKXAiXqQh_B36DPy4ML9eMibQ71SIzWPe0Pto_ny4w5C-ocWacedHZe-4CwhRlZa5PnwxegpNJdxHY5EZTSeIo8H8cF7_IIeR8KjFn_RTV_QxCEevAcGiBGUGAQAUEQQIoQgRAhBgABAEgORAcA4IJFgCAIgiDACCUUIIgI4AgERwAikgKhMASQEYkAYIAgTCgRDlEEIAYAIoMAAAxAjwAIAABCKCCAECEA5CowDAgCAjAKEAIYQEYQhxAgAAgEmACFMMOKEYeIJBBAgBAjjiCJCOACEBQR4IRBEQAlhCCKAGSCkMEIBI4AQiAggTBDKIQAINMAT7wQBBikiGAEEGASEE0Qoh4BQSADAECJOEGMAEUYBAyQRxkCnnCCKCSEhwsoBi4AAiDFkEBFAAGABgUgAIYAhwBFqHDFKAEScIQIBIiRBwBJmABBaIJGAA8oSQsBAAAEBEAFACEqgYAgJBBWBAiykAACAAGKIQMAQwBQCjBAoACGDUQSIE00AI4RCQCEAHAIbCCEYUERQQEAbBBIBhIBAEGUEYkgYQQABGABDpGEKAIYAAMAaJhgQiCADHABIOSYFEF4hhIzBShkOAHAFEAoxIUwEoggQ3BtAEKQAAQGcAQ4IAAAACkimEAMIQcOkQUQQcAQBCACgiDIEEYMNMII5QYRhjAACjBAICGOgQMoAxDwwSCBgBHDQCIEkRwwwBARBEAJgAFFUSEKEMUIJYJgyQABNCBPQMIC4UYYBwZRAAIAggCICSGOEAQoaAJYjgDABAQBSSISAEMgJBRBywihGmBBEwI0MEEAASYQjgBlFgDAAEOFAUA4CqAHQggAhIBiCCIeUcoQZAagRTBvuhEAMksEQQABAQJAABDEBACEKOUikIgIpEZghyiFCgBKMAEOEEkJAAQQSSAkhgYEQGaEAQwAAwwhRAAgBnJBUSCEIQQAIcwQBAgqjIAKICWAkIkRJBoSlBDAhHBCMCkMYMxYYBCRgTAgiCKDuE2mEYZQAI5gwhBFBAAFGECcAEAAYBQgwQiHGiFJAAYaIJAYYoZAA"
>>> values, version = chromaprint.decode_fingerprint(fgp)
>>> values[:5], version
([856577305, 591709465, 600082265, 600082297, 600608121], 1)
>>> simhash = chromaprint.hash_fingerprint(values)
>>> simhash
1682219311L

sampsyo commented 4 years ago

Nice; this looks great! Would you mind adding a quick docstring to this function so people can know roughly how to use it without checking the Chromaprint library docs themselves?

redapple commented 4 years ago

@sampsyo sorry for the late reply. I added a docstring.

sampsyo commented 4 years ago

Looks excellent! Thank you!!