Open Dcallies opened 1 year ago
This bug shows up in HMA as well, adding repro in case it's helpful.
For this repro to work, HMA (previously OpenMediaMatch) should be running as a Docker container and serving on localhost:8080.
Reset the tables:
$ docker-compose exec app flask --app OpenMediaMatch.app reset_all_tables
[2024-03-13 18:10:12,303] WARNING in app: No storage class provided, using the default
Create a bank:
$ curl --location 'localhost:8080/c/banks' \
--header 'Content-Type: application/json' \
--data '{
"name": "EVIL_CONTENT_BANK"
}'
{"matching_enabled_ratio":1.0,"name":"EVIL_CONTENT_BANK"}
Add a file to the bank:
$ curl --location 'localhost:8080/c/bank/EVIL_CONTENT_BANK/content' \
--form 'photo=@"<photo path>"'
{"id":1,"signals":{"pdq":"3517f92351b0e69170c9656ba70c1249d258926d6d65bd2cbcb49cb34bd1c4fb"}}
Rebuild indexes:
$ docker-compose exec app flask --app OpenMediaMatch.app build_indices
[2024-03-13 18:12:21,582] WARNING in app: No storage class provided, using the default
[2024-03-13 18:12:21,596] INFO in build_index: Running the build_all_indices background task
[2024-03-13 18:12:21,628] INFO in build_index: Building index for pdq (1 signals)
[2024-03-13 18:12:21,630] INFO in build_index: Indexed 1 signals for pdq - 0 seconds
[2024-03-13 18:12:21,631] DEBUG in database: Index[pdq] serializing index to tmpfile /tmp/tmp9_c80zmc
[2024-03-13 18:12:21,631] DEBUG in database: Index[pdq] finished writing to tmpfile, 1 signals 889 bytes - 0 seconds
[2024-03-13 18:12:21,635] DEBUG in database: Index[pdq] imported tmpfile as lobject oid 16750 - 0 seconds
[2024-03-13 18:12:21,635] DEBUG in database: Index[pdq] deallocating old lobject 16747
[2024-03-13 18:12:21,636] DEBUG in database: Index[pdq] cleaned up tmpfile
[2024-03-13 18:12:21,639] INFO in build_index: video_md5 index up to date, no build needed
[2024-03-13 18:12:21,639] INFO in build_index: Completed build_all_indices background task - 0 seconds
Query the bank:
$ curl --location 'localhost:8080/m/lookup?signal_type=pdq&signal=3517f92351b0e69170c9656ba70c1249d258926d6d65bd2cbcb49cb34bd1c4fb'
[]
As you can see, the lookup incorrectly returns no matches even though there should be a match. Adding a second photo, reindexing, and then querying again returns a match:
$ curl --location 'localhost:8080/c/bank/EVIL_CONTENT_BANK/content' \
--form 'photo=@"<second photo path>"'
{"id":2,"signals":{"pdq":"cddcc471737d333771469b9e4c119ce6526e52753f86d1239290469b499941be"}}
$ docker-compose exec app flask --app OpenMediaMatch.app build_indices
[2024-03-13 18:14:20,080] WARNING in app: No storage class provided, using the default
[2024-03-13 18:14:20,093] INFO in build_index: Running the build_all_indices background task
[2024-03-13 18:14:20,124] INFO in build_index: Building index for pdq (3 signals)
[2024-03-13 18:14:20,126] INFO in build_index: Indexed 3 signals for pdq - 0 seconds
[2024-03-13 18:14:20,127] DEBUG in database: Index[pdq] serializing index to tmpfile /tmp/tmp_42xfk4q
[2024-03-13 18:14:20,127] DEBUG in database: Index[pdq] finished writing to tmpfile, 3 signals 1176 bytes - 0 seconds
[2024-03-13 18:14:20,132] DEBUG in database: Index[pdq] imported tmpfile as lobject oid 16751 - 0 seconds
[2024-03-13 18:14:20,132] DEBUG in database: Index[pdq] deallocating old lobject 16750
[2024-03-13 18:14:20,134] DEBUG in database: Index[pdq] cleaned up tmpfile
[2024-03-13 18:14:20,137] INFO in build_index: video_md5 index up to date, no build needed
[2024-03-13 18:14:20,137] INFO in build_index: Completed build_all_indices background task - 0 seconds
$ curl --location 'localhost:8080/m/lookup?signal_type=pdq&signal=3517f92351b0e69170c9656ba70c1249d258926d6d65bd2cbcb49cb34bd1c4fb'
["EVIL_CONTENT_BANK"]
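If it helps to script this repro as a regression check, here is a minimal sketch of a helper that tells the two lookup responses above apart. The function name `lookup_has_match` is made up for illustration and is not part of HMA; it only assumes the response body is a JSON array of bank names, as in the outputs shown.

```shell
#!/bin/sh
# lookup_has_match RESPONSE
# Hypothetical helper (not part of HMA): succeeds if the /m/lookup
# response body lists at least one bank, fails on an empty JSON array.
lookup_has_match() {
  # Strip whitespace so "[ ]" or a trailing newline still counts as empty.
  body=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$body" in
    ''|'[]') return 1 ;;
    *) return 0 ;;
  esac
}

# Example wiring against a running instance (left commented out because
# it needs HMA serving on localhost:8080):
# resp=$(curl --silent --location 'localhost:8080/m/lookup?signal_type=pdq&signal=<hash>')
# lookup_has_match "$resp" || echo "BUG: no match for a banked hash"
```

With only one photo banked, the check fails on the `[]` response; after the second photo is added and the index rebuilt, it passes.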
Repo:
Expected: at least one match. However, oddly enough, adding a second hash allows all hashes to match.
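To back up the expectation: the query signal is character-for-character the PDQ hash that was stored for photo 1, so its Hamming distance to the banked hash is 0 and it should match under any distance threshold. Below is a rough sketch for checking that by hand. `hamming_hex` is a helper written for this comment, not an HMA or threatexchange function, and it assumes both hashes are equal-length hex strings.

```shell
#!/bin/sh
# hamming_hex A B
# Illustrative helper: prints the number of differing bits between two
# equal-length hex strings (e.g. 256-bit PDQ hashes as 64 hex digits).
hamming_hex() {
  a=$1
  b=$2
  dist=0
  if [ "${#a}" -ne "${#b}" ]; then
    echo "hamming_hex: length mismatch" >&2
    return 1
  fi
  while [ -n "$a" ]; do
    # Peel the first hex digit off each string.
    ca=${a%"${a#?}"}
    cb=${b%"${b#?}"}
    a=${a#?}
    b=${b#?}
    # XOR the two 4-bit values, then count the set bits.
    x=$(( 0x$ca ^ 0x$cb ))
    while [ "$x" -ne 0 ]; do
      dist=$(( dist + (x & 1) ))
      x=$(( x >> 1 ))
    done
  done
  echo "$dist"
}

first=3517f92351b0e69170c9656ba70c1249d258926d6d65bd2cbcb49cb34bd1c4fb
second=cddcc471737d333771469b9e4c119ce6526e52753f86d1239290469b499941be

# Distance from the query to the hash banked for photo 1.
hamming_hex "$first" "$first"
# Distance between the two banked hashes, for comparison.
hamming_hex "$first" "$second"
```

The first call prints 0. The second shows how far apart the two banked hashes are, which is useful for ruling out the second photo itself as the thing being matched.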