facebook / ThreatExchange

Trust & Safety tools for working together to fight digital harms.
https://developers.facebook.com/docs/threat-exchange

[py-tx] CLI error opaque for PDQ match with low hash quality #1256

Open thedanielsun opened 1 year ago

thedanielsun commented 1 year ago

Using this image: https://styles.redditmedia.com/t5_17138f/styles/profileBanner_p7ne95txaxfa1.png (I think it's just black)

Running threatexchange hash photo https://styles.redditmedia.com/t5_17138f/styles/profileBanner_p7ne95txaxfa1.png produces blank output because the hash is under the quality threshold. I kind of understand why the output is blank, but I wasn't sure whether something was broken.

threatexchange match photo https://styles.redditmedia.com/t5_17138f/styles/profileBanner_p7ne95txaxfa1.png
Traceback (most recent call last):
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/bin/threatexchange", line 8, in <module>
    sys.exit(main())
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 319, in main
    inner_main()
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 312, in inner_main
    execute_command(settings, namespace)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/main.py", line 157, in execute_command
    command.execute(settings)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/match_cmd.py", line 202, in execute
    results = _match_file(path, s_type, index)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/cli/match_cmd.py", line 226, in _match_file
    return index.query(s_type.hash_from_file(path))
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_index.py", line 53, in query
    results = self.index.search_with_distance_in_result(
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_faiss_matcher.py", line 268, in search_with_distance_in_result
    return super().search_with_distance_in_result(queries, threshhold)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/threatexchange/signal_type/pdq/pdq_faiss_matcher.py", line 127, in search_with_distance_in_result
    limits, similarities, I = self.faiss_index.range_search(qs, threshhold + 1)
  File "/Users/daniel.sun/.pyenv/versions/3.8.12/lib/python3.8/site-packages/faiss/__init__.py", line 603, in replacement_range_search
    assert d * 8 == self.d
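
The final assertion compares the width of the query in bytes, times 8, against the index dimension in bits. A PDQ hash is 256 bits (64 hex characters), so a blank hash would presumably decode to zero bytes and fail that check. Below is a small sketch of the arithmetic, assuming the blank hash reaches FAISS as an empty string; `query_width_bits` is a hypothetical helper for illustration, not part of threatexchange or faiss.

```python
PDQ_BITS = 256  # a PDQ hash is 256 bits, i.e. 64 hex characters


def query_width_bits(hex_hash: str) -> int:
    """Hypothetical helper: bit width of a hex-encoded hash once decoded to bytes."""
    return len(bytes.fromhex(hex_hash)) * 8


valid_hash = "f" * 64  # placeholder hash with the correct length
blank_hash = ""        # assumption: what a low-quality image produces

# Each comparison mirrors the `d * 8 == self.d` check in the traceback.
print(query_width_bits(valid_hash) == PDQ_BITS)
print(query_width_bits(blank_hash) == PDQ_BITS)
```

If that assumption holds, the failure is a dimension mismatch caused by the empty query rather than a problem with the index itself.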

threatexchange match throws an opaque FAISS exception. I think we should probably just check that a hash exists before querying the index (if "" is the notation for a low-quality hash).
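
The existence check suggested above could look roughly like the sketch below. It assumes that "" really is the marker for a low-quality hash; `match_or_empty` is a hypothetical wrapper, not an existing py-tx function, and it takes the query function as a parameter so the example stays self-contained.

```python
from typing import Callable, List, Sequence


def match_or_empty(hash_str: str, query: Callable[[str], Sequence]) -> List:
    """Hypothetical guard: skip the index lookup when the hash is blank."""
    if not hash_str:
        # Assumption: "" means the hash was rejected for low quality,
        # so report "no matches" instead of passing it to the index.
        return []
    return list(query(hash_str))
```

Using the names from the traceback, the call in `_match_file` would then become something like `match_or_empty(s_type.hash_from_file(path), index.query)`, so a blank hash simply fails to match.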

Dcallies commented 1 year ago

Weird, I thought I added an explicit filter so it would just fail to match, but clearly something is borked. Thanks for the flag!