Closed thedanielsun closed 2 months ago
Ugh, we need to fix this ASAP; it's an oversight in the signal normalization in PDQ itself. I think I can generate an easy repro using the file storage and just copying the file.
Thanks a ton @thedanielsun, this issue report is very clear and actionable.
There is a corrupt entry in the StopNCII data:
('pdq', '00000000000000000000000000000000')
While it's probably good for StopNCII to remove this data from their upstream as well, it would be nice for threatexchange CLI to ignore corrupt data when rebuilding indexes.
I hit an error on this line: https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_faiss_matcher.py#L240
but there is similar usage on this line as well: https://github.com/facebook/ThreatExchange/blob/main/python-threatexchange/threatexchange/signal_type/pdq/pdq_faiss_matcher.py#L185
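For illustration, here is a minimal sketch of the kind of filtering I mean, to be run before entries are handed to the index builder. It assumes entries arrive as `(signal_type, value)` tuples like the one above, and that a well-formed PDQ hash is 64 hex characters (256 bits); the corrupt entry looks shorter than that. The names `is_wellformed_pdq_hex` and `drop_corrupt_pdq` are hypothetical and not part of threatexchange, and the real fix may belong in PDQ's own validation instead.

```python
import string
from typing import Iterable, Iterator, Tuple

# A PDQ hash is 256 bits, i.e. 64 hexadecimal characters.
PDQ_HEX_LEN = 64
_HEX_CHARS = frozenset(string.hexdigits)


def is_wellformed_pdq_hex(value: str) -> bool:
    """Hypothetical helper: True if value is exactly 64 hex characters."""
    return len(value) == PDQ_HEX_LEN and all(c in _HEX_CHARS for c in value)


def drop_corrupt_pdq(
    entries: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: skip malformed 'pdq' entries, pass the rest through."""
    for signal_type, value in entries:
        if signal_type == "pdq" and not is_wellformed_pdq_hex(value):
            # Corrupt entry: skip it rather than fail the whole index rebuild.
            continue
        yield signal_type, value
```

With something like this in front of the index build, a single bad upstream record would be skipped instead of aborting the rebuild.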