Open master-hax opened 1 year ago
Hey @master-hax, thanks for reaching out!
This is currently intended behavior, however, providing consistency between the file collab type and --hashes might be useful.
--hashes
should be requiring you to also do -S
to pick the type.
The way I've done it is using cut -d ' ' -f 2
to get only the hashes.
Another trick might be to do something like dataset -p
does, which is not print columns that you are explicitly selecting (since otherwise | grep
would have worked).
@master-hax some questions:
threatexchange match --hashes -S pdq photo -- text is not pdq
Hey @Dcallies, thanks for the fast response
responses:
hash
and match --hashes
commands to be compatible even if they are not compatible by default. e.g. maybe we could add a flag to hash
like hash --raw
or hash --no-prefix
that outputs the hashes in a match
compatible format? (without the "pdq " prefix)-S pdq
args, i think threatexchange should attempt to interpret the string "text is not pdq" as a pdq hash. and then possibly fail due to the input not being in hexadecimal or not being the correct lengthAfter thinking about it for a bit, I think it's reasonable to expect that match --hashes
and hash
will be compatible.
However, definitely want to limit the number of arguments on these commands, especially if there are equivalents in existing bash tools.
--hashes requiring -S has bothered me for a while, I propose we do the following:
match --hashes
requires -S
--hashes
will default to looking for the signal type prefix and will interpret it in order to select the right indexThis solution provides:
But has the downside of potential ambiguity if there is a future signal type which the hash can contain the prefix of its signal type. This is possible with 'raw_text' which a string that started with 'raw_text' would be discovered if using -S
with only one option. This seems okay since raw_text is an innocuous string, and has a simple workaround (no -S
).
To handle this, we can update the document strings and the error message for missing prefix on hashes.
Working with Sam on this as a ramp-up task for the upcoming hackathon!
threatexchange hash photo <path-to-image-file>
outputs hashes in the formpdq f5b44e4f1673f4e44e57fd6d021201effd8b40d6b54906d3dc302df672160708
however
threatexchange match --hashes photo <path-to-hash-file>
expects a single file without a "pdq " prefixthis causes an inconsistency where hashes generated by
threatexchange hash
can't be directly used inthreatexchange match --hashes