NationalSecurityAgency / datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
https://code.nsa.gov/datawave
Apache License 2.0

Fixed NPE in FullSSDeepDiscoveryChainStrategy when the initial query returned no results #2503

Closed by drewfarris 1 month ago

drewfarris commented 1 month ago

The SSDeepSimilarityDiscovery query runs in two stages: it first runs a query to find similar ssdeep hashes, then runs a discovery query on the similar hashes that were found. If the first query returned no similar ssdeep hashes, the second query still attempted to run and ended up throwing an NPE.

This fix will avoid attempting to run the second query when no similar ssdeep hashes are returned from the first query. The query will return zero results instead.
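For reviewers, the guard can be sketched roughly as follows. This is a minimal illustration, not the actual change to FullSSDeepDiscoveryChainStrategy: the class `ChainedQuerySketch`, the method `runChain`, and the use of `Function` to stand in for the two queries are all hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a two-stage chained query; not DataWave code.
class ChainedQuerySketch {

    // Runs the similarity query, then the discovery query on its results.
    // If the first stage finds nothing, the second stage is skipped and an
    // empty result is returned instead of passing null/empty input downstream.
    static List<String> runChain(Function<String, List<String>> similarityQuery,
                                 Function<List<String>, List<String>> discoveryQuery,
                                 String inputHash) {
        List<String> similarHashes = similarityQuery.apply(inputHash);
        if (similarHashes == null || similarHashes.isEmpty()) {
            // Nothing to discover: short-circuit with zero results.
            return Collections.emptyList();
        }
        return discoveryQuery.apply(similarHashes);
    }
}
```

With this shape, a first stage that returns an empty list never invokes the second stage, so the caller sees zero results rather than an exception.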

Added unit tests for this case. While writing these tests, I also discovered that a user can enter a valid ssdeep hash for which we generate no ngrams (due to repeating letters in the hash). Added a fix and tests for this pathological case as well.
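To illustrate how a valid hash can yield no ngrams, here is a rough sketch under stated assumptions: that long runs of a repeated character are collapsed before ngram generation, that runs are capped at 3 characters, and that the ngram size is 7. The class `SsdeepNGramSketch` and its constants are hypothetical and are not taken from the DataWave code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "no ngrams" case; not DataWave code.
class SsdeepNGramSketch {
    static final int NGRAM_SIZE = 7; // assumed ngram length
    static final int MAX_RUN = 3;    // assumed cap on repeated characters

    // Collapses any run of identical characters to at most MAX_RUN characters.
    static String collapseRuns(String chunk) {
        StringBuilder sb = new StringBuilder();
        int run = 0;
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            run = (i > 0 && c == chunk.charAt(i - 1)) ? run + 1 : 1;
            if (run <= MAX_RUN) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Generates overlapping ngrams from the collapsed chunk. The list is
    // empty when the collapsed chunk is shorter than NGRAM_SIZE.
    static List<String> ngrams(String chunk) {
        String normalized = collapseRuns(chunk);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + NGRAM_SIZE <= normalized.length(); i++) {
            out.add(normalized.substring(i, i + NGRAM_SIZE));
        }
        return out;
    }
}
```

Under these assumptions, a chunk such as `aaaaaaaaaa` collapses to `aaa`, which is shorter than the ngram size, so `ngrams` returns an empty list and the caller has to handle that case explicitly.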