CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License

Badlist normalization is inconsistently used and causes misses #272

Open kam193 opened 1 month ago

kam193 commented 1 month ago

Describe the bug

I was investigating why a URL, already included in the IoCs exported from my collection, wasn't recognized in a few submissions. It turned out that the URL contains some uppercase characters, while the version saved by the Badlist updater was normalized to lowercase. However, the normalization is not performed when matching against the badlist. It looks like items added to the Badlist manually aren't normalized either.

To Reproduce

Steps to reproduce the behavior:

  1. Have a file containing a URL in both its original and normalized forms, e.g. a TXT file like:
    https://exampleee.com/SomeUpperCaseURL
    https://exampleee.com/someuppercaseurl
  2. Add the normalized form to the badlist manually.
  3. Submit the file and observe that only the normalized form is marked.
  4. Add the original form to a badlist update source and trigger a download.
  5. Resubmit the file and observe that still only the normalized form is marked. In addition, the badlist should contain the URL only in its normalized form.
  6. Add the original form to the badlist manually. The badlist should now also contain the URL in its original form.
  7. Resubmit the file and observe that both URLs are marked.
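
The steps above can be modelled with a small, self-contained Python sketch. This is a toy, not AssemblyLine code: the `badlist` set and the functions `add_from_update_source`, `add_manually`, and `is_badlisted` are hypothetical stand-ins, and it assumes that the updater's normalization amounts to lowercasing the value, as the observed behavior suggests.

```python
# Toy model of the reproduction steps; all names here are hypothetical.
badlist = set()  # stands in for the Badlist datastore


def add_from_update_source(value: str) -> None:
    # Assumption: the updater lowercases values before saving them.
    badlist.add(value.lower())


def add_manually(value: str) -> None:
    # Manual additions are stored exactly as entered.
    badlist.add(value)


def is_badlisted(tag: str) -> bool:
    # Lookup is an exact match; the tag is not normalized first.
    return tag in badlist


original = "https://exampleee.com/SomeUpperCaseURL"
normalized = "https://exampleee.com/someuppercaseurl"

add_manually(normalized)  # step 2
step3 = (is_badlisted(original), is_badlisted(normalized))

add_from_update_source(original)  # step 4, stored in lowercase
step5 = (is_badlisted(original), is_badlisted(normalized))

add_manually(original)  # step 6, stored as entered
step7 = (is_badlisted(original), is_badlisted(normalized))

print(step3, step5, step7)
```

In this model the original form is only matched once it has been stored verbatim in step 6, which mirrors the misses described in the steps.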

Expected behavior

Tags are matched regardless of whether they appear in their normalized or original form.
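
One way to get this behavior is to apply the same normalization both when an item is stored and when a tag is looked up. The sketch below illustrates the idea only; `normalize` and `NormalizedBadlist` are hypothetical names, not part of AssemblyLine, and lowercasing is assumed to be the whole normalization, which may not match what the update server actually does.

```python
def normalize(value: str) -> str:
    # Assumption: normalization is plain lowercasing.
    return value.lower()


class NormalizedBadlist:
    """Hypothetical in-memory badlist that normalizes on write and on read."""

    def __init__(self) -> None:
        self._items = set()

    def add(self, value: str) -> None:
        # Every entry point (updater or manual) stores the normalized form.
        self._items.add(normalize(value))

    def contains(self, tag: str) -> bool:
        # The looked-up tag goes through the same function before matching.
        return normalize(tag) in self._items


bl = NormalizedBadlist()
bl.add("https://exampleee.com/SomeUpperCaseURL")

hit_original = bl.contains("https://exampleee.com/SomeUpperCaseURL")
hit_normalized = bl.contains("https://exampleee.com/someuppercaseurl")
hit_other = bl.contains("https://exampleee.com/another")
```

Routing both paths through a single function keeps stored values and lookups in sync. Note that lowercasing a whole URL can merge paths that differ only by case on case-sensitive servers, so whether that trade-off is acceptable is a design decision for the maintainers.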

Additional context

I found the normalization in the update server:
https://github.com/CybercentreCanada/assemblyline-service-badlist/blob/3821ce750186704fd649609352ca822bf949877b/badlist/update_server.py#L158

But I could not find it in the Badlist client:
https://github.com/CybercentreCanada/assemblyline-core/blob/06eb4c46f77be82e657de489d3d8d9350e709ea1/assemblyline_core/badlist_client.py#L139-L152

nor in the service API:
https://github.com/CybercentreCanada/assemblyline-v4-service/blob/7dbbddc1cbedc8c3324c8a936b255552bd62fde6/assemblyline_v4_service/common/api.py#L140-L152

nor in the Badlist service itself:
https://github.com/CybercentreCanada/assemblyline-service-badlist/blob/3821ce750186704fd649609352ca822bf949877b/badlist/badlist.py#L97-L122

I suspect the same may apply to file hashes and the safelist, but I haven't tested those cases.