airbnb / binaryalert

BinaryAlert: Serverless, Real-time & Retroactive Malware Detection.
https://binaryalert.io
Apache License 2.0

Collection of fixes for analyzer errors #116

Closed austinbyers closed 6 years ago

austinbyers commented 6 years ago

to: @chunyong-lin cc: @ryandeivert @airbnb/binaryalert-maintainers size: medium

Background

We've identified a number of relatively rare errors that can happen in the BinaryAlert analyzer. These errors are listed below, along with their fixes.

Changes

Problems and solutions:

  1. Empty strings are allowed in S3 metadata, but they are not allowed in DynamoDB.
    • Fix: Replace empty strings before creating a DynamoDB entry for a YARA match.
  2. Under certain conditions, the analyzer can run out of /tmp disk space. We have been removing our own /tmp files, but it turns out yextend creates temporary files (likely due to pdftotext) which are not always removed correctly.
    • Fix: Instead of shredding just the downloaded binary, the analyzer now shreds the entire /tmp directory when it is done scanning a binary.
  3. Parsing yextend output can raise an IndexError, because detected offsets do not necessarily contain a colon. The code currently assumes all offsets are of the form 0x123:$var_name, but YARA can also report a bare 0x123.
    • Fix: When reporting on matched strings, ignore offsets without a colon.
  4. UPX output is cluttering the logs.
    • Fix: Ignore all UPX output (stdout and stderr).
  5. Yextend can sometimes report a single-line error message before its JSON output. This is likely due to pdftotext.
    • Fix: If JSON decoding of yextend output fails, we try again, skipping the first line.
  6. If yextend can't analyze an archive (for example, a password-protected zipfile), it returns a match result with the YARA rule name set to "Anomalies present in Archive (possible Decompression Bomb)".
    • Fix: Since this isn't actually a YARA match, ignore this result.
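
As a rough illustration of fixes 1, 3, 5 and 6 above, here is a minimal Python sketch. It is not the code from this PR: the function names, the `EMPTY_PLACEHOLDER` value, and the input shapes (a dict of S3 metadata, a list of offset strings, raw yextend output as a string) are all assumptions made for the example. Only the archive rule name is taken from the description above.

```python
import json
from typing import Any, Dict, List

# Hypothetical placeholder: the PR does not say what empty strings are replaced with.
EMPTY_PLACEHOLDER = '(empty)'

# Rule name yextend reports when it cannot analyze an archive (from the PR text).
ARCHIVE_ANOMALY_RULE = 'Anomalies present in Archive (possible Decompression Bomb)'


def replace_empty_strings(metadata: Dict[str, str]) -> Dict[str, str]:
    """Fix 1: return a copy of the S3 metadata with empty values replaced."""
    return {key: value if value else EMPTY_PLACEHOLDER
            for key, value in metadata.items()}


def matched_string_names(offsets: List[str]) -> List[str]:
    """Fix 3: collect names from '0x123:$name' offsets, ignoring bare '0x123'."""
    names = set()
    for offset in offsets:
        if ':' not in offset:
            continue  # No string name attached to this offset; skip it.
        names.add(offset.split(':', 1)[1])
    return sorted(names)


def decode_yextend_output(raw: str) -> Any:
    """Fix 5: decode JSON, retrying once without the first line on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Drop a possible single-line error message that precedes the JSON.
        _, _, remainder = raw.partition('\n')
        return json.loads(remainder)  # Still raises if the rest is not JSON.


def is_yara_match(rule_name: str) -> bool:
    """Fix 6: the archive-anomaly result is not a real YARA rule match."""
    return rule_name != ARCHIVE_ANOMALY_RULE
```

For example, under these assumptions `decode_yextend_output('some error\n[{"x": 1}]')` falls through to the retry and returns the decoded list, while output that is already valid JSON is decoded on the first attempt.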

Other (non-analyzer):

  1. [CLI] The CbEnterpriseResponseAPI is deprecated
    • Fix: Use the more general CbResponseAPI
  2. pip3 install -r requirements.txt throws a warning about moto being incompatible with python-dateutil.
    • Fix: Pin python-dateutil to a version accepted by moto.

Testing

coveralls commented 6 years ago

Coverage decreased (-0.5%) to 92.519% when pulling ef253e5fc635aaed25764671857b8d05f68221a1 on austin-fix-errors into 20ee8528fdc6415a8653c971502de4da9b4be98f on master.