airbnb / binaryalert

BinaryAlert: Serverless, Real-time & Retroactive Malware Detection.
https://binaryalert.io
Apache License 2.0

Need more details to understand the working of BinaryAlert #117

Closed adutta14 closed 6 years ago

adutta14 commented 6 years ago

Hello,

I have set up the BinaryAlert infrastructure as described in the documentation. However, once I start uploading data to my S3 buckets, I get "INSUFFICIENT_DATA: cyr_3_binaryalert_analyzer_no_analyzed_binaries" emails for a few files, and I am not sure what happens with the other uploads.

I do not understand what happens after the analyzer (Lambda) has analyzed the files, or how I can see the end results.

Ultimately, I would like to see which files matched the YARA rules and which did not.

Thanks, Abhishek Dutta

austinbyers commented 6 years ago

Hi @adutta14,

INSUFFICIENT_DATA is just the status of a CloudWatch alarm. The alarms are designed to trigger if your BinaryAlert infrastructure is unhealthy; they don't refer to the analysis itself.

When you upload a file to the BinaryAlert S3 bucket, it is queued for analysis. If the file matches a YARA rule, two things happen:

  1. The YARA match is sent to an SNS topic, which you can forward wherever you like (email, SMS, other Lambda functions, etc.).
  2. The match is saved to DynamoDB as a historical record.

If the file does not match any YARA rules, nothing happens. If there was no alert and there is no record in DynamoDB, the file is safe. #104 is an open issue about alerting on safe files as well.

If you haven't already seen it, the documentation on the analysis lifecycle may be helpful.
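To answer "which files matched and which did not" from this behavior, one option is to hash the files locally and compare against the SHA256 values recorded in DynamoDB. The following is a minimal sketch, not part of BinaryAlert: `sha256_of` and `classify` are hypothetical helper names, and it assumes you have already collected the set of matched SHA256 values from the table and that analysis of the files has finished.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Set


def sha256_of(path: Path) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(paths: Iterable[Path], matched_sha256: Set[str]) -> Dict[str, bool]:
    """Map each file path to True if its hash has a match record, else False."""
    return {str(path): sha256_of(path) in matched_sha256 for path in paths}
```

A `False` entry only means that no match record exists for that hash; it says nothing about files that were never queued for analysis.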

adutta14 commented 6 years ago

Hello Austin,

Thank you for your response. Yes, I have seen those documents, and I would like to know more about the following points:

1) In what format are the matches saved in DynamoDB? Is the file itself saved, or only an analysis result? What filters do we need to provide to query DynamoDB and view all the records?

2) I am assuming that "_binaryalert_yara_match_alerts" is the SNS topic that sends the emails when there is a match. Please correct me if I am wrong.

3) Are files queued for analysis only when they are newly uploaded, or are existing files already in the S3 bucket queued as well? Does it use the file's timestamp to keep track of which files have already been queued and which are new?

Regards, Abhishek Dutta

austinbyers commented 6 years ago
  1. If you view the DynamoDB table in the AWS console, you'll see all the records. If you want to use the AWS SDK, you can just run a table scan (no filters necessary). Items are keyed by (SHA256, AnalyzerVersion).

  2. Yes, NAME_PREFIX_binaryalert_yara_match_alerts (see Adding SNS Subscription).

  3. Files are queued when they are uploaded, and then not again until you run a batch analysis (./manage.py analyze_all), which will enqueue the entire bucket.
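The table scan mentioned in item 1 could be sketched as follows. This is an illustration rather than BinaryAlert code: `scan_all_items` is a hypothetical helper, the table name in the usage comment is a placeholder you would replace with your own, and the client is assumed to be a boto3 DynamoDB client (its `scan` call returns a `LastEvaluatedKey` when more pages remain).

```python
from typing import Any, Dict, Iterator


def scan_all_items(client: Any, table_name: str) -> Iterator[Dict[str, Any]]:
    """Yield every item in a DynamoDB table, following scan pagination."""
    kwargs: Dict[str, Any] = {"TableName": table_name}
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # No further pages.
        kwargs["ExclusiveStartKey"] = last_key


# Usage against AWS (requires boto3 and credentials; the table name is a placeholder):
#   import boto3
#   client = boto3.client("dynamodb")
#   for item in scan_all_items(client, "YOUR_MATCHES_TABLE"):
#       print(item["SHA256"]["S"], item["MatchedRules"]["SS"])
```

Passing the client in as a parameter keeps the pagination logic separate from AWS credentials, so it can be exercised with a stub before pointing it at a real table.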

If you haven't already, I recommend running ./manage.py live_test. This will upload files to S3, print the matched record format, and send the alerts to the SNS topic.

adutta14 commented 6 years ago

Hello Austin,

Thank you. I ran the following commands:

:~/binaryalert$ ./manage.py live_test
Uploading eicar.txt to S3:cyr.3.binaryalert-binaries.us-west-2:eicar.txt_10f36a1ec82c...
Uploading eicar.tar.gz.bz2 to S3:cyr.3.binaryalert-binaries.us-west-2:eicar.tar.gz.bz2_10f36a1ec82c...
Looking up version of cyr_3_binaryalert_analyzer:Production...
[1/15] Querying DynamoDB table for the expected YARA match entries...
[2/15] Querying DynamoDB table for the expected YARA match entries...

[{'AnalyzerVersion': {'N': '2'},
  'MD5': {'S': '44d88612fea8a8f36de82e1278abb02f'},
  'MatchedRules': {'SS': ['public/eicar.yara:eicar_av_test',
                          'yextend:eicar_av_test']},
  'S3LastModified': {'S': '2018-05-04 19:36:49+00:00'},
  'S3Metadata': {'M': {'filepath': {'S': 'eicar.txt'}}},
  'S3Objects': {'SS': ['S3:cyr.3.binaryalert-binaries.us-west-2:eicar.txt_10f36a1ec82c']},
  'SHA256': {'S': '275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f'}},
 {'AnalyzerVersion': {'N': '2'},
  'MD5': {'S': '31563a6ea5ebcbc67b9cfe0739c32acb'},
  'MatchedRules': {'SS': ['yextend:eicar_av_test']},
  'S3LastModified': {'S': '2018-05-04 19:36:49+00:00'},
  'S3Metadata': {'M': {'filepath': {'S': 'eicar.tar.gz.bz2'}}},
  'S3Objects': {'SS': ['S3:cyr.3.binaryalert-binaries.us-west-2:eicar.tar.gz.bz2_10f36a1ec82c']},
  'SHA256': {'S': 'ed5c04951db73577ab277f5b895a16abe5f27c19d1ac0ac67a437cc998170e52'}}]

SUCCESS: Expected DynamoDB entries for the test files were found!
Removing test files from S3...
Removing DynamoDB match entries...
Done!

:~/binaryalert$ ./manage.py analyze_all
Asynchronously invoking cyr_3_binaryalert_batcher...
Batcher invocation successful!
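The records printed by live_test above use DynamoDB's typed attribute format, where each value is wrapped in a type tag such as 'S' (string), 'N' (number), 'SS' (string set), or 'M' (map). As a rough sketch, the helpers below (hypothetical names `from_dynamo` and `flatten_record`, handling only the four tags that appear in this output) turn such a record into plain Python values. boto3 also ships a general-purpose `TypeDeserializer` in `boto3.dynamodb.types` if you prefer not to write this yourself.

```python
from typing import Any, Dict


def from_dynamo(value: Dict[str, Any]) -> Any:
    """Convert one typed DynamoDB attribute value into a plain Python value."""
    (type_tag, payload), = value.items()
    if type_tag == "S":
        return payload
    if type_tag == "N":
        # DynamoDB transmits numbers as strings.
        return int(payload) if payload.lstrip("-").isdigit() else float(payload)
    if type_tag == "SS":
        return sorted(payload)
    if type_tag == "M":
        return {key: from_dynamo(inner) for key, inner in payload.items()}
    raise ValueError(f"Unhandled DynamoDB type tag: {type_tag}")


def flatten_record(item: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a whole DynamoDB item into a plain dictionary."""
    return {key: from_dynamo(value) for key, value in item.items()}


# The eicar.txt record from the live_test output above.
record = {
    "AnalyzerVersion": {"N": "2"},
    "MD5": {"S": "44d88612fea8a8f36de82e1278abb02f"},
    "MatchedRules": {"SS": ["public/eicar.yara:eicar_av_test", "yextend:eicar_av_test"]},
    "S3LastModified": {"S": "2018-05-04 19:36:49+00:00"},
    "S3Metadata": {"M": {"filepath": {"S": "eicar.txt"}}},
    "S3Objects": {"SS": ["S3:cyr.3.binaryalert-binaries.us-west-2:eicar.txt_10f36a1ec82c"]},
    "SHA256": {"S": "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"},
}
flat = flatten_record(record)
```

Note that the record stores hashes, matched rule names, and S3 object references; the file contents themselves are not part of it.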

There are around 63 files in my S3 bucket that are expected to be malware. I got an SNS alert for only the first file (by creation date). Does this mean that none of the others matched the YARA rules, or do I need to run "analyze_all" for each of them?

Also, in the metadata in the SNS alert, the first rule, "eicar_av_test", is the test rule. Please correct me if I am wrong. The second rule file is "yextend", which I am unable to locate. So shouldn't it match the malware rules?

Thanks, Abhishek Dutta

austinbyers commented 6 years ago

After you ran ./manage.py analyze_all, BinaryAlert scanned everything in your bucket. If you only got one SNS alert, then that was the only file which matched a YARA rule. You can always check the DynamoDB table for the full list of matches BinaryAlert has ever found.

public/eicar.yara:eicar_av_test means the YARA rule named eicar_av_test in the file public/eicar.yara found a match.

yextend is actually a subprocess which runs archive analysis. So yextend:eicar_av_test means the same YARA rule eicar_av_test was also triggered by yextend's archive analysis (it will often match the same rules).
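Given that naming scheme, the MatchedRules entries can be split into a source (a rule file or yextend) and a rule name. A minimal sketch, using a hypothetical helper `group_by_rule` and assuming the rule name is everything after the last colon (YARA rule identifiers do not contain colons):

```python
from typing import Dict, Iterable, List


def group_by_rule(matched_rules: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'source:rule_name' entries by rule name, listing each source."""
    grouped: Dict[str, List[str]] = {}
    for entry in matched_rules:
        # Split on the last colon so the source part is kept whole.
        source, _, rule = entry.rpartition(":")
        grouped.setdefault(rule, []).append(source)
    return grouped


grouped = group_by_rule(["public/eicar.yara:eicar_av_test", "yextend:eicar_av_test"])
```

Here `grouped` maps `eicar_av_test` to both sources, which makes it visible that a single rule fired twice rather than two different rules matching.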

adutta14 commented 6 years ago

Thank you. Makes more sense now.