Closed ghost closed 6 years ago
Thanks for all the feedback! We've been wanting to improve the entire rule sourcing process for a while now, and I'm excited to get some feedback in that regard. Before going into the specific issues you've brought up, let me ask this:
Would it be helpful if rules were stored separately? For example, if YARA rules were stored in an S3 bucket, they could be updated without re-deploying BinaryAlert. This adds some latency to the analyzers (each Lambda container has to download and decompress the compiled rules file once), but you would no longer need a terraform apply to update YARA rules. This has come up before, and we'd love any discussion about this approach.
Another benefit of storing YARA rules in S3 is that a Lambda function could run on a regular interval (e.g. daily) to automatically update the rules, obviating the need for an EC2 instance entirely (serverless FTW!). The downside is that this automatically bundles untrusted rules files from the Internet. Malicious YARA rules could, for example, exploit buffer overflows in YARA to gain control of the analyzer Lambda execution, which has access to the S3 bucket with your files. Thoughts?
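For what it's worth, the once-per-container caching this would imply could look roughly like the sketch below. This is a minimal illustration only: the download helper stands in for a real boto3 S3 call, and none of these names are BinaryAlert's actual API.

```python
import zlib

# Hypothetical: keep the compiled rules at module level so each Lambda
# container pays the S3 download/decompress cost only once, on first use.
_CACHED_RULES = None

def _download_compiled_rules():
    """Stand-in for an S3 GetObject call returning the compressed rules blob."""
    # A real analyzer would use boto3's s3.get_object(...) here.
    return zlib.compress(b"compiled-yara-rules")

def get_rules():
    """Return the compiled rules, downloading only on the first invocation."""
    global _CACHED_RULES
    if _CACHED_RULES is None:
        raw = _download_compiled_rules()
        _CACHED_RULES = zlib.decompress(raw)
    return _CACHED_RULES
```

Because Lambda reuses containers between invocations, every call after the first hits the module-level cache instead of S3.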
Now, let's go into some specific issues you've encountered:
- The androguard library is not included, which breaks some mobile YARA rules. This is a YARA library, not a Python library, and unfortunately androguard has a somewhat tricky install process which involves modifying YARA source files and including the cuckoo library. We can add this to the backlog, but this leads to:
- All .yar and .yara files are bundled by BinaryAlert. You can manually rename rules files to something like .yara.DISABLED to ignore them, but it would be great if they were just removed during the clone process.

Next week, I'll open individual issues to address the points above, as well as potentially separating the rules.
My vote would go toward storing the YARA rules in an S3 bucket, making BinaryAlert fully serverless. Providing a method for collaboration could also remove the need for automatic updates; YARA rule management might use https://github.com/PUNCH-Cyber/YaraGuardian, for example. Upon upload of a YARA rule to the S3 bucket, would it be possible to validate the required Python libraries and the memory impact on Lambda? Happy to test anything needed!
The question seems to be whether users will be scanning large S3 buckets or running BinaryAlert as an analysis workflow; the latter is my use case.
TL;DR: using git, cron, and deploy.py, I'm able to manage rulesets via relatively simple git commits. Large-scale "feed BA everything" is my flavor of deployment.
I agree that making YARA rules more fluid is a needed feature. I've been able to implement a hacky rule sync using an EC2 instance, basic cron scheduling, and git: check the commit hashes; if there are new commits, sync from git locally and run a terraform deploy, otherwise exit. This allows my analysts to merge any rule changes they need without any interaction from engineers, and because the deploy errors out on unit test failures, it's been relatively robust in my testing.
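That commit-hash check can be sketched roughly as follows. The repo path, branch name, and deploy command are assumptions based on this setup, not anything shipped with BinaryAlert:

```python
import subprocess

def needs_deploy(local_hash, remote_hash):
    """Deploy only when the rules repo has new commits."""
    return local_hash != remote_hash

def _git(repo_dir, *args):
    """Run a git command in the rules repo and return its stripped output."""
    return subprocess.check_output(('git',) + args, cwd=repo_dir).decode().strip()

def sync(repo_dir='/opt/binaryalert'):
    """Cron entry point: fetch, compare commit hashes, redeploy on change."""
    _git(repo_dir, 'fetch', 'origin')
    if needs_deploy(_git(repo_dir, 'rev-parse', 'HEAD'),
                    _git(repo_dir, 'rev-parse', 'origin/master')):
        _git(repo_dir, 'merge', '--ff-only', 'origin/master')
        # The deploy runs the unit tests first, so a broken rule
        # merge fails here instead of shipping.
        subprocess.check_call(['python3', 'deploy.py'], cwd=repo_dir)
```

A crontab entry invoking sync() every few minutes would give roughly the behavior described above.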
I think the added costs and latency associated with direct S3 rule storage might be a bit prohibitive, but I haven't done the exact math to back that opinion up. Some middle ground might be a Lambda that checks the rule source (git repo, S3 bucket, whatever) directly for changes, pulls down the ruleset, compiles/terraforms all of the rules, and re-deploys the environment. That way the rules are still stored 'locally' with the Lambdas as compiled files, and as uncompiled source in whichever source control you choose.
Also, re: scanning large S3 buckets vs. an analysis workflow: I'm installing BA as a detective control inline with our other controls. I'm basically trying to turn the analyst workflow from "look at all of these alerts, choose the ones that might be interesting, run YARA against them to determine 'flavor', extract malware, detonate, analyze, mitigate" into "write YARA rules to catch sketchy executions and emails, and let automation handle the extract, detonate, and mitigate steps." So my use case definitely tends toward 'large S3 buckets', but I can see the want for both.
Closing this as part of the more expressive YARA rule cloning from #98 and #99, but feel free to open another issue about any specific follow-ups!
For example, if loading YARA rules from S3 is something you would want, go ahead and open an issue for it. We've found the local rules in the repo to be the most effective so far.
What would it take to run a nightly job to update the remote rule sets? Adding sources to the remote rules makes the unit_test fail.
/opt/binaryalert/rules/clone_rules.py
Need a way to make sure all of the python libraries are available for the rules.
/opt/binaryalert/rules/clone_rules.py
Would it be better to remove the rules or install the missing python libraries?
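One possible answer to that question: before compiling, check each rule's required Python modules with importlib and skip (or report) rules whose dependencies are missing, rather than failing the whole build. A minimal sketch; the function name and module lists are illustrative, not part of clone_rules.py:

```python
import importlib.util

def missing_modules(required):
    """Return the subset of module names that cannot be imported locally."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]
```

A clone or compile step could call this per ruleset and either install the missing libraries or drop the affected rules with a warning, depending on which behavior is preferred.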
/opt/binaryalert/rules/compile_rules.py
Compiling requires enough memory to complete; these rules required a t2.small to build. Only a certain number of rules can be applied before hitting this error.