aws-solutions / aws-waf-security-automations

This solution automatically deploys a single web access control list (web ACL) with a set of AWS WAF rules designed to filter common web-based attacks.
https://aws.amazon.com/solutions/aws-waf-security-automations
Apache License 2.0
843 stars 361 forks

Athena Query Confirmation - Entire bucket every 5 minutes? #133

Closed (brandonburkett closed 4 years ago)

brandonburkett commented 4 years ago

Hello,

I have 2.3.2 up and running (thank you), but I have a question about the log parser + Athena / Glue: does the Athena query run against the entire ALB logs bucket, or just the files within the 5-minute time range?

I currently have a 12-month lifecycle policy on the ALB logs bucket and may need to shorten it if the query runs against the entire bucket every 5 minutes.

Thanks, Brandon

ghost commented 4 years ago

I have (had) the same question. I realised the WAF-related costs on our bill were increasing month by month. On inspection I saw this was due to lots of S3 GETs (presumably by Athena). I addressed this by syncing the bucket to another long-term storage bucket (for compliance), and then applying a 1-day retention policy to the source bucket that Athena was querying. Not sure if this was the best approach or not, but it worked for me.

Cheers... Karl

brandonburkett commented 4 years ago

@karl-at-raremark Thank you! Makes perfect sense and I will do something similar. After replicating the bucket, did you apply your lifecycle / retention policy to the entire bucket or just the AWSLogs prefix? Did you also include the athena_results prefix?

Thanks, Brandon

ghost commented 4 years ago

@brandonburkett The lifecycle rule I applied on AppAccessLogBucket was for the whole bucket, to delete after 1 day. Replication to the backup bucket was just for the AWSLogs prefix. I also put a lifecycle rule on the backup bucket to delete after X days.

Prior to doing this I had assumed that the Athena query would only 'Get' objects of a certain recency. I didn't dig into how that might be achieved though, as I knew this lifecycle / replication approach would 'just work'.
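
For anyone wanting to reproduce the lifecycle part of this, here is a minimal sketch in Python of the two rules described above. It only builds the rule documents in the shape S3 expects for a bucket lifecycle configuration. The helper name `expire_after_days`, the rule ID, the example bucket name, and the 90-day figure (standing in for "X days") are all assumptions made for illustration, not values from this thread or from the solution's templates.

```python
def expire_after_days(days, prefix="", rule_id="expire-old-logs"):
    """Build an S3 lifecycle configuration that deletes objects under
    `prefix` once they are `days` days old (hypothetical helper)."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # empty prefix matches the whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Source bucket that Athena queries: whole bucket, delete after 1 day.
source_bucket_rules = expire_after_days(1)

# Backup bucket: 90 is only a placeholder for the unspecified "X days".
backup_bucket_rules = expire_after_days(90)

# To apply a rule set (needs boto3 and AWS credentials; bucket name is made up):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-app-access-log-bucket",
#     LifecycleConfiguration=source_bucket_rules,
# )
```

Replication of the AWSLogs prefix to the backup bucket is configured separately and is not shown here. Replication should be working before the 1-day expiration is enabled, otherwise logs can be deleted before they are copied.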

brandonburkett commented 4 years ago

Thank you!

karlskidmore commented 4 years ago

@brandonburkett I just saw that v2.3.3 solves this issue at source through partitioning. Thought you might be interested. I'm gonna give this a go (soon).
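
As a rough illustration of why partitioning helps, the sketch below builds a `WHERE` clause that restricts a query to the hourly partitions overlapping the last few minutes, so Athena only has to read objects under those partitions instead of the whole bucket. This is a sketch of the general technique, not the query v2.3.3 actually runs: the table name `alb_logs`, the column `client_ip`, the partition columns `year`, `month`, `day`, `hour`, and the helper `partition_predicate` are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def partition_predicate(end, window_minutes=5):
    """Return a WHERE-clause fragment covering every hourly partition that
    overlaps the window [end - window_minutes, end] (hypothetical helper)."""
    start = end - timedelta(minutes=window_minutes)
    # Round the window start down to the top of its hour.
    hour = start.replace(minute=0, second=0, microsecond=0)
    clauses = []
    while hour <= end:
        clauses.append(
            f"(year = {hour.year} AND month = {hour.month} "
            f"AND day = {hour.day} AND hour = {hour.hour})"
        )
        hour += timedelta(hours=1)
    return " OR ".join(clauses)


# A 5-minute window that straddles an hour (and day) boundary touches two partitions.
end = datetime(2020, 3, 1, 0, 2, tzinfo=timezone.utc)
where = partition_predicate(end)
query = (
    "SELECT client_ip, COUNT(*) AS hits "
    f"FROM alb_logs WHERE {where} GROUP BY client_ip"
)
```

With a predicate like this on partition columns, the amount of data scanned per run depends on the size of the recent partitions rather than on how long the bucket retains logs.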

brandonburkett commented 4 years ago

Thank you! I will give it a shot on our next upgrade. For now the bucket retention and lifecycle policies are working great.