add_athena_partitions.py should fire more often, or at the top of every hour exactly.

ChrisPetr0 commented 3 years ago

Describe the bug When using the Athena log parser option for HTTP Flood, add_athena_partitions.py is scheduled to run once per hour via CloudWatch Events. If the CFN stack is kicked off midway through the hour, the AWS Glue partitions pointing to the correct S3 hour of logs aren't updated until midway through each following hour. As a result, Athena queries do not scan the correct S3 hour key until the Lambda runs and updates the AWS Glue partitions.

Relevant template snippet:

  LambdaAddAthenaPartitionsEventsRule:
    Type: 'AWS::Events::Rule'
    Condition: AthenaLogParser
    Properties:
      Description: Security Automations - Add partitions to Athena table
      ScheduleExpression: rate(1 hour)

To Reproduce Change QueryScheduledRunTime to 1 (line 294). Change line 1196 to ScheduleExpression: !Join ['', ['rate(', !FindInMap ["Solution", "Athena", "QueryScheduledRunTime"], ' minute)']]

Run the CFN template midway through any hour of the day with these params (making sure it completes by :45 after the hour or so):

ActivateAWSManagedRulesParam               no
ActivateBadBotProtectionParam              no
ActivateCrossSiteScriptingProtectionParam  yes
ActivateHttpFloodProtectionParam           yes  Amazon Athena log parser
ActivateReputationListsProtectionParam     yes
ActivateScannersProbesProtectionParam      yes  Amazon Athena log parser
ActivateSqlInjectionProtectionParam        yes
AppAccessLogBucket                         truncated
EndpointType                               ALB
ErrorThreshold                             50
KeepDataInOriginalS3Location               No
RequestThreshold                           100
WAFBlockPeriod                             5

Modify the Kinesis Firehose buffer hints to 60 s and 1 MB via the console.

This works until the hour changes. At that point, the AWS Glue partition is not updated to point at the next hour in the S3 WAF logs bucket until LambdaAddAthenaPartitionsEventsRule fires, and when that happens depends entirely on the minute within the hour at which the LambdaAddAthenaPartitionsEventsRule resource was created.
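To make the size of that gap concrete, here is a minimal Python sketch. It assumes, as described above, that a rate(1 hour) schedule counts from the moment the rule was created; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def stale_window(rule_created: datetime) -> timedelta:
    """How long after an hour boundary the partitions stay stale.

    Assumption: rate(1 hour) fires at rule_created + k hours.
    """
    # First top-of-hour strictly after the rule was created.
    boundary = rule_created.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    # Walk the hourly invocations until one lands on or after that boundary.
    fire = rule_created
    while fire < boundary:
        fire += timedelta(hours=1)
    return fire - boundary


# Hypothetical creation time: stack finished at 10:37.
print(stale_window(datetime(2021, 5, 4, 10, 37)))  # 0:37:00
```

With a rule created at :37, every hour starts with roughly 37 minutes during which the new hour's partition has not been added yet. A rule created exactly on the hour would show no gap.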

To reproduce: associate the WAF with the ALB. Send enough requests to meet the flood threshold. Wait until the hour changes. Repeat, and watch the flood rule fail to engage until the next time LambdaAddAthenaPartitionsEventsRule fires.

Expected behavior Expect HTTP Flood to add the IP to the Blacklist after approximately 2 minutes (once the threshold is reached) and remove it again after 5 minutes (once requests stop), during every minute of every hour. This is the expectation with the parameters and modifications described above.

Please complete the following information about the solution:

[ ] Version: [v3.1.0]
[ ] Region: [us-east-1]
[Yes, as described above] Was the solution modified from the version published on this repository?
[No] If the answer to the previous question was yes, are the changes available on GitHub?
[No] Have you checked your service quotas for the services this solution uses?
[No] Were there any errors in the CloudWatch Logs?

Additional context To fix the problem,

  LambdaAddAthenaPartitionsEventsRule:
    Type: 'AWS::Events::Rule'
    Condition: AthenaLogParser
    Properties:
      Description: Security Automations - Add partitions to Athena table
      ScheduleExpression: rate(1 hour)

should be set to a cron expression that fires at the top of every hour, or to run every minute.
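For reference, a rough sketch of what those two options could look like. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year) and are evaluated in UTC. The constant and function names below are illustrative only, and the helper just models when an hourly cron would next fire.

```python
from datetime import datetime, timedelta

# Candidate ScheduleExpression values (suggestions, not taken from the template):
TOP_OF_HOUR = "cron(0 * * * ? *)"  # minute 0 of every hour
EVERY_MINUTE = "rate(1 minute)"    # once a minute, offset from rule creation


def next_hourly_fire(now: datetime, minute: int = 0) -> datetime:
    """Next time an hourly cron pinned to `minute` would fire after `now`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


# Regardless of when the stack was created, the next run is on the hour.
print(next_hourly_fire(datetime(2021, 5, 4, 10, 37)))  # 2021-05-04 11:00:00
```

Unlike rate(1 hour), the cron form does not depend on when the rule was created, so the partition for a new hour is added as that hour begins.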

aijunpeng commented 3 years ago

Thanks for reporting the issue and providing detailed information. You are correct that the job should run at the top of every hour. We have added your request to the backlog and it will be looked into in future solution releases.

dscpinheiro commented 2 years ago

Hi!

We just released v3.2.0 of the solution, and this issue has been fixed.

pravinsingh commented 2 years ago

I see that to fix the issue, ScheduleExpression: rate(1 hour) is changed to ScheduleExpression: cron(* ? * * * *). This could result in the issue becoming intermittent. Here's how:

The cron is scheduled to run at exactly the top of every hour.
The new 'hour' folder in S3 also gets created at exactly the top of every hour.

So this will cause a race condition. Sometimes the S3 folder will get created a few milliseconds before the cron tries to access it and everything will be fine. Other times the cron will fire first, trying to create a partition based on a non-existent S3 folder, and will fail.

To fix this, the ScheduleExpression should be cron(1 * * * ? *) so it gets called a minute after folder creation. This does delay the data availability for another minute, but makes sure the partition is successfully created every time.

aijunpeng commented 2 years ago

Thanks for the comment. The partition keys are created with the table, and the Athena query that adds a partition uses the partition keys in the table. Whether the S3 folder exists or not doesn't matter. Have you seen any add-partition query fail because the folder doesn't exist? If so, can you please provide details?

aijunpeng commented 2 years ago

Also, the new 'hour' S3 folder is created whenever logs are processed and inserted into S3 by Kinesis Firehose, not at the top of every hour.

zsidez commented 6 months ago

Is this a valid cron expression? cron(* ? * * * *)

EventBridge's Define schedule page shows an error: Invalid CRON expression

The latest version (v4.0.2) of Security Automations still has this setting.
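A quick way to see why the console might reject it: in EventBridge cron syntax the `?` wildcard is only allowed in the day-of-month and day-of-week fields, and one of those two must be `?`. The Python sketch below checks only those structural rules; it is not a full parser, the function name is made up, and it does not explain how the expression behaves when deployed through CloudFormation.

```python
import re

FIELD_NAMES = ["minutes", "hours", "day-of-month", "month", "day-of-week", "year"]


def check_eventbridge_cron(expression: str) -> list:
    """Return a list of structural problems; an empty list means none were found."""
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if not match:
        return ["not of the form cron(...)"]
    fields = match.group(1).split()
    if len(fields) != len(FIELD_NAMES):
        return [f"expected 6 fields, got {len(fields)}"]
    problems = []
    # '?' is only meaningful in the two day fields.
    for name, value in zip(FIELD_NAMES, fields):
        if "?" in value and name not in ("day-of-month", "day-of-week"):
            problems.append(f"'?' is not allowed in the {name} field")
    # If one day field holds a value or '*', the other has to be '?'.
    if fields[2] != "?" and fields[4] != "?":
        problems.append("one of day-of-month and day-of-week must be '?'")
    return problems


for expr in ("cron(* ? * * * *)", "cron(0 * * * ? *)", "cron(1 * * * ? *)"):
    print(expr, check_eventbridge_cron(expr))
```

Under these checks, cron(* ? * * * *) trips both rules, while cron(0 * * * ? *) and the cron(1 * * * ? *) suggested earlier in this thread pass.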

aws-solutions / aws-waf-security-automations, issue #186