aws-samples / amazon-guardduty-waf-acl

AWS GD2ACL
https://aws.amazon.com/blogs/security/how-to-use-amazon-guardduty-and-aws-web-application-firewall-to-automatically-block-suspicious-hosts/
MIT No Attribution
56 stars 40 forks source link

Repeated NACLs and Unregistered in DynamoDB #4

Closed ghernandez16 closed 5 years ago

ghernandez16 commented 5 years ago

We have an issue with this. Sometimes we get NACLs with the same IP address and some missing records in Dynamo DB. For example: image and image if I review the Dynamo Table this is what I get image I have already change lambda concurrency to 1. Below Cloudwatchlogs 13:07:02 START RequestId: 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Version: $LATEST  13:07:02 [INFO] 2019-03-27T18:07:02.288Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 log -- Event: {"version": "0", "id": "24f964e2-345a-d941-3b50-125a6c6d857b", "detail-type": "GuardDuty Finding", "source": "aws.guardduty", "account": "xxxxxxxxxxxx", "time": "2019-03-27T18:07:00Z", "region": "us-east-2", "resources": [], "detail": {"schemaVersion": "2.0", "accountId": "xxxxxxxxxx", "region": "us-east-2", "part  13:07:02 [INFO] 2019-03-27T18:07:02.410Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Found credentials in environment variables.  13:07:03 [INFO] 2019-03-27T18:07:03.311Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com  13:07:03 [INFO] 2019-03-27T18:07:03.611Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 entering update_nacl, netacl_id=acl-7b4a1c12, host_ip=81.22.45.144  13:07:03 [INFO] 2019-03-27T18:07:03.951Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Starting new HTTPS connection (1): dynamodb.us-east-2.amazonaws.com  13:07:04 [INFO] 2019-03-27T18:07:04.469Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com  13:07:04 [ERROR] 2019-03-27T18:07:04.749Z 7943a5b4-4aa5-476a-ac7e-76fb34efeef4 Something went wrong.  13:07:04 An error occurred (NetworkAclEntryAlreadyExists) when calling the CreateNetworkAclEntry operation: The network acl entry identified by 81 already exists.: ClientError Traceback (most recent call last): File "/var/task/lambda_function.py", line 488, in lambda_handler response = update_nacl(netacl_id=NetworkAclId,host_ip=HostIp, region=Region) File "/var/task/lambda_function.py", line 308, i  13:07:04 END RequestId: 7943a5b4-4aa5-476a-ac7e-76fb34efeef4

What could be happening? Struggling with this issue several weeks ago and can't find a solution. Thanks,

cgulliver commented 5 years ago

Thank you for the detailed info. We are looking into this and will respond back here. Stay tuned.

ghernandez16 commented 5 years ago

Is there any findings? Today I found that one rule is duplicated in DynamoDB, and missing rule 78 and 79. image Thanks,

cgulliver commented 5 years ago

We are testing an updated version of the Lambda function code for this sample. It is in the dev branch - guardduty_to_acl_lambda.py.

cgulliver commented 5 years ago

The updates are merged. A simple way to test is delete the associated entries in NACL's, WAF IP Sets and clear the DDB table, then replace the code in the lambda functions GuardDutytoACL-GuardDutytoACLLambda and GuardDutytoACL-PruneOldEntriesLambda with the new code... https://github.com/aws-samples/amazon-guardduty-waf-acl/tree/master/lambda. Would greatly appreciate feedback if this resolves your issue or not.

ghernandez16 commented 5 years ago

Hi, 3 day with new lambda. Just to comment I don't use WAF or Cloudfront, just NACL so I remove that part of the code. Everything is fine. Just an issue with one IP that gave the following error for some reason.

01:07:10 [ERROR] 2019-04-18T06:07:10.311Z b1369716-e81e-4eb5-82a5-370e29c9b24b Something went wrong. [ERROR] 2019-04-18T06:07:10.311Z b1369716-e81e-4eb5-82a5-370e29c9b24b Something went wrong.  01:07:10 An error occurred (InvalidNetworkAclEntry.NotFound) when calling the DeleteNetworkAclEntry operation: no ingress entry with number 72 in network ACL acl-66274403: ClientError Traceback (most recent call last): File "/var/task/lambda_function.py", line 355, in lambda_handler response = update_nacl(netacl_id=NetworkAclId,host_ip=HostIp, region=Region) File "/var/task/lambda_function.py", lin An error occurred (InvalidNetworkAclEntry.NotFound) when calling the DeleteNetworkAclEntry operation: no ingress entry with number 72 in network ACL acl-66274403: ClientError Traceback (most recent call last): File "/var/task/lambda_function.py", line 355, in lambda_handler response = update_nacl(netacl_id=NetworkAclId,host_ip=HostIp, region=Region) File "/var/task/lambda_function.py", line 174, in update_nacl delete_netacl_rule(netacl_id=netacl_id, rule_no=oldruleno) File "/var/task/lambda_function.py", line 245, in delete_netacl_rule RuleNumber=rule_no File "/var/runtime/boto3/resources/factory.py", line 520, in do_action response = action(self, *args, kwargs) File "/var/runtime/boto3/resources/action.py", line 83, in call response = getattr(parent.meta.client, operation_name)(params) File "/var/runtime/botocore/client.py", line 314, in _api_call return self._make_api_call(operation_name, kwargs) File "/var/runtime/botocore/client.py", line 612, in _make_api_call raise error_class(parsed_response, operation_name) botocore.exceptions.ClientError: An error occurred (InvalidNetworkAclEntry.NotFound) when calling the DeleteNetworkAclEntry operation: no ingress entry with number 72 in network ACL acl-66274403

rule 72 did exist at the moment. Anyway after that no other issues. i'll do this in other couple of accounts and share my findings Thanks

cgulliver commented 5 years ago

Thanks for the continued feedback, not sure why you saw that error but if the DDB table gets out of sync with the NACL, it should exit with an error.

ghernandez16 commented 5 years ago

Today we found this error. It overwrited NACL rule 71 with a new IP, but never write over DDB.

[INFO] 2019-04-20T12:08:13.77Z 35ca7c7d-88c1-4366-97db-1fd58ba5735c log -- current DDB entries, [72, 73, 74, 75, 76, 77, 78, 79, 80]. 12:08:13 [INFO] 2019-04-20T12:08:13.77Z 35ca7c7d-88c1-4366-97db-1fd58ba5735c log -- current NACL entries, [71, 72, 73, 74, 75, 76, 77, 78, 79, 80]. [INFO] 2019-04-20T12:08:13.77Z 35ca7c7d-88c1-4366-97db-1fd58ba5735c log -- current NACL entries, [71, 72, 73, 74, 75, 76, 77, 78, 79, 80]. 12:08:13 [ERROR] 2019-04-20T12:08:13.77Z 35ca7c7d-88c1-4366-97db-1fd58ba5735c NACL rule state mismatch, [71] exiting [ERROR] 2019-04-20T12:08:13.77Z 35ca7c7d-88c1-4366-97db-1fd58ba5735c NACL rule state mismatch, [71] exiting

Below the log where the issue happened 12:07:12 [INFO] 2019-04-20T12:07:12.678Z 87db060c-cde6-4b8e-bf4b-098031802450 entering update_nacl, netacl_id=acl-7b4a1c12, host_ip=198.143.133.154 12:07:12 [INFO] 2019-04-20T12:07:12.919Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): dynamodb.us-east-2.amazonaws.com 12:07:13 [INFO] 2019-04-20T12:07:13.180Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com 12:07:13 [INFO] 2019-04-20T12:07:13.677Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com 12:07:14 [INFO] 2019-04-20T12:07:14.17Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): dynamodb.us-east-2.amazonaws.com 12:07:14 [INFO] 2019-04-20T12:07:14.338Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com 12:07:14 [INFO] 2019-04-20T12:07:14.637Z 87db060c-cde6-4b8e-bf4b-098031802450 Starting new HTTPS connection (1): dynamodb.us-east-2.amazonaws.com 12:07:14 END RequestId: 87db060c-cde6-4b8e-bf4b-098031802450

ghernandez16 commented 5 years ago

Rule 71 was different IP from the updated done. I deleted NACL 71 to check if it starts working again because following events just said mismatch between NACL and DDB and do nothing. Other thing I've always been curious. is that I just receive events at 1:00, 7:00, 13:00 and 19:00 UTC as it was scheduled, but it is not, it is triggered by GuardDuty and I receive port probes at different times of the day, why is that? Thanks,

cgulliver commented 5 years ago

I have not seen this behavior yet but if you made edits to the code, make sure you preserve the parts that delete the old entry (NACL and DDB) when you reach 10 rules to make room for the new rule. The prune script also clears out old entries based on the retention set so make sure that also deletes the entry from both the NACL and DDB table. Can you share the modified Lambda code you are running for both functions?

Also, for testing purposes, a new script has been added to make it easy to check the state on a NACL below. Just update "AWS_REGION" and "ACLMETATABLE" then run with "python gd2acl-sync-check.py nacl-id".

https://github.com/aws-samples/amazon-guardduty-waf-acl/blob/dev/scripts/gd2acl-sync-check.py

Finally, this sample uses CloudWatch Events to trigger the Lambda. Here is some info on CloudWatch Events Notification Frequency for GuardDuty.

ghernandez16 commented 5 years ago

Below the codes

purge_old.txt gduty_to_acl.txt

cgulliver commented 5 years ago

After reviewing, nothing stands out in those vs. the master code. However, an update has just been pushed to improve the logging verbosity which will hopefully help diagnose any potential issues...

https://github.com/aws-samples/amazon-guardduty-waf-acl/tree/master/lambda

ghernandez16 commented 5 years ago

Yes, i just remove the WAF and Cloudfront sections. Still the same issue with rule mismatch and if it found this nothing works,

13:08:10 [INFO] 2019-04-22T13:08:10.111Z 0b5ac30a-9d4e-4212-aa61-f20ec85fdf1f Starting new HTTPS connection (1): ec2.us-east-2.amazonaws.com 13:08:10 [INFO] 2019-04-22T13:08:10.369Z 0b5ac30a-9d4e-4212-aa61-f20ec85fdf1f log -- current DDB entries, [71, 72, 74, 75, 76, 77, 78, 79, 80]. 13:08:10 [INFO] 2019-04-22T13:08:10.369Z 0b5ac30a-9d4e-4212-aa61-f20ec85fdf1f log -- current NACL entries, [71, 72, 73, 74, 75, 76, 77, 78, 79, 80]. 13:08:10 [ERROR] 2019-04-22T13:08:10.370Z 0b5ac30a-9d4e-4212-aa61-f20ec85fdf1f NACL rule state mismatch, [73] exiting 13:08:11 END RequestId: 0b5ac30a-9d4e-4212-aa61-f20ec85fdf1f

Deleted both NACL and DDB Table and start over.

cgulliver commented 5 years ago

Does anything stand out in the pruning function log? In the pruning function, you don't need this line if you are not using the WAF functionality...

response_nonexpired = table.scan( FilterExpression=Attr('CreatedAt').gt(expire_time) & Attr('HostIp').eq(HostIp) )

Go ahead and delete the above line. Also, if you can update your code from master, the additional logging verbosity may help us.

ghernandez16 commented 5 years ago

Line deleted. Even so, my pruning function triggers almost never, because the list covers 10 IPs and I get almost 10 per day. Rewrited IPs is the most common thing. I'll update the codes and let you know.

cgulliver commented 5 years ago

If you want to test it w/o waiting for GuardDuty, you can create a Lambda test event from the event payload in your log. Change the IP value in remoteIpDetails": { "ipAddressV4": "x.x.x.x"... and it will create a new entry.

ghernandez16 commented 5 years ago

Hi, So far so good in one account. Same mess on the other. Both tables were blank because of purge and wasn't adding any new IP because of mismatch, but I think I forget to delete the table before so I'll come back to you if there is an issue.

Just another question, how does the lambda manage two IPs in the same GuardDuty json. It seems it only takes the first one. { "localPortDetails": { "port": 8080, "portName": "Unknown" }, "remoteIpDetails": { "country": { "countryName": "Russia" }, "city": { "cityName": "" }, "geoLocation": { "lon": 37.6068, "lat": 55.7386 }, "organization": { "asnOrg": "SS-Net", "org": "IP Khnykin Vitaliy Yakovlevich", "isp": "IP Khnykin Vitaliy Yakovlevich", "asn": "204428" }, "ipAddressV4": "185.176.27.50" } }, { "localPortDetails": { "port": 9000, "portName": "Unknown" }, "remoteIpDetails": { "country": { "countryName": "United States" }, "city": { "cityName": "San Diego" }, "geoLocation": { "lon": -117.1367, "lat": 32.7967 }, "organization": { "asnOrg": "CariNet, Inc.", "org": "CariNet", "isp": "CariNet", "asn": "10439" }, "ipAddressV4": "71.6.167.142" }

ghernandez16 commented 5 years ago

For the above question. This is an example.

Screenshot_5 Screenshot_4

Why it seems to try to add the same ip twice? Thanks,

cgulliver commented 5 years ago

What is the timeout set to in your Lambda functions? It looks like the first invocation timed out after 3 seconds. The template should set the timeout value to 5 minutes and also set reserve concurrency to 1. If needed, please increase the timeout and confirm concurrency is 1.

The current logic only processes a single IP address value ["remoteIpDetails"]["ipAddressV4"] in the finding JSON payload. What is the finding type for your example above?

ghernandez16 commented 5 years ago

Hi @cgulliver. Everything working great now! Thanks for the help!

cgulliver commented 5 years ago

Glad to hear that @ghernandez16 . If possible, would you mind replying to the blog post comments as well in case others need to update?