Closed smokentar closed 8 months ago
I'm currently deleting recovery points older than 10 days using the below script:
aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" --by-created-before $(date -d "-10 days" +%Y-%m-%d) | jq -r '.RecoveryPoints[].RecoveryPointArn' | xargs -I {} aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn {}
This is very slow but it's better than nothing - I haven't found a way or API to delete in bulk and adding concurrency just makes things worse due to rate limiting (I presume).
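If rate limiting is indeed the bottleneck, one option is to keep the deletes sequential and retry throttled calls with exponential backoff. The following is only a sketch: `retry_with_backoff` and `BACKOFF_BASE_SECONDS` are hypothetical names introduced here, and the helper retries on any non-zero exit code, not just on throttling errors.

```shell
# Hypothetical helper (not part of AFT or the AWS CLI): run a command and
# retry it with exponential backoff while it keeps failing.
# Usage: retry_with_backoff MAX_ATTEMPTS command [args...]
retry_with_backoff() {
  max_attempts="$1"
  shift
  delay="${BACKOFF_BASE_SECONDS:-1}"  # assumed knob: first sleep, in seconds
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1                        # give up after MAX_ATTEMPTS failures
    fi
    sleep "$delay"
    delay=$((delay * 2))              # double the wait after each failure
    attempt=$((attempt + 1))
  done
}

# Example (commented out so the snippet has no side effects):
# retry_with_backoff 5 aws backup delete-recovery-point \
#   --backup-vault-name aft-controltower-backup-vault \
#   --recovery-point-arn "$arn"
```

The AWS CLI also has its own retry settings (`AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS`), which may be enough on their own; whether either approach actually speeds up the cleanup in this vault is untested.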
Hey Dimitar, thanks for the detail; I'll add this info to our internal tracking for ways to reduce the cost of AFT.
We also encountered an unusual spike in costs on 4 Jan 2023. AWS Config suddenly registered a change to all our recovery points and generated costs against the EU-ConfigurationItemRecorded usage type. We didn't make any changes. This is currently with AWS Support. The number of recovery points does seem a bit excessive, an automatic tidy up would be great.
We had this happen to us as well, our config bill spiked by about 80 bucks yesterday when it's normally pennies a day.
Not quite sure why it spiked yesterday and not earlier.
I opened a PR understanding that you are not accepting contributions to this product but this is a big pain in the butt to fix and we'd rather not fork your module and deal with managing drift while we wait for you to fix. I'm just demonstrating how small of a fix it is.
Also, it would be nice if the period were adjustable; I personally don't need a backup every hour.
Interesting! We also recently saw a spike in EUN1-ConfigurationItemRecorded in our AFT account which we don't really understand either. Our spike occurred on Jan 11th and landed on $80.33 where typical daily charge is ~$0.30. Not sure if this is related to this issue tho? 🤔
Regarding the backup vault config, it does indeed seem bad to be ever growing, but looking into our costs it seems to be very small charge even with current 27000 recovery points. It should still be fixed tho 😄
I also see a 10x increase in Config billing on the AFT account. Based on the news, recording of recovery points for AWS Backup in AWS Config was activated in ...2021
I have the same issue. Bill spiked. Contacted AWS Support. I don't think they have context/understanding of AWS AFT for Terraform. He asked me to delete the recorder, which I've done via:
❯ aws configservice delete-configuration-recorder --configuration-recorder-name aws-controltower-BaselineConfigRecorder
I'm currently deleting recovery points older than 10 days using the below script:
aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" --by-created-before $(date -d "-10 days" +%Y-%m-%d) | jq -r '.RecoveryPoints[].RecoveryPointArn' | xargs -I {} aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn {}
This didn't work for me. Something something to do with macOS and date. This is in a PoC environment so I just nuked all the recovery points by running:
for i in $(aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" | jq -r '.RecoveryPoints[].RecoveryPointArn') ; do aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn "$i" ; done
HTH! 🙏🏼
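The macOS failure mentioned above is most likely because `date -d` is a GNU extension and the BSD `date` shipped with macOS uses `-v` for relative dates instead. A minimal sketch of a cutoff-date helper that tries both forms (`cutoff_date` is a hypothetical name, not something from the thread):

```shell
# Hypothetical helper: print the date N days ago as YYYY-MM-DD,
# using GNU date syntax when available and BSD/macOS syntax otherwise.
cutoff_date() {
  days="$1"
  if date -d "${days} days ago" +%Y-%m-%d >/dev/null 2>&1; then
    date -d "${days} days ago" +%Y-%m-%d   # GNU coreutils (Linux, CloudShell)
  else
    date -v "-${days}d" +%Y-%m-%d          # BSD date (macOS)
  fi
}

# Example (commented out so the snippet has no side effects):
# aws backup list-recovery-points-by-backup-vault \
#   --backup-vault-name aft-controltower-backup-vault \
#   --by-resource-type "DynamoDB" \
#   --by-created-before "$(cutoff_date 10)"
```

With this, the original 10-day filter can be kept on macOS instead of deleting every recovery point.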
@balltrev Any update on when this will be addressed? It's a pretty trivial fix to just put a limit on the number of backups. This would save us the hassle of having to go in regularly and delete the old entries.
Hi all,
A quick warning on another foot-gun introduced by the AFT team.
Please make sure that when you run the above scripts to delete the recovery points, you disable AWS config recording or you will be billed per deletion API operation. Our AWS Config cost to delete the backup vault came out to $58 to delete 18,000 recovery events.
This is on top of the $20 a month for AWS config to record these DynamoDB backups.
Edit: We followed the below linked guide to stop AWS Config recording details about recovery events across all of our accounts. There are other ways to do this, but this is the way we disabled backup vault recovery points from being tracked by AWS Control Tower managed AWS Config.
By removing AWS::Backup::RecoveryPoint from the CloudFormation parameters, you disable recording of such deletion events, avoiding the potential of a $58 lesson brought to you by the AFT team.
Here is a slightly better command with output so that you can be sure it's deleting things.
for i in `aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" | jq -r '.RecoveryPoints[].RecoveryPointArn'` ; do echo "Deleting ${i}"; aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn $i ; done
We did however find it faster to use the AWS Console to delete many at a time, set rows per page to 100, click the next page, then select 100 more and so on. Then select the delete option. Doing it with the UI seems to delete 3 to 4 at a time which is much faster than the suggested script (if you need things done quickly).
@morganrowse it might be helpful to add steps on how to disable AWS Config recording for AFT, and any subsequent steps, as a reference for those who are in similar situations and find this GitHub issue.
I ended up writing a script that helps me toggle the recorder on/off in the AFT management account. There is a service control policy managed by Control Tower which prevents stopping the recorder directly, so we have to temporarily detach the relevant SCP in the management account, then stop the recorder in the AFT management account, then re-attach the SCP in the management account.
#!/usr/bin/env bash
set -Eeuo pipefail
OU_NAME="Infrastructure" # Organizational Unit of the AFT management account
MANAGEMENT_ACCOUNT_PROFILE="my-management-account-aws-profile"
TARGET_ACCOUNT_PROFILE="my-aft-management-account-aws-profile"
set -x
export AWS_PROFILE="$MANAGEMENT_ACCOUNT_PROFILE"
aws sso login
root_id="$(aws organizations list-roots | jq -r '.Roots[].Id')"
ou_id="$(aws organizations list-organizational-units-for-parent --parent-id "$root_id" | jq -r ".OrganizationalUnits[] | select(.Name == \"${OU_NAME}\").Id")"
ou_policies="$(aws organizations list-policies-for-target --filter SERVICE_CONTROL_POLICY --target-id "$ou_id" | jq -r '.Policies[].Id')"
# Find the SCP attached to the OU that mentions config:StopConfigurationRecorder
sought_policy_id=""
while IFS= read -r policy_id ; do
  if aws organizations describe-policy --policy-id "$policy_id" | jq -r '.Policy.Content' | grep -q "config:StopConfigurationRecorder"; then
    sought_policy_id="$policy_id"
    break
  fi
done <<< "$ou_policies"
# Stop here rather than calling detach-policy with an empty policy ID
if [ -z "$sought_policy_id" ]; then
  echo "No SCP mentioning config:StopConfigurationRecorder found on OU ${OU_NAME}" >&2
  exit 1
fi
aws organizations detach-policy --policy-id "$sought_policy_id" --target-id "$ou_id"
export AWS_PROFILE="$TARGET_ACCOUNT_PROFILE"
aws sso login
recorder="$(aws configservice describe-configuration-recorders | jq -r '.ConfigurationRecorders[].name')"
is_recorder_enabled="$(aws configservice describe-configuration-recorder-status | jq '.ConfigurationRecordersStatus[].recording')"
# Toggle: stop the recorder if it is running, start it otherwise
if [ "$is_recorder_enabled" = "true" ]; then
  aws configservice stop-configuration-recorder --configuration-recorder-name "$recorder"
else
  aws configservice start-configuration-recorder --configuration-recorder-name "$recorder"
fi
export AWS_PROFILE="$MANAGEMENT_ACCOUNT_PROFILE"
aws sso login
aws organizations attach-policy --policy-id "$sought_policy_id" --target-id "$ou_id"
After toggling the recorder off, I was able to delete the recovery points per @morganrowse's suggestion without accumulating extra costs.
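One caveat with the detach / work / re-attach sequence above: if a step in the middle fails, the SCP stays detached. A possible safeguard, sketched here with a hypothetical `run_with_restore` helper, is to run the sequence in a subshell and register the restore step with `trap ... EXIT` so it runs even when the work step fails. The three arguments are names of shell functions you would define yourself; nothing here is taken from the script above.

```shell
# Hypothetical helper: run DETACH, then WORK, and always run RESTORE
# afterwards once DETACH has succeeded, even if WORK fails.
# Usage: run_with_restore detach_fn work_fn restore_fn
run_with_restore() {
  (
    "$1" || exit 1     # nothing to restore if the detach itself failed
    trap "$3" EXIT     # from here on, restore runs when the subshell exits
    "$2"               # exit status of the work step becomes the result
  )
}

# Example wiring (commented out; function bodies are placeholders):
# detach_scp()  { aws organizations detach-policy --policy-id "$sought_policy_id" --target-id "$ou_id"; }
# attach_scp()  { aws organizations attach-policy --policy-id "$sought_policy_id" --target-id "$ou_id"; }
# toggle_recorder() { :; }  # stop or start the recorder here
# run_with_restore detach_scp toggle_recorder attach_scp
```

Note that the original script switches `AWS_PROFILE` between steps, so the restore function would need to set the management account profile itself before re-attaching.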
I'm currently deleting recovery points older than 10 days using the below script:
aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" --by-created-before $(date -d "-10 days" +%Y-%m-%d) | jq -r '.RecoveryPoints[].RecoveryPointArn' | xargs -I {} aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn {}
This is very slow but it's better than nothing - I haven't found a way or API to delete in bulk and adding concurrency just makes things worse due to rate limiting (I presume).
recovery_point_arns=$(aws backup list-recovery-points-by-backup-vault --backup-vault-name aft-controltower-backup-vault --by-resource-type "DynamoDB" --by-created-before "$(date -d "-20 days" +%Y-%m-%d)" | jq -r '.RecoveryPoints[].RecoveryPointArn')
for recovery_point_arn in $recovery_point_arns; do
  aws backup delete-recovery-point --backup-vault-name aft-controltower-backup-vault --recovery-point-arn "$recovery_point_arn"
done
I am running this bash script in an AWS CloudShell terminal and it works as well. It is slow, but at least it deleted the recovery points in the backup vault.
Our AWS Config bill also spiked because of this. Interestingly, it also happened when we reached about 27k backups, similar to @Flydiverny.
Has anyone else had a spike in Config costs when this number of backups was reached?
We also have 27000 backup points. This surely must be a bug.
This issue occurred for us as well, and I caught it because of a billing alert that showed a charge for ~$75. In AWS Cost Explorer we have hourly granularity, and all charges were associated with AWS Config in the AFT Management account.
In my case, Configuration.VaultType was added for all AWS Backup recovery point resources being monitored by AWS Config. If this project's defaults are left in place, this is 96 recovery points per day and ~35,000 per year. In us-east-2, each continuous configuration item delivered costs $0.003. This totals $105 for an attribute change to a year's worth of DynamoDB backups for this project. The recommendations in this thread to change your retention policy or backup frequency should be considered with this in mind.
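The arithmetic above can be reproduced with a small helper, which also makes it easy to plug in your own numbers. `config_item_cost` is a hypothetical name; the inputs (96 recovery points per day, $0.003 per configuration item) are the figures quoted in this comment and may differ in other regions or over time.

```shell
# Hypothetical helper: estimate how many recovery points accumulate and what
# a single attribute change across all of them costs in AWS Config.
# Usage: config_item_cost POINTS_PER_DAY DAYS_RETAINED USD_PER_ITEM
# Prints: "<number of recovery points> <cost in USD>"
config_item_cost() {
  awk -v per_day="$1" -v days="$2" -v price="$3" \
    'BEGIN { n = per_day * days; printf "%d %.2f\n", n, n * price }'
}

config_item_cost 96 365 0.003   # one year without retention: 35040 105.12
config_item_cost 96 14 0.003    # with a 14-day retention:    1344 4.03
```

The second call illustrates why a retention period matters: with only two weeks of recovery points in the vault, the same attribute change would cost a few dollars instead of about $105.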
Looking at the documentation history for AWS Backup, my hypothesis is that the team responsible for AWS Backup pushed a change that caught my account and caused all my historical backups to update this attribute. I hope the team responsible for this project considers adding input attributes supporting customization of the retention and frequency of backups.
In case this happens to others I'll include more details below on how I found that this new attribute was added -- I'd be interested if others saw the same in their environments and/or if other dates in the AWS Backup documentation history correlate to experiences of others.
Hi @mikeantonelli
Thanks for your detailed investigation. We had the same problem on 09/01/2024.
The AWS Backup team added this attribute to AWS::Backup::RecoveryPoint, and because there is no retention on the recovery points, all 40,200 of ours were reevaluated. This caused a billing spike that some managers were really concerned about.
My suggestion is that this be fixed ASAP, first of all by adding a retention to the backup plan deployed by AFT in the following resource:
backup.tf
resource "aws_backup_plan" "aft_controltower_backup_plan" {
  name = "aft-controltower-backup-plan"
  rule {
    rule_name         = "aft_controltower_backup_rule"
    target_vault_name = aws_backup_vault.aft_controltower_backup_vault.name
    schedule          = "cron(0 * * * ? *)"
  }
}
by adding the lifecycle attribute
resource "aws_backup_plan" "aft_controltower_backup_plan" {
  name = "aft-controltower-backup-plan"
  rule {
    rule_name         = "aft_controltower_backup_rule"
    target_vault_name = aws_backup_vault.aft_controltower_backup_vault.name
    schedule          = "cron(0 * * * ? *)"
    lifecycle {
      delete_after = 14
    }
  }
}
For us, the reevaluation caused a spike of $100 in Config charges for that day.
If the lifecycle is added to /modules/aft-account-request-framework/backup.tf, it should trigger the automatic deletion of all older backups when the next AFT update comes. This should be noted in the changelog/release notes of the next AFT version, since it will probably cause a spike in Config costs again.
Maybe 14 days of retention is too low, and there should be a default value on AFT for this that can be configured externally to the aft-account-request-framework so each AWS customer can configure this according to its needs.
Has this been fixed? It's a pretty scandalous bug, as it incurs high costs without warning.
Hey everyone, we've added a feature to configure Backup recovery point retention period in the latest release of AFT!
https://github.com/aws-ia/terraform-aws-control_tower_account_factory/releases/tag/1.12.0
AFT Version: 1.6.6
Terraform Version & Provider Versions Not applicable
Bug Description The configuration of the backup vault for the account hosting the vending pipeline is accruing unnecessary cost.
Config reference: aft-account-request-framework/backup.tf
This creates a backup vault aft-controltower-backup-vault and a backup plan aft-controltower-backup-plan with the rule aft_controltower_backup_rule. These are responsible for backing up the DynamoDB tables that trigger the account-vending pipeline.
The rule backs up the tables every hour but doesn't set an expiration date for the recovery points. This results in recovery points piling up in the backup vault. Additionally, this increases the number of items in scope for AWS Config.
Because of that AWS charges will steadily increase every single day which can grow into a beautiful bill if unnoticed.
Expected behavior The recovery points have an expiration date.
Additional context
Recovery points
Backup rule
AWS Config