ekristen / aws-nuke

Remove all the resources from an AWS account
https://ekristen.github.io/aws-nuke/
MIT License
271 stars 27 forks source link

Review Resources for Tag Support #118

Closed ekristen closed 7 months ago

ekristen commented 8 months ago
SageMakerUserProfiles
ECSCluster
ServiceDiscoveryNamespace
SQSQueue
CognitoUserPoolDomain
CognitoUserPoolClient
CognitoUserPool
CognitoIdentityProvider
ElasticacheSubnetGroup
ElasticacheCacheCluster

Originally posted by @YuriGal in https://github.com/ekristen/aws-nuke/issues/110#issuecomment-1988742975

ekristen commented 8 months ago

@YuriGal I've started the work with #123, I'm adding tests as I go, especially since some of these require additional API calls to sort out. If you want to test I can cut builds via GitHub Actions, but I can't do beta release at the moment. I need to do a bit more testing.

ekristen commented 8 months ago

Many of these resources have to make additional API calls to get the tags. I'm thinking of adding a setting for each resource to turn tag fetching on/off for these resources. What do you think about that?

I have to make a decision on whether not to always fetch tags or not especially when it requires additional api calls, which each of these do. Thoughts?

YuriGal commented 8 months ago

It should probably always fetch tags, we don't know in advance which resources will be tagged or not.

ekristen commented 8 months ago

Probably true. I'm almost done with the updates. Getting tests in place as well at the same time, matters less for simple removes, but it's nice to have.

Should be dropping out of beta soon. I haven't had any reports of issues.

dupuyarc commented 7 months ago

I'm very happy to see this issue already exists, as I was disappointed to see that there are quite a lot of resource types where tags are not fetched. By my very rough count (grep'ing for tag in the resources/ folder) only 127 out of 471 resource types get tags, which is both better (total number) and worse (percentage) than the original rebuy.de version's 117 out of 428.

I would like to suggest that there may be a less time-consuming way to get tags for resource types that don't currently support this. The AWS Resource Groups Tagging API supports many, but not all, AWS services and it's not too hard to get tags for all resources using that API and it would then be possible to merge them into the properties for any resource.

If there's any interest in this, I'd be glad to contribute at least some initial code.

ekristen commented 7 months ago

@dupuyarc interesting, thanks for the reference I will take a look. Initial glance indicates the resources might have to belong to a resource group for this to to work, and the API doesn't appear to support querying tags for specific resources instead all tags that are on a resource.

I'm coming back around to this this week to add more tags to some resources. It might prove easier to simple knock off one set of resources at a time adding in tags where they are supported.

I do want to say though that Contributions are welcome!

If it's more than adding a resource or bug fixing, we likely need to do a proof of concept and a bit of a design discussion to make sure it fits into the current model OR figure out how to change the current model to work. I would say anything that merges tags in after the fact needs to be discuss in this light. The way the tool works with concurrency, resource querying, filtering, etc all need to be taken into consideration.

Ultimately my desire is to have this project to be more easily contributed to with less gatekeeping and more progress.

dupuyarc commented 7 months ago

Based on my initial testing with our AWS sandbox account, for supported resource types it returns the tags (or lack of tags) for any resource in a region + account that has ever been tagged, regardless of its membership in any Resource Group.

Your points about concurrency and merging are well taken, and there would need to be additional tests to cover the new functionality, so it's clearly not a simple "drop-in" change.

The nature of the API (which is not specific to the AWS service the resources belong to) does provide some future-proofing – since it could pick up tags for any resource type that currently doesn't support tags at some possible future date when tagging support is added to that resource type.

It is possible to use the API on the level of a specific individual resource, using the ResourceARNList in the request, but that is an inefficient way to use this API, and is much more likely to hit API rate limits.

One way to use this API in aws-nuke could be with a goroutine paginating through the API and pulling all resources and their tags and building up a map from resource ARNs to the associated tags. The regular resource processing code might do lookups in that map (maybe only for services/resource types that the Resource Group API supports, and/or only if the resource type did not have code to get tags using its service-specific API), blocking on completion of the Resource Group API fetching goroutine if there was no data for the ARN (yet).

Ideally the additional code to support that could be located entirely outside the resource-specific code, apart maybe from a minimal boolean function indicating whether the resource type has code to fetch tags using the resource's service API.

As for merging the tags from the service-specific API and the Resource Group API, I'd guess the best approach would be to use one or the other for any given resource type, and have that choice determined by a flag. The default would be the current (service-specific API) method, and the alternative (selected by an --all-tags flag?) would be to use the Resource Group API for any resource type supported by that API.

It's also possible to filter the resources returned by the API by resource types or tags – these could be used for efficient implementation of some aws-nuke filters, but that seems like a separate enhancement, apart from improving the reporting of tag values across more resource types.

ekristen commented 7 months ago

Are you just using the GetResources API in your testing?

The nature of the API (which is not specific to the AWS service the resources belong to) does provide some future-proofing – since it could pick up tags for any resource type that currently doesn't support tags at some possible future date when tagging support is added to that resource type.

Good to know.

Ideally the additional code to support that could be located entirely outside the resource-specific code, apart maybe from a minimal boolean function indicating whether the resource type has code to fetch tags using the resource's service API.

This is where I get concerned, but worth looking into more. The queuing and iterating mechanism is designed to be self contained.

One possible approach, although I'd need to understand the data format would be to "request" all tags ahead of time, and then introduce a new Interface handler called Tags or something that is called before any others like Filter or Properties, it could then mutate the Properties if it finds tags.

Filtering and keeping API calls within reason are the two primary concerns.

I see two options:

  1. simply add tag queries to each resource as needed, this adds +1 query per resource found.
  2. modify libnuke resource to support something like Tags handler, and make a separate query per region to get all tags and then do some sort of list and find to match resources.

I think it's worth exploring, in the short term it's easier to do 1. 2 Might be better, but there's definitely some unknowns around memory usage, adding in API calls per region completely unbound from a resource type or resource that we then have to make available to all resources to discover.

Another roadblock is that not every single resource exposes it's ARN in an accessible way, this could make option 2 more difficult as well.

ekristen commented 7 months ago

A use-case that came to mind that could be a problem (have to assume someone is doing this I think?)

If someone is using a limited permission credentials to do the nuking of specific resources, like ec2 only. This would require a separate set of privileges, and if they only have access to ec2 for example, they wouldn't be able to use this API, but they would be able to use the ec2 get tags.

dupuyarc commented 7 months ago

If someone is using a limited permission credentials to do the nuking of specific resources, like ec2 only. This would require a separate set of privileges, and if they only have access to ec2 for example, they wouldn't be able to use this API, but they would be able to use the ec2 get tags.

This is a perfectly reasonable scenario, although I don't know how common it would be. I had been thinking that using the resource group tagging API would be a non-default feature (just in case there are subtle differences in the reporting of tags between that API and the specific service APIs), so somebody using restricted credentials would not be affected unless they explicitly enabled the feature.

In that case, they would need to extend the capabilities of the credentials to use the (read-only parts of) the resource group tagging API. But that seems like it wouldn't be too much of a problem in most cases. Only in an environment where there is sensitive data in the resource ARNs and tags would that be an issue, I think.

ekristen commented 7 months ago

I love this discussion and the idea in general, definitely some implications to work through.

With v3, there is now true feature flag system that can be used for something like this, it was meant more for experimental features working to become mainstream, but doesn't necessarily have to be it could be something like this long term. With it not being a default feature, what benefit would it give vs just implementing GetTags calls for every resource over time?

I definitely need to think about this more. On the one hand I like the idea of probably reducing API calls to get the information needed and mutating the resource properties at the right point as it would potentially be easier to grab all tagging data for all resources, on the other hand, if you are doing targeted runs (which I know people do), you end up pulling data you don't need (unsure the quantity). If the AWS accounts are HUGE this could end up taking a lot of memory for no reason. Imagine you are targeting 10 resource types, but the account has 100 in use over 1000s of resources, you potentially end up pulling a lot of tagging data you don't need.

dupuyarc commented 7 months ago

Are you just using the GetResources API in your testing?

I just used some basic Python code that pulls the tag data from all resources in a given region of the account. I generated it with ChatGPT! – hey why not? I had to re-prompt it to add pagination support, and I edited the two versions together to get this (probably plagiarized from AWS documentation or somebody's blog post) code, that amazingly enough worked fine (modulo pagination) on the first try:

#!/usr/bin/env python3

import boto3

def get_tags_for_resources(region):
    # Create a boto3 client for the Resource Groups Tagging API
    client = boto3.client('resourcegroupstaggingapi', region_name=region)

    # Paginate through results to retrieve all resources with their tags
    resources = []
    try:
        paginator = client.get_paginator('get_resources')
        for page in paginator.paginate():
            resources.extend(page['ResourceTagMappingList'])
    except client.exceptions.ClientError as e:
        print(f"Error: {e.response['Error']['Message']}")

    return resources

def main():
    # Specify the AWS region
    region = 'us-east-1'  # Change this to your desired region

    # Get tags for resources in the specified region
    resources = get_tags_for_resources(region)

    # Print tags for each resource
    for resource in resources:
        print("Resource ARN:", resource['ResourceARN'])
        tags = resource.get('Tags', [])
        if tags:
            print("Tags:")
            for tag in tags:
                print(f"\t{tag['Key']}: {tag['Value']}")
        else:
            print("No tags found for this resource.")
        print()

if __name__ == "__main__":
    main()

It generates output that looks something like this (running in an account where these resources are not part of any resource group):

Resource ARN: arn:aws:cloudformation:us-east-1:123456789012:stack/aws-sam-cli-managed-default/70c73930-bc89-11ee-8247-a1234567890b
Tags:
        ManagedStackSource: AwsSamCli

Resource ARN: arn:aws:cloudformation:us-east-1:123456789012:stackset/AWS-QuickSetup-SSMHostMgmt-LA-c4k78:cfcc53ec-d852-4b6b-b73e-a1234567890b
Tags:
        QuickSetupType: Host Management
        QuickSetupVersion: 3.1
        QuickSetupID: c4k78

Resource ARN: arn:aws:cloudwatch:us-east-1:123456789012:alarm:TargetTracking-table/123456789012-remote-state-us-east-1-ProvisionedCapacityLow-2cee7860-3554-4ed9-9048-a1234567890b
No tags found for this resource.

It would be interesting to see how well ChatGPT does with generating Go source. I would not expect success on first try, as there is surely much less training data with Go, and LLM tokenizers have been tweaked to support Python's syntactic indentation, plus Go would be more likely to throw a compiler error due to missing punctuation or other subtle details ChatGPT might miss. Generally I find ChatGPT more useful as a way to find a library or class I might not have been aware of, or wouldn't have thought to use, and I would never use it to generate "real" code. But it was quite convenient in this case to kick the tires on an API.

dupuyarc commented 7 months ago

I love this discussion and the idea in general, definitely some implications to work through.

Agreed! (and very happy to have stumbled on your fork – sadly on the second page of search results, but I guess it keeps you from having to deal with 400+ open issues on the repo – the perils of success?).

With it not being a default feature, what benefit would it give vs just implementing GetTags calls for every resource over time?

As you note, there could be a performance advantage in using the resource groups tagging API due to fewer calls. If a configuration isn't using tags to filter resources (and I would guess this is a majority of use cases) the extra GetTags calls for every resource would slow aws-nuke down unnecessarily (this could be an argument for disabling those calls with a --no-tags option).

But in terms of features/functionality, the future-proofing wouldn't be that significant, except in the near term, as implementing GetTags for every of the hundreds(?) of resource types that support it would take some time. The more significant benefit, though still modest, is simply reduction in code and maintenance burden over time.

... if you are doing targeted runs (which I know people do), you end up pulling data you don't need (unsure the quantity). If the AWS accounts are HUGE this could end up taking a lot of memory for no reason. Imagine you are targeting 10 resource types, but the account has 100 in use over 1000s of resources, you potentially end up pulling a lot of tagging data you don't need.

If you are targeting specific resource types with --target, that can be incorporated into the GetResources request using ResourceTypeFilters. Using those filters would require a comprehensive mapping from the aws-nuke resource type names to AWS service:resource-type names, but some mappings might be necessary in any case for ARN generation.

It's also worth remembering that only resources that are (or have been) tagged are visible in this API, so S3Objects and DynamoDBTableItems and similar very numerous resources are always out of scope.

dupuyarc commented 7 months ago

Filtering and keeping API calls within reason are the two primary concerns.

I would also suggest that overall code size is a consideration too. Although the extra code for each resource type is not very much, when you add in test cases that multiplies it a bit and then add that over hundreds(?) of resource types that are taggable but don't have the code...

I see two options:

  1. simply add tag queries to each resource as needed, this adds +1 query per resource found.
  2. modify libnuke resource to support something like Tags handler, and make a separate query per region to get all tags and then do some sort of list and find to match resources.

I think it's worth exploring, in the short term it's easier to do 1. 2 Might be better, but there's definitely some unknowns around memory usage, adding in API calls per region completely unbound from a resource type or resource that we then have to make available to all resources to discover.

This is a very good point. Especially if people only care about tags on certain resource types where they are filtering them, pulling all the tags could get a bit heavy, even if the API calls are kept within reason. You wrote in an earlier comment above

I'm thinking of adding a setting for each resource to turn tag fetching on/off for these resources. What do you think about that?

If this is a new field in the configuration file, I think it is an excellent idea, and it would work with either individual resource type code or the resource groups API filters.

ekristen commented 7 months ago

I slept on this a bit. I think it's worth exploring but will likely require some amount of change to the core libnuke library to allow for collecting and setting tags outside of the standard resource listing process. I don't know yet what this needs to look like.

Another concern is "too much magic", if we have this thing that sits exterior to the standard resource listing that just makes resources magically appear that would be a problem.

I've been working on new feature that allows the properties to be defined on the resource struct and automatically generate the properties from the struct. Been doing testing and benchmarking to see how it performs, ideally I'd like to move more knowledge into the struct so that we can do things like programmatically generate documentation as to what Properties are supported for what Resource. This is something I'm working on right now. https://github.com/ekristen/libnuke/pull/45

Anyways, I need to play with the API a bit and with the concepts. libnuke makes this possible though much easier than the original upstream. There is now context and options passed into each listing function which allows us to pass settings, or other data which would have previously had to be global or not at all.

FYSA - Small bit of docs around Settings - it's the successor to the original feature flags and feature flags has been re-purposes for actual CLI level code changes. Settings is how you change the behavior of a resource. (i.e. disabling termination production on ec2 or rds)

ekristen commented 7 months ago

Lets move all future comms to https://github.com/ekristen/aws-nuke/issues/134 -- this issue will close once I add tags for these resources.

ekristen commented 7 months ago

:tada: This issue has been resolved in version 3.0.0-beta.46 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

ekristen commented 5 months ago

:tada: This issue has been resolved in version 3.0.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: