Generate AWS ARNs & other AWS data

douglasg14b commented 1 month ago

Clear and concise description of the problem

Generating fake AWS ARNs is reasonably common when testing code that works with and interprets ARNs.

Generating an ARN generally means enumerating many AWS constants, so this could perhaps be rolled into an entire AWS module.

Spec:

arn:partition:service:region:account-id:resource-id
arn:partition:service:region:account-id:resource-type/resource-id
arn:partition:service:region:account-id:resource-type:resource-id

API:

faker.string.arn() or:

// Taking advantage of all the other generators needed in order to produce an ARN
faker.aws.
    accountId()
    arn()
    arnFormat()
    partition()
    region()
    resourceId()
    resourceType()
    service()
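
A minimal sketch of how these generators might compose, assuming a standalone module rather than faker's internals. Every name here (`pick`, `ArnOptions`, the constant lists) is hypothetical, the data sets are tiny illustrative samples, and the parts are picked independently, so a generated combination may not exist in AWS:

```typescript
// Hypothetical sketch: nothing here is an existing faker API.
type ArnFormat = "id" | "type/id" | "type:id";

// Small illustrative samples; a real module would need far more entries.
const PARTITIONS = ["aws", "aws-cn", "aws-us-gov"] as const;
const REGIONS = ["us-east-1", "us-west-2", "eu-west-1"] as const;
const SERVICES = ["ec2", "lambda", "sns"] as const;
const RESOURCE_TYPES = ["instance", "function", "topic"] as const;
const ARN_FORMATS: readonly ArnFormat[] = ["id", "type/id", "type:id"];
const ID_CHARS = "abcdef0123456789".split("");

// Stand-in for faker's randomizer: picks one element uniformly.
function pick<T>(items: readonly T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// AWS account IDs are 12-digit numbers.
function accountId(): string {
  let out = "";
  for (let i = 0; i < 12; i++) {
    out += Math.floor(Math.random() * 10).toString();
  }
  return out;
}

// Arbitrary 8-character identifier; real resource IDs vary by service.
function resourceId(): string {
  let out = "";
  for (let i = 0; i < 8; i++) {
    out += pick(ID_CHARS);
  }
  return out;
}

interface ArnOptions {
  partition?: string;
  service?: string;
  region?: string;
  accountId?: string;
  resourceType?: string;
  resourceId?: string;
  format?: ArnFormat;
}

// Builds an ARN in one of the three formats from the spec,
// filling any part the caller did not supply with a random value.
function arn(options: ArnOptions = {}): string {
  const partition = options.partition ?? pick(PARTITIONS);
  const service = options.service ?? pick(SERVICES);
  const region = options.region ?? pick(REGIONS);
  const account = options.accountId ?? accountId();
  const type = options.resourceType ?? pick(RESOURCE_TYPES);
  const id = options.resourceId ?? resourceId();
  const format = options.format ?? pick(ARN_FORMATS);

  const prefix = `arn:${partition}:${service}:${region}:${account}`;
  if (format === "type/id") {
    return `${prefix}:${type}/${id}`;
  }
  if (format === "type:id") {
    return `${prefix}:${type}:${id}`;
  }
  return `${prefix}:${id}`;
}
```

Callers who need a specific shape can pin any part, for example `arn({ service: "lambda", format: "type:id" })`, and leave the rest random.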

More Research

A full range of acceptable values and formats appears to be available by parsing a few sources, making it so generated ARNs could be precisely accurate to their real-world counterparts. This could be supported largely by codegen.

- The CloudFormation resource specification for us-east-1 at https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-resource-specification.html
  - It looks like resource availability is based on region, however that may be too granular for such a generator
- The policies.js file at: https://awspolicygen.s3.amazonaws.com/js/policies.js

Alternative

¯\_(ツ)_/¯

Additional context

No response

github-actions[bot] commented 1 month ago

Thank you for your feature proposal.

We marked it as "waiting for user interest" for now to gather some feedback from our community:

If you would like to see this feature be implemented, please react to the description with an up-vote (:+1:).
If you have a suggestion or want to point out some special cases that need to be considered, please leave a comment, so we are aware about them.

We would also like to hear about other community members' use cases for the feature to give us a better understanding of their potential implicit or explicit requirements.

We will start the implementation based on:
- the number of votes (:+1:) and comments
- the relevance for the ecosystem
- availability of alternatives and workarounds
- and the complexity of the requested feature

We do this because:
- There are plenty of languages/countries out there and we would like to ensure that every method can cover all or almost all of them.
- Every feature we add to faker has "costs" associated to it:
  - initial costs: design, implementation, reviews, documentation
  - running costs: awareness of the feature itself, more complex module structure, increased bundle size, more work during refactors

ST-DDT commented 1 month ago

// Taking advantage of all the other generators needed in order to produce an ARN

Could you please provide a pseudo implementation of the arn method, so that it becomes clearer what each of the other methods are supposed to do?

douglasg14b commented 2 weeks ago

What is the standard for "correctness"? Should ARNs be as close to valid as possible, or is "looks like an ARN" good enough?

I ask because as I dug into this I found that ARNs are complicated. Different resources can only be used with a particular service, some services use one format or the other, some regions do not have certain services and therefore cannot have ARNs for certain resources in that region, etc.

Thankfully this can be parsed out and constructed dynamically, since the data to make these decisions is published by AWS. However, the surface area for ARN construction may be more work than I had initially anticipated. Never mind the effort of building out a parsing mechanism to turn AWS-published resources into valid ARN generation 🤔
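
To make the constraint problem concrete, here is a rough sketch of a data-driven approach, assuming a per-service table that codegen might produce from the AWS-published sources. The `ServiceSpec` shape, the `arnFor` helper, and the table itself are hypothetical and hand-written for illustration; the region lists in particular are placeholders, not real availability data:

```typescript
// Hypothetical sketch: the table below is hand-written for illustration,
// not generated from AWS data.
type ArnFormat = "id" | "type/id" | "type:id";

interface ResourceSpec {
  type: string;
  format: ArnFormat;
}

interface ServiceSpec {
  // Placeholder region lists, not AWS's real service availability.
  regions: readonly string[];
  resources: readonly ResourceSpec[];
}

const SPECS: Record<string, ServiceSpec> = {
  lambda: {
    regions: ["us-east-1", "eu-west-1"],
    resources: [{ type: "function", format: "type:id" }],
  },
  ec2: {
    regions: ["us-east-1", "us-west-2"],
    resources: [{ type: "instance", format: "type/id" }],
  },
  sns: {
    regions: ["us-east-1"],
    resources: [{ type: "topic", format: "id" }],
  },
};

// Builds an ARN whose region, resource type, and format are all drawn
// from the chosen service's spec, so the parts stay mutually consistent.
function arnFor(
  service: string,
  accountId: string,
  resourceId: string,
  random: () => number = Math.random,
): string {
  const spec = SPECS[service];
  if (spec === undefined) {
    throw new Error(`Unknown service: ${service}`);
  }
  const region = spec.regions[Math.floor(random() * spec.regions.length)];
  const resource =
    spec.resources[Math.floor(random() * spec.resources.length)];

  const prefix = `arn:aws:${service}:${region}:${accountId}`;
  if (resource.format === "type/id") {
    return `${prefix}:${resource.type}/${resourceId}`;
  }
  if (resource.format === "type:id") {
    return `${prefix}:${resource.type}:${resourceId}`;
  }
  return `${prefix}:${resourceId}`;
}
```

Keeping the constraints in data rather than code means the generator stays small and only the table grows as more services are covered.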

The level of correctness required influences how difficult this will be.

ST-DDT commented 2 weeks ago

What is the standard for "correctness".

What is required for your use case? It should be correct enough to work for your use case and for simple general use cases, with the option to move to "as close to valid as possible" later without breaking API changes.

What is your use case?

matthewmayer commented 2 weeks ago

In general I think "plausible" is acceptable rather than "valid". We have plenty of existing methods that generate plausible but sometimes invalid data because checking for validity would require much more complex code or shipping large amounts of data, e.g. ZIP codes and IBAN codes.
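
As an illustration of what "plausible" could mean in practice, here is a loose shape check, assuming a hypothetical `looksLikeArn` helper. It only verifies the colon-separated structure and does not know which services, regions, or resource types actually exist:

```typescript
// Hypothetical helper: checks the general ARN shape only.
// Region and account may be empty, as in "arn:aws:s3:::my-bucket".
const ARN_SHAPE = /^arn:[a-z-]+:[a-z0-9-]+:[a-z0-9-]*:(\d{12}|aws)?:.+$/;

function looksLikeArn(value: string): boolean {
  return ARN_SHAPE.test(value);
}
```

A generator that passes this kind of check would be "plausible" in the sense above, while a fully "valid" one would also need the per-service data discussed earlier in the thread.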
