aws / aws-sdk-go-v2

AWS SDK for the Go programming language.
https://aws.github.io/aws-sdk-go-v2/docs/
Apache License 2.0
2.6k stars 626 forks source link

Incorrect `A-z` range in `smithy.api#pattern` validations #2803

Open gdavison opened 4 days ago

gdavison commented 4 days ago

Acknowledgements

Describe the bug

A number of smithy.api#pattern validations in the codegen models have an incorrect A-z character range

Some examples:

https://github.com/aws/aws-sdk-go-v2/blob/e211ac0d542b6529cb7f81c05d230e5a2a6f91d6/codegen/sdk-codegen/aws-models/fsx.json#L2196-L2206

https://github.com/aws/aws-sdk-go-v2/blob/e211ac0d542b6529cb7f81c05d230e5a2a6f91d6/codegen/sdk-codegen/aws-models/fsx.json#L12375-L12384

https://github.com/aws/aws-sdk-go-v2/blob/e211ac0d542b6529cb7f81c05d230e5a2a6f91d6/codegen/sdk-codegen/aws-models/groundstation.json#L4144-L4153

Regression Issue

Expected Behavior

The range should be A-Z

Current Behavior

The range is A-z, which includes [, \, ], ^, _, and ` in addition to the upper- and lower-case letters

Reproduction Steps

Do a case-insensitive search for A-z in the repo

Possible Solution

No response

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

Release 2024-09-24

Compiler and Version used

N/A

Operating System and version

N/A

RanVaknin commented 4 days ago

Hi @gdavison ,

Thanks for reaching out. Can you explain a little more about why you think the regex should be changed? I'm not too familiar with fsx, but at least for jsonString, special characters like [,],{ etc are expected. In other words, I need to understand why you think this regex is wrong in order to efficiently raise this to the service team on your behalf.

If I understand correctly, you are saying the regex is too "loose", meaning you are not blocked in any way, but rather trying to improve the API?

Thanks again, Ran~

gdavison commented 3 days ago

In the examples from FSx, the regexes are clearly typos, since the A-z is immediately followed or preceded by a-z. This is also the case in most of the other regexes findable by doing a case-sensitive search.

For the Ground Station JsonString, the regex already includes the characters [, ], and _. The regex also excludes a large set of valid JSON characters which can be included in string values (the JSON spec says that a string can contain Unicode characters between '0020' (space) and '10FFFF'). I'm not familiar with Ground Station, so perhaps this doesn't affect it in practice. Regardless, if the intent is to use A-z to include both upper- and lower-case letters, it should be A-Za-z instead.

This isn't a blocker for us at the moment, though I have come across a number of cases where we have created validations based on API documentation which has included this incorrect format. Since we currently create the validations manually, we have been able to catch these errors.

We are doing some planning around automatically generating code using the Smithy specifications. We will have to add code to correct the regexes, rather than relying on the specifications to be accurate.