Maintenance: appconfig _get method return type and logic

aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.

https://docs.powertools.aws.dev/lambda/python/latest/

MIT No Attribution

2.9k stars 400 forks source link

Maintenance: appconfig _get method return type and logic #1796

Closed shdq closed 1 year ago

shdq commented 1 year ago

Summary

Hello there! I hope you all are doing well

I'm working on the AppConfig provider for typescript powertools and I want to address some questions and concerns to the python team. I hope you'll help me to figure that out.

We have a conversation here about the return type of function _get in appconfig.py. As I understand return value from AppConfig should be returned as is. The same I see in the docs example. That is why I think you missed it in PR #877 please clarify this for me if I'm wrong.
I'm concerned about _get function logic. What if we need two different configuration profiles for the save application and environment?

appconf_provider = parameters.AppConfigProvider(environment="my_env", application="my_app", config=config)

def handler(event, context):
    value1: bytes = appconf_provider.get("my_conf_A")
    value2: bytes = appconf_provider.get("my_conf_B")

with the value1 we start_configuration_session and get InitialConfigurationToken which encapsulates

sdk_options["ConfigurationProfileIdentifier"] = name
sdk_options["ApplicationIdentifier"] = self.application
sdk_options["EnvironmentIdentifier"] = self.environment

We use that initial token in the subsequent API call (the request consists of the token only) to get my_conf_A configuration and NextPollConfigurationToken. After that, we save NextPollConfigurationToken in this._next_token.

Not we want to get "my_conf_B" and store it in value2. _get function check for self._next_token existence and make an API call with the existent NextPollConfigurationToken which encapsulates ConfigurationProfileIdentifier with my_conf_A in it and completely ignores name argument.

I assume we get the same values in value1 and value2. I'm not sure if is it a real-world situation, but as I understand, for the same application and environment, we can have several configuration profiles. Docs section.

There is also an unused self.current_version = "" in the class.

Other small things I noticed:

I'm not too much into python, but when I looked at the _get_multiple function I noticed that you raise NotImplementedError() in derived class and the docs say that It should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to None. Maybe it's not an issue – let me know!
In the docs example, I think the comment was copy-pasted from the secrets section, maybe you should consider it to change to # Retrieve the latest configuration.

Why is this needed?

To provide consistency across implementations in different languages.

Which area does this relate to?

Parameters, Typing

Solution

A possible solution for the return type:

In the appconfig.py Current return type: def _get(self, name: str, **sdk_options) -> str: Suggested return type: def _get(self, name: str, **sdk_options) -> bytes:

A possible solution for the method logic:

Store name from the previous call along with NextPollConfigurationToken and flush self._next_token and self.last_returned_value if name is different from the previous call.

Another option is to use dict to store NextPollConfigurationToken tokens with name as the key. But it increases complexity, cause in this case you should also store last_returned_value for each token. I think flush and starting a new session is the better option.

All best, Sergei.

Acknowledgment

[X] This request meets Lambda Powertools Tenets
[X] Should this be considered in other Lambda Powertools languages? i.e. Java, TypeScript

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link

dreamorosi commented 1 year ago

For the record, I think we can skip points 1 and 2 above. We now have a better understanding (which we didn't at the time) of how AppConfig handles repeated calls and have already made our considerations on the implementation and retry logic.

Point 3 is still valid more than a question is a set of suggestions that you can consider.

I would however like to piggyback on this question to ask a followup one since the title still applies to it.

Let's start with a piece of code:

from aws_lambda_powertools.utilities import parameters

def handler():
    parameter = parameters.get_app_config(
        name="test-config-profile-plain-text",
        environment="test-env",
        application="test-app",
    )
    print(parameter)
    print(type(parameter))

This function uses the Parameters utility to retrieve a freeform plaintext configuration from AppConfig without applying any transform to it. It then prints the parameter and its type which is bytes.

b'test'
<class 'bytes'>

Now, let's say that I want to use Parameters to also apply a transformation to the config. Given its type (bytes) I'm inclined to use transform="binary" like so:

    parameter = parameters.get_app_config(
        name="test-config-profile-plain-text",
        environment="test-env",
        application="test-app",
+      transform="binary"
    )

After doing this I would expect the parameter to be returned as a string, however the same function now prints the logs below, which suggest the original value is being base64 encoded/decoded and still returned as of bytes type:

b'\xb5\xeb-'
<class 'bytes'>

Now, my question are:

Is this a bug or am I misunderstanding the binary transform?
If it's not a bug, should the transform name really be base64 and not binary?
How should I handle a binary that is not base64 encoded?

As a side note, in TypeScript the SDK returns Uint8Array. When I initially implemented the BaseProvider I wrote the logic of the transformer to transparently handle Uint8Array+base64 and Uint8Array under the binary transform, however I'm now doubting it's the right approach.

Thanks in advance.

PS: if you want to deploy the parameter that I used in the example above, you can sue the CDK template below:

```ts import * as cdk from 'aws-cdk-lib'; import { Construct } from 'constructs'; import { CfnApplication, CfnConfigurationProfile, CfnDeployment, CfnDeploymentStrategy, CfnEnvironment, CfnHostedConfigurationVersion, } from "aws-cdk-lib/aws-appconfig"; export class AppconfigCdkStack extends cdk.Stack { constructor(scope: Construct, id: string, props?: cdk.StackProps) { super(scope, id, props); // create a new app config application. const application = new CfnApplication( this, "application", { name: "test-app", } ); const environment = new CfnEnvironment(this, "environment", { name: "test-env", applicationId: application.ref, }); const immediateDeploymentStrategy = new CfnDeploymentStrategy(this, "deploymentStrategy", { name: "ImmediateDeployment", deploymentDurationInMinutes: 0, growthFactor: 100, replicateTo: "NONE", finalBakeTimeInMinutes: 0, }); // Freeform plain text configuration profile const configProfilePlainText = new CfnConfigurationProfile(this, "configProfilePlainText", { name: "test-config-profile-plain-text", applicationId: application.ref, locationUri: "hosted", type: "AWS.Freeform", }); const configVersionPlainText = new CfnHostedConfigurationVersion(this, "configVersionPlainText", { applicationId: application.ref, configurationProfileId: configProfilePlainText.ref, content: "test", contentType: "text/plain", }); new CfnDeployment(this, "deploymentPlainText", { applicationId: application.ref, configurationProfileId: configProfilePlainText.ref, configurationVersion: configVersionPlainText.ref, deploymentStrategyId: immediateDeploymentStrategy.ref, environmentId: environment.ref, }); } } ```

leandrodamascena commented 1 year ago

Hi @shdq, thanks for bringing up these questions! Following @dreamorosi recommendation, I'll answer question 3.

Other small things I noticed:

I'm not too much into python, but when I looked at the _get_multiple function I noticed that you raise NotImplementedError() in derived class and the docs say that It should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to None. Maybe it's not an issue – let me know!

We have an abstract class called BaseProvider that defines a blueprint for all subclasses that extends from it. This ensures that subclasses follow the structure and implement all abstract methods described in this abstract class, and _get_multiple is one of those methods.

Currently, the feature to get multiple parameters by the path only exists in the SSM and DynamoDB providers. You'll see it in secrets and appconfig providers that the _get_multiple method is not implemented and raise NotImplementedError().

In the docs example, I think the comment was copy-pasted from the secrets section, maybe you should consider it to change to # Retrieve the latest configuration.

I couldn't find this, could you help me point out where I can find this error and fix it?

Thank you so much and let me know if you still have some questions.

leandrodamascena commented 1 year ago

Point 3 is still valid more than a question is a set of suggestions that you can consider.

I answered this question in the previous comment.

from aws_lambda_powertools.utilities import parameters

def handler():
    parameter = parameters.get_app_config(
        name="test-config-profile-plain-text",
        environment="test-env",
        application="test-app",
    )
    print(parameter)
    print(type(parameter))
This function uses the Parameters utility to retrieve a freeform plaintext configuration from AppConfig without applying any transform to it. It then prints the parameter and its type which is bytes.
b'test'
<class 'bytes'>
Until here it is expected because the SDK returns a StreamingBody and we just read it and pass it on.

Now, let's say that I want to use Parameters to also apply a transformation to the config. Given its type (bytes) I'm inclined to use transform="binary" like so:
    parameter = parameters.get_app_config(
        name="test-config-profile-plain-text",
        environment="test-env",
        application="test-app",
+      transform="binary"
    )
After doing this I would expect the parameter to be returned as a string, however the same function now prints the logs below, which suggest the original value is being base64 encoded/decoded and still returned as of bytes type:
b'\xb5\xeb-'
<class 'bytes'>

What you are seeing here is no longer the value stored in AWS AppConfig, this is the base64decode value of the original value and I think this is the confusion surrounding this question. Let me try to create an explanation for this.

The transform parameter helps clients to transform the stored parameter value into a JSON (as long as it is in key-value format) or Base64 decoded string.
In your example, you stored a plain text string in AppConfig Freeform and now you have a byte class with the value b'\xb5\xeb-', but no matter if you were using other providers as ssm, secrets or DynamoDB, you would have the same behavior because when using transform="binary" we always call base64.b64decode function, regardless of whether it is a base64 string or plain text. The code below exemplifies the same behavior using the ssm provider:

from aws_lambda_powertools.utilities import parameters

def lambda_handler(event, context):

    value = parameters.get_parameter("test-issue-return", transform="binary")

    print(value)
    print(type(value))

The problem (which may not be a problem) here is how Python works with base64. If we check the Python source code for the b64decode function, we see that if the provided value is a string it uses the function str.encode to turn the string into a byte and then execute binascii.a2b_base64 to convert a block of base64 data back to binary and return the binary data. This is what I think our understanding of this problem is. Python takes a non-base64 string and encodes it and returns it. I need to go a little deeper into this to bring more insights.
And besides all that, the only reason this test "worked" was because you created a string with 4 characters, following the base64 RFC a string to be considered a base64 valid encoding has to be a multiple of 4 and that's why the base64.b64decode function did not make an exception. Try modifying it to 5 characters and you will see that it will give an error.

Now, my question are:

Is this a bug or am I misunderstanding the binary transform?

I'm not sure if we can consider this as a bug as this is how the Python implementation handles it. I will put this as a point of attention to review. Maybe update the documentation explaining this, I'm not sure yet.

If it's not a bug, should the transform name really be base64 and not binary?

In my opinion, this shouldn't be renamed to base64, after all, base64 is binary data in ASCII string format. Also, if we consider renaming this, it should be treated as a breaking change in future releases.

How should I handle a binary that is not base64 encoded?

I didn't understand this question. Can you help me clear this up a bit?

As a side note, in TypeScript the SDK returns Uint8Array. When I initially implemented the BaseProvider I wrote the logic of the transformer to transparently handle Uint8Array+base64 and Uint8Array under the binary transform, however I'm now doubting it's the right approach.

Thanks in advance.

PS: if you want to deploy the parameter that I used in the example above, you can sue the CDK template below:

I'll take a look at the TypeScript code to understand how you're handling this.

I hope I clarified how this works in Python and how we can improve code and/or documentation to make this even clearer for our customers. Any questions please let me know Andrea.

Thank you very much.

shdq commented 1 year ago

@leandrodamascena thank you for the answers!

Currently, the feature to get multiple parameters by the path only exists in the SSM and DynamoDB providers. You'll see it in secrets and appconfig providers that the _get_multiple method is not implemented and raise NotImplementedError().

Yeah, I'm aware that the providers do not support these methods. I just pointed out that python docs say NotImplementedError() is not intended for non-existing methods of the base class and None should be used.

I couldn't find this, could you help me point out where I can find this error and fix it?

I checked again and I think you fixed that in #1564.

Speaking of transformation, from my point of view it is a bit confusing, after reading the docs I expected that the transform argument should be the desired type. But it is my perception.

Have a nice day!

dreamorosi commented 1 year ago

Thank you Leandro for the lengthy answer, appreciate you explaining this.

I agree with you that the behavior works as intended and the naming is also correct (also I got lucky with the 4 characters string!).

Let's take this other example:

const value = toBase64(new TextEncoder().encode('test'));

const configVersionPlainText = new CfnHostedConfigurationVersion(this, "configVersionPlainText", {
  applicationId: application.ref,
  configurationProfileId: configProfilePlainText.ref,
  content: value,
  contentType: "text/plain",
});

This is a string (test) that was base64 encoded (dGVzdA==), then stored as plain text on AppConfig. If I run the same function as above with transform="binary", the result will be this:

b'test'
<class 'bytes'>

I expected this to be decoded from base64 (good), but I'm confused as to why it's not a string and it's still bytes.

PS: I think the TS implementation of the transform is wrong, or at least it doesn't behave the same :/

leandrodamascena commented 1 year ago

Hi @dreamorosi! When using AppConfig Freeform you can choose multiple source providers to store/get your configuration including S3 and SSM Document, you can even store a jpeg/zip/others in S3 and use it as a value for your parameter. Even the client using the transform parameter to unpack the base64 stored in AppConfig provider, we return it in bytes because we don't know all the clients' use cases and a possible decode of bytes to string could break this.

I agree with you that this causes some confusion and we can take this opportunity to make our documentation more explicit to explain this, but in this way, we avoid any issues and try to cover all use cases that the user might have.

Here's an example where we use AppConfig FreeForm to store an S3 object (a jpg image) and force a byte-to-string decode, we get an error because Python can't decode it. Maybe that doesn't happen in Typescript because of using Uint8Array (I saw the SDK documentation and its implementation).

I believe this behavior of having the return in bytes is very specific to AppConfig, the other providers (SSM, DynamoDB, Secrets) make sense to return string.

Hope this helped you understand this behavior.

leandrodamascena commented 1 year ago

Speaking of transformation, from my point of view it is a bit confusing, after reading the docs I expected that the transform argument should be the desired type. But it is my perception.

Thank you for this feedback @shdq! We will work to improve our documentation to make it more clear.

Have a great week!

dreamorosi commented 1 year ago

I see, you are right in that we can't make assumptions on what is going to be returned.

Everything is clear now, and I'm sure I need to go back to the drawing board with our AppConfig provider to align the transform behavior to the one in Python.

We can close the issue whenever you want.

Thanks a lot!

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.