Closed shdq closed 1 year ago
Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link
For the record, I think we can skip points 1 and 2 above. We now have a better understanding (which we didn't at the time) of how AppConfig handles repeated calls and have already made our considerations on the implementation and retry logic.
Point 3 is still valid more than a question is a set of suggestions that you can consider.
I would however like to piggyback on this question to ask a followup one since the title still applies to it.
Let's start with a piece of code:
from aws_lambda_powertools.utilities import parameters
def handler():
parameter = parameters.get_app_config(
name="test-config-profile-plain-text",
environment="test-env",
application="test-app",
)
print(parameter)
print(type(parameter))
This function uses the Parameters utility to retrieve a freeform plaintext configuration from AppConfig without applying any transform to it. It then prints the parameter and its type which is bytes
.
b'test'
<class 'bytes'>
Now, let's say that I want to use Parameters to also apply a transformation to the config. Given its type (bytes
) I'm inclined to use transform="binary"
like so:
parameter = parameters.get_app_config(
name="test-config-profile-plain-text",
environment="test-env",
application="test-app",
+ transform="binary"
)
After doing this I would expect the parameter to be returned as a string, however the same function now prints the logs below, which suggest the original value is being base64 encoded/decoded and still returned as of bytes
type:
b'\xb5\xeb-'
<class 'bytes'>
Now, my question are:
binary
transform?base64
and not binary
?As a side note, in TypeScript the SDK returns Uint8Array
. When I initially implemented the BaseProvider
I wrote the logic of the transformer to transparently handle Uint8Array
+base64 and Uint8Array
under the binary
transform, however I'm now doubting it's the right approach.
Thanks in advance.
PS: if you want to deploy the parameter that I used in the example above, you can sue the CDK template below:
Hi @shdq, thanks for bringing up these questions! Following @dreamorosi recommendation, I'll answer question 3.
- Other small things I noticed:
- I'm not too much into python, but when I looked at the
_get_multiple
function I noticed that you raise NotImplementedError() in derived class and the docs say thatIt should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to None.
Maybe it's not an issue – let me know!
We have an abstract class called BaseProvider that defines a blueprint for all subclasses that extends from it. This ensures that subclasses follow the structure and implement all abstract methods described in this abstract class, and _get_multiple
is one of those methods.
Currently, the feature to get multiple parameters by the path only exists in the SSM and DynamoDB providers. You'll see it in secrets and appconfig providers that the _get_multiple method is not implemented and raise NotImplementedError()
.
- In the docs example, I think the comment was copy-pasted from the secrets section, maybe you should consider it to change to
# Retrieve the latest configuration
.
I couldn't find this, could you help me point out where I can find this error and fix it?
Thank you so much and let me know if you still have some questions.
Point 3 is still valid more than a question is a set of suggestions that you can consider.
I answered this question in the previous comment.
from aws_lambda_powertools.utilities import parameters def handler(): parameter = parameters.get_app_config( name="test-config-profile-plain-text", environment="test-env", application="test-app", ) print(parameter) print(type(parameter))
This function uses the Parameters utility to retrieve a freeform plaintext configuration from AppConfig without applying any transform to it. It then prints the parameter and its type which is
bytes
.b'test' <class 'bytes'>
Until here it is expected because the SDK returns a StreamingBody and we just read it and pass it on.
Now, let's say that I want to use Parameters to also apply a transformation to the config. Given its type (
bytes
) I'm inclined to usetransform="binary"
like so:parameter = parameters.get_app_config( name="test-config-profile-plain-text", environment="test-env", application="test-app", + transform="binary" )
After doing this I would expect the parameter to be returned as a string, however the same function now prints the logs below, which suggest the original value is being base64 encoded/decoded and still returned as of
bytes
type:b'\xb5\xeb-' <class 'bytes'>
What you are seeing here is no longer the value stored in AWS AppConfig, this is the base64decode value of the original value and I think this is the confusion surrounding this question. Let me try to create an explanation for this.
transform
parameter helps clients to transform the stored parameter value into a JSON (as long as it is in key-value format) or Base64 decoded string.b'\xb5\xeb-'
, but no matter if you were using other providers as ssm, secrets or DynamoDB, you would have the same behavior because when using transform="binary"
we always call base64.b64decode function, regardless of whether it is a base64 string or plain text. The code below exemplifies the same behavior using the ssm provider:from aws_lambda_powertools.utilities import parameters
def lambda_handler(event, context):
value = parameters.get_parameter("test-issue-return", transform="binary")
print(value)
print(type(value))
The problem (which may not be a problem) here is how Python works with base64. If we check the Python source code for the b64decode function, we see that if the provided value is a string it uses the function str.encode to turn the string into a byte and then execute binascii.a2b_base64 to convert a block of base64 data back to binary and return the binary data. This is what I think our understanding of this problem is. Python takes a non-base64 string and encodes it and returns it. I need to go a little deeper into this to bring more insights.
And besides all that, the only reason this test "worked" was because you created a string with 4 characters, following the base64 RFC a string to be considered a base64 valid encoding has to be a multiple of 4 and that's why the base64.b64decode function did not make an exception. Try modifying it to 5 characters and you will see that it will give an error.
Now, my question are:
- Is this a bug or am I misunderstanding the
binary
transform?
I'm not sure if we can consider this as a bug as this is how the Python implementation handles it. I will put this as a point of attention to review. Maybe update the documentation explaining this, I'm not sure yet.
- If it's not a bug, should the transform name really be
base64
and notbinary
?
In my opinion, this shouldn't be renamed to base64, after all, base64 is binary data in ASCII string format. Also, if we consider renaming this, it should be treated as a breaking change in future releases.
- How should I handle a binary that is not base64 encoded?
I didn't understand this question. Can you help me clear this up a bit?
As a side note, in TypeScript the SDK returns
Uint8Array
. When I initially implemented theBaseProvider
I wrote the logic of the transformer to transparently handleUint8Array
+base64 andUint8Array
under thebinary
transform, however I'm now doubting it's the right approach.Thanks in advance.
PS: if you want to deploy the parameter that I used in the example above, you can sue the CDK template below:
I'll take a look at the TypeScript code to understand how you're handling this.
I hope I clarified how this works in Python and how we can improve code and/or documentation to make this even clearer for our customers. Any questions please let me know Andrea.
Thank you very much.
@leandrodamascena thank you for the answers!
Currently, the feature to get multiple parameters by the path only exists in the SSM and DynamoDB providers. You'll see it in secrets and appconfig providers that the _get_multiple method is not implemented and raise
NotImplementedError()
.
Yeah, I'm aware that the providers do not support these methods. I just pointed out that python docs say NotImplementedError()
is not intended for non-existing methods of the base class and None
should be used.
I couldn't find this, could you help me point out where I can find this error and fix it?
I checked again and I think you fixed that in #1564.
Speaking of transformation, from my point of view it is a bit confusing, after reading the docs I expected that the transform
argument should be the desired type. But it is my perception.
Have a nice day!
Thank you Leandro for the lengthy answer, appreciate you explaining this.
I agree with you that the behavior works as intended and the naming is also correct (also I got lucky with the 4 characters string!).
Let's take this other example:
const value = toBase64(new TextEncoder().encode('test'));
const configVersionPlainText = new CfnHostedConfigurationVersion(this, "configVersionPlainText", {
applicationId: application.ref,
configurationProfileId: configProfilePlainText.ref,
content: value,
contentType: "text/plain",
});
This is a string (test
) that was base64 encoded (dGVzdA==
), then stored as plain text on AppConfig. If I run the same function as above with transform="binary"
, the result will be this:
b'test'
<class 'bytes'>
I expected this to be decoded from base64 (good), but I'm confused as to why it's not a string
and it's still bytes
.
PS: I think the TS implementation of the transform is wrong, or at least it doesn't behave the same :/
Hi @dreamorosi! When using AppConfig Freeform you can choose multiple source providers to store/get your configuration including S3 and SSM Document, you can even store a jpeg/zip/others in S3 and use it as a value for your parameter. Even the client using the transform parameter to unpack the base64 stored in AppConfig provider, we return it in bytes because we don't know all the clients' use cases and a possible decode of bytes to string could break this.
I agree with you that this causes some confusion and we can take this opportunity to make our documentation more explicit to explain this, but in this way, we avoid any issues and try to cover all use cases that the user might have.
Here's an example where we use AppConfig FreeForm to store an S3 object (a jpg image) and force a byte-to-string decode, we get an error because Python can't decode it. Maybe that doesn't happen in Typescript because of using Uint8Array (I saw the SDK documentation and its implementation).
I believe this behavior of having the return in bytes is very specific to AppConfig, the other providers (SSM, DynamoDB, Secrets) make sense to return string.
Hope this helped you understand this behavior.
Speaking of transformation, from my point of view it is a bit confusing, after reading the docs I expected that the
transform
argument should be the desired type. But it is my perception.
Thank you for this feedback @shdq! We will work to improve our documentation to make it more clear.
Have a great week!
I see, you are right in that we can't make assumptions on what is going to be returned.
Everything is clear now, and I'm sure I need to go back to the drawing board with our AppConfig provider to align the transform behavior to the one in Python.
We can close the issue whenever you want.
Thanks a lot!
This issue is now closed. Please be mindful that future comments are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
Summary
Hello there! I hope you all are doing well
I'm working on the AppConfig provider for typescript powertools and I want to address some questions and concerns to the python team. I hope you'll help me to figure that out.
We have a conversation here about the return type of function
_get
inappconfig.py
. As I understand return value from AppConfig should be returned as is. The same I see in the docs example. That is why I think you missed it in PR #877 please clarify this for me if I'm wrong.I'm concerned about
_get
function logic. What if we need two different configuration profiles for the save application and environment?with the
value1
westart_configuration_session
and getInitialConfigurationToken
which encapsulatesWe use that initial token in the subsequent API call (the request consists of the token only) to get
my_conf_A
configuration andNextPollConfigurationToken
. After that, we saveNextPollConfigurationToken
inthis._next_token
.Not we want to get
"my_conf_B"
and store it invalue2
._get
function check forself._next_token
existence and make an API call with the existentNextPollConfigurationToken
which encapsulatesConfigurationProfileIdentifier
withmy_conf_A
in it and completely ignoresname
argument.I assume we get the same values in
value1
andvalue2
. I'm not sure if is it a real-world situation, but as I understand, for the same application and environment, we can have several configuration profiles. Docs section.There is also an unused
self.current_version = ""
in the class._get_multiple
function I noticed that you raise NotImplementedError() in derived class and the docs say thatIt should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to None.
Maybe it's not an issue – let me know!# Retrieve the latest configuration
.Why is this needed?
To provide consistency across implementations in different languages.
Which area does this relate to?
Parameters, Typing
Solution
A possible solution for the return type:
In the
appconfig.py
Current return type:def _get(self, name: str, **sdk_options) -> str:
Suggested return type:def _get(self, name: str, **sdk_options) -> bytes:
A possible solution for the method logic:
Store
name
from the previous call along withNextPollConfigurationToken
and flushself._next_token
andself.last_returned_value
ifname
is different from the previous call.Another option is to use
dict
to storeNextPollConfigurationToken
tokens withname
as the key. But it increases complexity, cause in this case you should also storelast_returned_value
for each token. I think flush and starting a new session is the better option.All best, Sergei.
Acknowledgment