boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0
8.81k stars 1.84k forks source link

DynamoDB condition operator arity not handled correctly, produces semi-cryptic error #4113

Open crh23 opened 1 month ago

crh23 commented 1 month ago

Describe the bug

The binary condition operators in boto3.dynamodb.conditions, such as And and Or, don't correctly handle being given three or more arguments. This is not caught client-side, and causes a pretty cryptic error.

Expected Behavior

Either the operators are handled correctly (ideally with type hinting), or a more relevant error is raised

Current Behavior

Errors are like

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Scan operation: Value provided in ExpressionAttributeNames unused in expressions: keys: {#n2}

the generated request body from the below SSCCE is

{"TableName": "my_table", "FilterExpression": "(#n0 > :v0 AND #n1 < :v1)", "ExpressionAttributeNames": {"#n0": "age", "#n1": "age", "#n2": "height"}, "ExpressionAttributeValues": {":v0": {"N": "20"}, ":v1": {"N": "30"}, ":v2": {"N": "180"}}}

which is indeed invalid as indicated in the error message.

Reproduction Steps

Example I would expect to work but doesn't:

import boto3
from boto3.dynamodb.conditions import Attr, And

# Table doesn't need to exist to reproduce, since request is validated syntactically before hitting table
table_name = "my_table"

dynamodb = boto3.resource("dynamodb")

table = dynamodb.Table(table_name)

response = table.scan(
    FilterExpression=And(
        Attr("age").gt(20),
        Attr("age").lt(30),
        Attr("height").gt(180),
    )
)

Current workaround:

import boto3
from boto3.dynamodb.conditions import Attr, And
from functools import reduce

# Table does need to exist if you want a successful response!
table_name = "my_table"

dynamodb = boto3.resource("dynamodb")

table = dynamodb.Table(table_name)

response = table.scan(
    FilterExpression=reduce(
        And, [
            Attr("age").gt(20),
            Attr("age").lt(30),
            Attr("height").gt(180),
        ]
    )
)

Possible Solution

Explanation:

The conditions all inherit from ConditionBase, which has properties expression_operator (the character in the DynamoDB condition expression syntax used to indicate the relevant operator), values (the operands), and expression_format (a string with python format fields {operator} and some of {0}, {1}, {2}).

The expression_operator and expression_format are defined by the operator, and the values are set on construction. For example, And: https://github.com/boto/boto3/blob/096e45841545f24bbd572a62504cc0dbb62e6b07/boto3/dynamodb/conditions.py#L226-L228

Note that ConditionBase's constructor takes an arbitrary number of values.

The number of values passed to the condition operator is never validated, which is the root cause of this error. The core of the expression builder is https://github.com/boto/boto3/blob/096e45841545f24bbd572a62504cc0dbb62e6b07/boto3/dynamodb/conditions.py#L359-L383

Note that every value passed (the number of which is never checked) gets a placeholder value, but only as many as there are number format fields in expression_format will actually be included in the final condition expression (since Python's str.format() silently ignores extraneous arguments).

Possible fix:

Add a property arity or number_of_values to ConditionBase, and have the constructor validate that a valid number has been received. Optionally, for quality of life also expand the And and Or constructs to take an arbitrary number of arguments (either by transparently expanding to "{0} AND {1} AND {2} AND {3}" or "((({0} AND {1}) AND {2}) AND {3})")

Additional Information/Context

No response

SDK version used

1.34.97

Environment details (OS name and version, etc.)

Windows 10

crh23 commented 1 month ago

As an aside, those operators don't seem to be documented anywhere, causing #4112 (doc issue now opened as #4117)

RyanFitzSimmonsAK commented 1 month ago

Hi @crh23, thanks for reaching out. You should be able to chain these conditions using logical operators. The FilterExpression you used as an example would be written as FilterExpression = Attr("age").gt(20) & Attr("age").lt(30) & Attr("height").gt(180). Using this, I was able to successfully scan a table and have all three conditions applied. Does this fulfill your use case, or am I misunderstanding the issue in some way?

crh23 commented 1 month ago

Chaining does work, since using & makes Python collapse it down to a bunch of binary operations. I definitely don't think that there is missing functionality here, just a sharp edge for devs to catch themselves on.

The main issues are the lack of documentation (#4117) and the lack of validation (this issue). It would be reasonable to expect the example I gave to work, but:

  1. It doesn't work (OK)
  2. This isn't caught until the request is sent (not great - it would be easy to catch this client-side)
  3. The error message doesn't obviously relate to the actual cause of the error (not OK - makes debugging unnecessarily hard).

For context, this was raised by an actual user who I helped to fix their code - their code was similar to my given example, and they didn't understand why it wasn't working.

(this would also be addressed with proper type signatures for boto3, but that's a somewhat larger fix :P )