Exactly as stated in the title. If you're querying Athena using a qmark style query and use the query cache, and you change the actual parameter values between two query runs, aws-sdk-pandas/awswrangler retrieves the result of the previous query run with incorrect parameter values.
How to Reproduce
import awswrangler as wr
df = wr.athena.read_sql_query(
sql="SELECT distinct id FROM my_database.my_table WHERE id in (?, ?)",
database="my_database",
ctas_approach=False,
unload_approach=False,
params=["id1", "id2"],
paramstyle="qmark",
workgroup="primary",
athena_cache_settings={"max_cache_seconds": 300}
)
print(df)
df2 = wr.athena.read_sql_query(
sql="SELECT distinct id FROM my_database.my_table WHERE id in (?, ?)",
database="my_database",
ctas_approach=False,
unload_approach=False,
params=["id3", "id4"],
paramstyle="qmark",
workgroup="primary",
athena_cache_settings={"max_cache_seconds": 300}
)
print(df2)
this with a database and table should be enough to reproduce the issue.
Expected behavior
Since the parameters to the query have changed, a fresh query should be run.
Your project
No response
Screenshots
No response
OS
Amazon Linux 2023
Python version
3.12
AWS SDK for pandas version
layer:AWSSDKPandas-Python312-Arm64:12
Additional context
My guess is that this is happening due to aws-sdk-pandas using a hand-rolled cache implementation rather than just letting Athena handle the cache through the ResultReuseConfiguration of StartQueryExecution. As far as I know, Athena does not populate the query parameter values in the GetQueryExecution output. So you get the parametrized query without the parameter values and check the current query against it, which is incorrect.
Describe the bug
Exactly as stated in the title. If you're querying Athena using a qmark style query and use the query cache, and you change the actual parameter values between two query runs, aws-sdk-pandas/awswrangler retrieves the result of the previous query run with incorrect parameter values.
How to Reproduce
this with a database and table should be enough to reproduce the issue.
Expected behavior
Since the parameters to the query have changed, a fresh query should be run.
Your project
No response
Screenshots
No response
OS
Amazon Linux 2023
Python version
3.12
AWS SDK for pandas version
layer:AWSSDKPandas-Python312-Arm64:12
Additional context
My guess is that this is happening due to aws-sdk-pandas using a hand-rolled cache implementation rather than just letting Athena handle the cache through the
ResultReuseConfiguration
ofStartQueryExecution
. As far as I know, Athena does not populate the query parameter values in theGetQueryExecution
output. So you get the parametrized query without the parameter values and check the current query against it, which is incorrect.