awslabs / aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
Apache License 2.0
557 stars 293 forks source link

[BUG] ValidationException occurs when you query with a condition using TIMESTAMP or DATE types with connector for Timestream #1887

Closed StoneDot closed 5 months ago

StoneDot commented 7 months ago

Describe the bug Timestream connectors raises ValidationException due to lack of correct conversion from TIMESTAMP/DATEtypes to Timestream's TIMESTAMP type. An example of the error is described in a separate section.

To Reproduce Steps to reproduce the behavior:

  1. Deploy a Timestream connector.
  2. Ingest data into Timestream.
  3. Execute a query like the following.

Data ingestion to Timestream

aws timestream-write create-database --database-name test

aws timestream-write create-table \
    --database-name test \
    --table-name my_tbl \
    --retention-properties "MemoryStoreRetentionPeriodInHours=1,MagneticStoreRetentionPeriodInDays=1" \
    --magnetic-store-write-properties "EnableMagneticStoreWrites=true"

for metric_val in $(seq 1 10); do
    time_unixstamp_current=$(date +%s)
    aws timestream-write write-records \
        --database-name test \
        --table-name my_tbl \
        --records "Dimensions=[{Name=dummy_dimension,Value=dimension_val,DimensionValueType=VARCHAR}],Time=${time_unixstamp_current}000,TimeUnit=MILLISECONDS,MeasureName=dummy_measure,MeasureValueType=BIGINT,MeasureValue=${metric_val}"
    sleep 1
done

Athena Query

SELECT count(*) FROM "test"."my_tbl" WHERE time >= DATE '2024-04-05' LIMIT 1

Expected behavior Timestream connector should not raise any error even if we provide a WHERE clause using time column.

Screenshots / Exceptions / Errors Other examples to cause error.

Input SQL:

SELECT
  DATE(time + INTERVAL '9' HOUR) AS "日付",
  dummy_dimension,
  count(*)
from
  "test"."my_tbl"
WHERE
  dummy_dimension = 'dimension_val' AND
  time BETWEEN (now() - INTERVAL '7' DAY) AND now()
GROUP BY
  DATE(time + INTERVAL '9' HOUR), dummy_dimension
ORDER BY
  DATE(time + INTERVAL '9' HOUR)
LIMIT 100

Result:

GENERIC_USER_ERROR: Encountered an exception[com.amazonaws.services.timestreamquery.model.ValidationException] from your LambdaFunction[arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:timestream] executed in context[S3SpillLocation{bucket='aws-athena-query-spill-xxxxxxxxxxxx-ap-northeast-1', key='athena-spill/a41fc27c-b642-454d-bb37-71a90952e87a/fb4008df-2800-425c-a0fc-b68153b74dd2', directory=true}] with message[The query syntax is invalid at line 5:80 (Service: AmazonTimestreamQuery; Status Code: 400; Error Code: ValidationException; Request ID: BVZAZYZQMTSBTIENK7OKUTIDX4; Proxy: null)]

If you enable the info logging by setting environment variable ATHENA_FEDERATION_SDK_LOG_LEVEL=info, you can find a invalid SQL.

CloudWatch Logs:

2024-04-05 05:55:42 797f8ddd-3404-4d81-aa1d-c1f405a0a595 INFO  TimestreamRecordHandler:127 - readWithConstraint: query[SELECT 
dummy_dimension, time 
FROM 
"test"."my_tbl" 
WHERE ("dummy_dimension" = 'dimension_val') AND (("time" >= 2024-03-29T05:55:39.961 AND "time" <= 2024-04-05T05:55:39.961))]

Connector Details (please complete the following information):

Additional context I have already created a patch to fix this issue. I will create a pull request shortly.

StoneDot commented 7 months ago

I have created a pull request #1888 to fix this issue.