Describe the bug
Timestream connectors raises ValidationException due to lack of correct conversion from TIMESTAMP/DATEtypes to Timestream's TIMESTAMP type. An example of the error is described in a separate section.
To Reproduce
Steps to reproduce the behavior:
Deploy a Timestream connector.
Ingest data into Timestream.
Execute a query like the following.
Data ingestion to Timestream
aws timestream-write create-database --database-name test
aws timestream-write create-table \
--database-name test \
--table-name my_tbl \
--retention-properties "MemoryStoreRetentionPeriodInHours=1,MagneticStoreRetentionPeriodInDays=1" \
--magnetic-store-write-properties "EnableMagneticStoreWrites=true"
for metric_val in $(seq 1 10); do
time_unixstamp_current=$(date +%s)
aws timestream-write write-records \
--database-name test \
--table-name my_tbl \
--records "Dimensions=[{Name=dummy_dimension,Value=dimension_val,DimensionValueType=VARCHAR}],Time=${time_unixstamp_current}000,TimeUnit=MILLISECONDS,MeasureName=dummy_measure,MeasureValueType=BIGINT,MeasureValue=${metric_val}"
sleep 1
done
Athena Query
SELECT count(*) FROM "test"."my_tbl" WHERE time >= DATE '2024-04-05' LIMIT 1
Expected behavior
Timestream connector should not raise any error even if we provide a WHERE clause using time column.
Screenshots / Exceptions / Errors
Other examples to cause error.
Input SQL:
SELECT
DATE(time + INTERVAL '9' HOUR) AS "日付",
dummy_dimension,
count(*)
from
"test"."my_tbl"
WHERE
dummy_dimension = 'dimension_val' AND
time BETWEEN (now() - INTERVAL '7' DAY) AND now()
GROUP BY
DATE(time + INTERVAL '9' HOUR), dummy_dimension
ORDER BY
DATE(time + INTERVAL '9' HOUR)
LIMIT 100
Result:
GENERIC_USER_ERROR: Encountered an exception[com.amazonaws.services.timestreamquery.model.ValidationException] from your LambdaFunction[arn:aws:lambda:ap-northeast-1:xxxxxxxxxxxx:function:timestream] executed in context[S3SpillLocation{bucket='aws-athena-query-spill-xxxxxxxxxxxx-ap-northeast-1', key='athena-spill/a41fc27c-b642-454d-bb37-71a90952e87a/fb4008df-2800-425c-a0fc-b68153b74dd2', directory=true}] with message[The query syntax is invalid at line 5:80 (Service: AmazonTimestreamQuery; Status Code: 400; Error Code: ValidationException; Request ID: BVZAZYZQMTSBTIENK7OKUTIDX4; Proxy: null)]
If you enable the info logging by setting environment variable ATHENA_FEDERATION_SDK_LOG_LEVEL=info, you can find a invalid SQL.
CloudWatch Logs:
2024-04-05 05:55:42 797f8ddd-3404-4d81-aa1d-c1f405a0a595 INFO TimestreamRecordHandler:127 - readWithConstraint: query[SELECT
dummy_dimension, time
FROM
"test"."my_tbl"
WHERE ("dummy_dimension" = 'dimension_val') AND (("time" >= 2024-03-29T05:55:39.961 AND "time" <= 2024-04-05T05:55:39.961))]
Connector Details (please complete the following information):
Describe the bug Timestream connectors raises
ValidationException
due to lack of correct conversion fromTIMESTAMP
/DATE
types to Timestream'sTIMESTAMP
type. An example of the error is described in a separate section.To Reproduce Steps to reproduce the behavior:
Data ingestion to Timestream
Athena Query
Expected behavior Timestream connector should not raise any error even if we provide a WHERE clause using time column.
Screenshots / Exceptions / Errors Other examples to cause error.
Input SQL:
Result:
If you enable the info logging by setting environment variable
ATHENA_FEDERATION_SDK_LOG_LEVEL=info
, you can find a invalid SQL.CloudWatch Logs:
Connector Details (please complete the following information):
Additional context I have already created a patch to fix this issue. I will create a pull request shortly.