aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0
3.93k stars 698 forks source link

SHOW queries in aws timestream bug #1344

Closed Ada-Nick closed 2 years ago

Ada-Nick commented 2 years ago

Describe the bug

When running:

import awswrangler as wr
query = f'SHOW MEASURES from {DATABASE}.{TABLE}'
wr.timestream.query(query)

The query fails to execute with error:

ValueError: Query with non ScalarType for column dimensions: {'ArrayColumnInfo': {'Type': {'RowColumnInfo': [{'Name': 'dimension_name', 'Type': {'ScalarType': 'VARCHAR'}}, {'Name': 'data_type', 'Type': {'ScalarType': 'VARCHAR'}}]}}}

How to Reproduce

import awswrangler as wr
query = f'SHOW MEASURES from {DATABASE}.{TABLE}'
wr.timestream.query(query)

Expected behavior

Return a pandas dataframe with a single column containing all the measures from my table

Your project

No response

Screenshots

No response

OS

Win

Python version

3.9.7

AWS DataWrangler version

2.15.1

Additional context

No response

kukushking commented 2 years ago

Hi @Ada-Nick looks like wrangler timestream module doesn't yet know how to interpret arrays e.g. multiple dimensions per measure in this case. Worst case we can cast nested structures to an object or expand it to multiple rows (which won't be possible in all cases)

kukushking commented 2 years ago

@Ada-Nick Please see if that works for you

pip install git+https://github.com/awslabs/aws-data-wrangler.git@timestream-show-measures
wr.timestream.query(sql='SHOW MEASURES from "..."."..."')

  measure_name data_type                                         dimensions
0     measure1     multi  [{'RowValue': {'Data': [{'ScalarValue': 'dim1'...
Ada-Nick commented 2 years ago

Great thanks