awslabs / aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
Apache License 2.0

[FEATURE] Support the MAP datatype #361

Open joshuanapoli opened 3 years ago

joshuanapoli commented 3 years ago

I have DynamoDB records with key-value pairs mapping id to documents. For example:

{
  "items": {
    "<uuid1>": { "id": "<uuid1>", "name": "Alice" },
    "<uuid2>": { "id": "<uuid2>", "name": "Bob" },
    "<uuid3>": { "id": "<uuid3>", "name": "Bob" }
  }
}

I think that the Athena MAP datatype is meant for this purpose. https://docs.aws.amazon.com/athena/latest/ug/data-types.html

I would like to define a schema using the MAP datatype, so that I can query my table using the DynamoDB connector:

CREATE EXTERNAL TABLE mytable(
  items MAP<STRING, STRUCT<id:string, name:string>>
  )

The connectors appear to support ARRAY and STRUCT data-types, but not the Athena MAP data-type.

An alternative would be mapping the id-keyed document to an Athena STRING column. If the connector JSON-encoded the document, then I could decode it in Athena/Presto. Unfortunately, this does not work: the DynamoDB connector writes the document content to the string column, but the result is not JSON.
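To make the JSON-string alternative concrete, here is a minimal Python sketch of what that encoding would look like. It is not the connector's code: the `record` literal and the `encode_items` helper are hypothetical and only mirror the example document above. The point is that a JSON-encoded column value round-trips cleanly, which is the property the current string output lacks.

```python
import json

# Hypothetical record shaped like the example document in this issue.
record = {
    "items": {
        "uuid1": {"id": "uuid1", "name": "Alice"},
        "uuid2": {"id": "uuid2", "name": "Bob"},
    }
}


def encode_items(rec):
    """Serialize the id-keyed document to a compact JSON string.

    This is the value a STRING column would need to carry so that it
    can be parsed again on the query side.
    """
    return json.dumps(rec["items"], sort_keys=True, separators=(",", ":"))


encoded = encode_items(record)

# Decoding the string recovers the original id -> document mapping.
decoded = json.loads(encoded)
names = {key: doc["name"] for key, doc in decoded.items()}
```

With a value like `encoded` in the STRING column, the decoding step could presumably be done on the Athena side with its JSON functions instead of `json.loads`.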

burhan94 commented 3 years ago

We have this planned as a part of our Arrow upgrade. Stay tuned for a future release that will contain this feature.

rnatarajan commented 3 years ago

Hi @burhan94, since master already has the Arrow upgrade, I'm curious when the next version will be released.

C2BB commented 3 years ago

Curious as well!

jaysethia commented 3 years ago

Hello @burhan94, looking forward to this feature as well! Any updates on native support for MAP data from the DynamoDB connector in Athena?

rnatarajan commented 2 years ago

Created a merge request with Example Record Handler with Map field - https://github.com/awslabs/aws-athena-query-federation/pull/734

If this code is added to a federated query connector, two errors occur:

  • Loading the table metadata fails with: Failed to get metadata for table () from lambda function due to java.lang.IllegalArgumentException: Unsupported Arrow Type [Map(false)] in Lambda Data Source
  • Athena throws the following error when a query is run to select the field: GENERIC_INTERNAL_ERROR: Exception while processing column LambdaColumnHandle{name='', type=map(array(varchar), array(varchar)), comment='null'}

rnatarajan commented 2 years ago

Created a merge request with an example record handler that adds a second entry to the map - https://github.com/awslabs/aws-athena-query-federation/pull/739. Athena works fine with one entry in the map, but the second entry is not returned.

Athena throws the following error when loading table metadata from Glue: Failed to get metadata for table (<table_name>) from lambda function due to java.lang.IllegalArgumentException: Unsupported Arrow Type [Map(false)] in Lambda Data Source
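For readers unfamiliar with the Arrow Map layout, the sketch below illustrates it in plain Python without using the Arrow library. Arrow stores a map column as a list of key/value entry structs, with an offsets array marking where each row's entries begin and end. The `to_entries` and `to_offsets` helpers and the sample rows are hypothetical. This shows the layout only and is not a diagnosis of the missing second entry.

```python
def to_entries(mapping):
    """Flatten one map value into a list of {key, value} entry structs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def to_offsets(rows):
    """Build the (offsets, entries) pair for a column of map values.

    offsets[i] and offsets[i + 1] delimit the entries that belong to row i,
    so a row with two keys must contribute two entries.
    """
    offsets = [0]
    entries = []
    for row in rows:
        entries.extend(to_entries(row))
        offsets.append(len(entries))
    return offsets, entries


# Hypothetical rows: the first map has two entries, the second has one.
rows = [
    {"uuid1": {"id": "uuid1", "name": "Alice"},
     "uuid2": {"id": "uuid2", "name": "Bob"}},
    {"uuid3": {"id": "uuid3", "name": "Bob"}},
]

offsets, entries = to_offsets(rows)
first_row_entries = entries[offsets[0]:offsets[1]]
```

Here `first_row_entries` holds both entries of the first map, which is the shape a record handler would need to write for a two-entry row.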

slomkarafa commented 2 years ago

Hello, +1 from my side. I tried to finish the implementation of the DeltaLake connector (#509), but map support is not working properly, as noted above.

slomkarafa commented 2 years ago

Does anyone know of, or have, any plans regarding this issue? Maybe @burhan94?

sanatdeshpande1 commented 2 years ago

I have tried to implement this by bypassing Glue and constructing the schema directly with Apache Arrow.

slomkarafa commented 2 years ago

Hello @sanatdeshpande1, did you end up with a working solution?

henrymai commented 1 year ago

We can enable map support after:

Once those things are done we can be confident that Map support works and then we can enable it.

matthost commented 1 year ago

Seems like Arrow 13 is being used. Time to fix this? IMO this is a big feature for making these Connectors useful.

hervenivon commented 11 months ago

yes please 🙏