awslabs / aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.
Apache License 2.0
557 stars 293 forks source link

[BUG] DocumentDB (MongoDB) connector can't handle search condition on columns with uppercase letters #1276

Open tlierdotfr opened 1 year ago

tlierdotfr commented 1 year ago

Describe the bug Athena SQL queries are OK when "where" conditions use columns that are in LowerCase on MongoDB data source but when I try to filter on a column that has a CamelCase case, query result is empty, even with the "disable_projection_and_casing" environment variable set.

To Reproduce Steps to reproduce the behavior:

  1. Set up MongoDB and the DocumentDB connector for Athena.
  2. Set disable_projection_and_casing environment variable to true.
  3. Configure the Glue table intervention with docdb-metadata-flag ans sourceTable properties
  4. Configure the Glue table schema with lower case Column name eventtype accordingly to his UpperCase MongoDB value eventType
  5. Insert a document with an uppercase letter in the name of one column: db.intervention.insert({})
  6. In Athena, query the collection: select * from intervention
  7. See results
  8. In Athena, query the collection with filter: select * from intervention where eventtype = 1
  9. See no results at all

Expected behavior The query should filter result accordingly to column value.

Screenshots / Exceptions / Errors First query is ok when I use a condition on full lowercase column (start column) image Second query is ko when I add another condition uppercase column (eventType column). The result is the same if I use eventtype or eventType in query. image Configuration of Glue Table image

Connector Details (please complete the following information):

Additional context I use MongoDB 4.4 data source and not DocumentDB for this use case.

henrymai commented 1 year ago

Just leaving a note for whoever ends up working on this.

It looks like columnNameMapping has to be implemented here just like its done for DynamoDB:

https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html#:~:text=DynamoDB%20table%20names.-,columnMapping%20%E2%80%93,-Optional%20table%20property

https://github.com/awslabs/aws-athena-query-federation/blob/6ed23adf1b79886d9163e0d0954ff584f657bd32/athena-dynamodb/src/main/java/com/amazonaws/athena/connectors/dynamodb/util/DDBRecordMetadata.java#L136

tlierdotfr commented 1 year ago

Thanks a lot @henrymai for pointing out that part of the DynamoDB documentation.

Knowing this is DynamoDB doc and not DocumentDB , I still have just tried to add this property to my Glue Table and I confirm that it's working now :-) My "where" conditions are now ok in Athena SQL queries.

It seems like this is well implemented but the documentation has not been updated accordingly ;-)

henrymai commented 1 year ago

@tlierdotfr

You're right, it looks like its implemented in the base GlueMetadataHandler: https://github.com/awslabs/aws-athena-query-federation/blob/6ed23adf1b79886d9163e0d0954ff584f657bd32/athena-federation-sdk/src/main/java/com/amazonaws/athena/connector/lambda/handlers/GlueMetadataHandler.java#L410

Glad to hear that its working for you.

Can you try also removing the disable_projection_and_casing environment variable now to see if things just work since you have the columnMapping glue table property now?

tlierdotfr commented 1 year ago

Thanks for the tip @henrymai, I confirm that now it's also working after having removed the disable_projection_and_casing env variable 👍

The only subject with removing this env variable is that I need to specify a Glue table definition for all my MongoDB collections that contains UpperCase column names. But this is not an issue for me :-)

hackett123 commented 1 year ago

Seems like this is all resolved now, glad we could work it out. Closing the ticket now

hackett123 commented 1 year ago

Going to reopen so the team can look into what is going wrong with the disable_projection_and_casing env var

tlierdotfr commented 1 year ago

@hackett123 it would be great also to update the documentDB connector documentation

stefrem commented 8 months ago

@hackett123 If it can help team, we encounter same issue since 9th january 2024. Before this date, our request in Athena with DocumentDb connector works fine but after all requests with a "WHERE" clause with a capital in column name return zero result. We fix it with columnMapping property.

mo2menelzeiny commented 6 months ago

I'm encountering the same issue I have disable_projection_and_casing with DynamoDB connector, and I use a Glue crawler to crawl the table, select all works fine but any type of where condition returns 0 results