Closed SeanPrendi closed 4 years ago
@atennak1 and @soojinj any thoughts on this? I think you are most familiar with this code path.
Was there a stacktrace printed with the warning "doGetTable: Unable to retrieve table [table] from AWSGlue in database/schema [database]. Falling back to schema inference"? If so, could you provide it?
Sorry for the late response, here is the stacktrace for that warning:
doGetTable: Unable to retrieve table [table] from AWSGlue in database/schema [database]. Falling back to schema inference. If inferred schema is incorrect, create a matching table in Glue to define schema (see README)
java.lang.NullPointerException: null
    at org.apache.arrow.util.Preconditions.checkNotNull(Preconditions.java:767) ~[task/:?]
    at org.apache.arrow.vector.types.pojo.FieldType.<init>(FieldType.java:49) ~[task/:?]
    at org.apache.arrow.vector.types.pojo.FieldType.nullable(FieldType.java:34) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.data.FieldBuilder.build(FieldBuilder.java:256) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.metadata.glue.GlueFieldLexer.lexComplex(GlueFieldLexer.java:88) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.metadata.glue.GlueFieldLexer.lex(GlueFieldLexer.java:59) ~[task/:?]
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBMetadataHandler.convertField(DynamoDBMetadataHandler.java:465) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler.doGetTable(GlueMetadataHandler.java:361) ~[task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler.doGetTable(GlueMetadataHandler.java:308) ~[task/:?]
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBMetadataHandler.doGetTable(DynamoDBMetadataHandler.java:230) [task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.MetadataHandler.doHandleRequest(MetadataHandler.java:245) [task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:132) [task/:?]
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:100) [task/:?]
    at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:909) [LambdaSandboxJava-1.0.jar:?]
    at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:341) [LambdaSandboxJava-1.0.jar:?]
    at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:63) [LambdaSandboxJava-1.0.jar:?]
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_201]
    at java.lang.Class.forName(Class.java:348) [?:1.8.0_201]
    at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:119) [LambdaJavaRTEntry-1.0.jar:?]
Let me know if anything else would be useful and I will try to provide it.
Looks like when it comes to Lists we only go one level deep, and arrayType.getValue() is null for some reason.
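To illustrate what going more than one level deep means for a type such as array<struct<...>>, here is a minimal sketch of a recursive-descent parser for Glue-style type strings. It is not the connector's GlueFieldLexer and does not use Arrow; the class GlueTypeSketch, its Node type, and the parse and depth methods are hypothetical names made up for this example. It assumes only array, struct, and simple primitive names (no parameterized types such as decimal(10,2), and no map).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parses strings like "array<struct<a:string,b:array<int>>>"
// into a tree, recursing at every nesting level instead of stopping after one.
final class GlueTypeSketch {
    /** One parsed type: an optional field name, a type name, and nested children. */
    static final class Node {
        final String fieldName; // null for a top-level type or an array element
        final String typeName;  // "array", "struct", or a primitive such as "string"
        final List<Node> children = new ArrayList<>();

        Node(String fieldName, String typeName) {
            this.fieldName = fieldName;
            this.typeName = typeName;
        }

        @Override
        public String toString() {
            String prefix = (fieldName == null) ? "" : fieldName + ":";
            return prefix + typeName + (children.isEmpty() ? "" : children.toString());
        }
    }

    private final String input;
    private int pos = 0;

    private GlueTypeSketch(String input) {
        this.input = input;
    }

    /** Parses a whole type string; whitespace is ignored. */
    static Node parse(String glueType) {
        GlueTypeSketch parser = new GlueTypeSketch(glueType.replaceAll("\\s+", ""));
        Node root = parser.parseType(null);
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at position " + parser.pos);
        }
        return root;
    }

    /** Number of nesting levels; a primitive counts as one level. */
    static int depth(Node node) {
        int deepest = 0;
        for (Node child : node.children) {
            deepest = Math.max(deepest, depth(child));
        }
        return 1 + deepest;
    }

    private Node parseType(String fieldName) {
        Node node = new Node(fieldName, readToken());
        if (node.typeName.equals("array")) {
            expect('<');
            node.children.add(parseType(null)); // the element type may itself be complex
            expect('>');
        } else if (node.typeName.equals("struct")) {
            expect('<');
            do {
                String name = readToken();
                expect(':');
                node.children.add(parseType(name)); // each field type may itself be complex
            } while (consumeIf(','));
            expect('>');
        }
        return node;
    }

    private String readToken() {
        int start = pos;
        while (pos < input.length() && "<>:,".indexOf(input.charAt(pos)) < 0) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a name at position " + pos);
        }
        return input.substring(start, pos);
    }

    private void expect(char c) {
        if (!consumeIf(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
    }

    private boolean consumeIf(char c) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = parse("array<struct<a:string,b:array<int>>>");
        System.out.println(root + " depth=" + depth(root));
    }
}
```

Because parseType calls itself for array elements and struct fields alike, an array of structs is handled the same way at any depth; a real fix would additionally have to build the corresponding Arrow fields at each level.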
Will you be able to go deeper than one level, or is this a limitation of the platform?
Should be doable. This bug fix is in the queue for someone on our team to pick up.
Facing the same issue with the DocumentDB connector whenever I include arrays of structs in the schema in the Glue Catalog. When I drop the fields with arrays of structs, it starts working.
GENERIC_USER_ERROR: Encountered an exception[null] from your LambdaFunction
Fixed by #228 and #232
Describe the bug
It seems like the DynamoDB connector breaks when it encounters a field with the array-of-struct type. This seems to be closely related to the issue that I was facing in #182. We have managed to work around this to an extent by using DDL to define copies of the offending tables with the array-of-struct fields ignored. Interestingly, based on the CloudWatch logs, running the connector on tables that have columns with this type is fine by itself, because its own inference parses them not as array-of-struct but as list-of-struct, which works. However, the Glue schema inference marks these fields as being of type array-of-struct, which breaks the connector. The connector then falls back to its own inference, which brings us back to the null reference exception that was the original reason for using the Glue connector.
To Reproduce
1. Create a table in DynamoDB with an array of structs.
2. Crawl the table in Glue.
3. Query the Glue table with the connector.
Expected behavior
The query should be executed successfully.
Screenshots / Exceptions / Errors
First the connector encounters the array-of-struct type, and CloudWatch logs:
Then fails and falls back to the Connector's inference schema
However, when running a table with similar data (but no empty values, so inference can be performed successfully by the dynamo connector), we see