Closed keelzebub closed 2 years ago
Hi @beaksandclaws, can you share the Glue schema and the complete CloudWatch logs (including the error). Also, which version of the connector are you using?
Hi @shurvitz!
I'm not totally sure how to get the version of the connector that was uploaded—there doesn't seem to be any way to view that via the Lambda console (or maybe there's something that I'm missing). I do know that it's 7 months old at this point.
{
"FieldSchema": [
{ "name": "crop_longevity_ids", "type": "bigint" },
{ "name": "crop_ids", "type": "set<bigint>" },
{ "name": "crop_use_ids", "type": "set<bigint>" },
{ "name": "pk2", "type": "string" },
{ "name": "pk1", "type": "string" },
{ "name": "sk1", "type": "string" },
{ "name": "crop_uses", "type": "set<string>" },
{ "name": "crop_longevities", "type": "string" },
{ "name": "sk2", "type": "string" },
{ "name": "id", "type": "string" },
{ "name": "crops", "type": "set<string>" },
{ "name": "soil_amendment_types", "type": "set<string>" },
{ "name": "rotational", "type": "boolean" },
{ "name": "soil_amendment_type_ids", "type": "set<bigint>" },
{ "name": "weed_pest_control_general_ids", "type": "set<bigint>" },
{ "name": "weed_pest_control_specific_ids", "type": "set<bigint>" },
{ "name": "weed_pest_control_generals", "type": "set<string>" },
{ "name": "weed_pest_control_specifics", "type": "set<string>" },
{ "name": "soil_amendment_activities", "type": "set<string>" },
{ "name": "soil_amendment_activity_ids", "type": "set<bigint>" },
{ "name": "tillages", "type": "string" },
{ "name": "tillage_ids", "type": "bigint" },
{ "name": "cropping_ids", "type": "set<bigint>" },
{ "name": "irrigation_general_ids", "type": "bigint" },
{ "name": "irrigation_specifics", "type": "string" },
{ "name": "croppings", "type": "set<string>" },
{ "name": "irrigation_specific_ids", "type": "bigint" },
{ "name": "irrigation_generals", "type": "string" },
{ "name": "certification_managements", "type": "set<string>" },
{ "name": "certification_management_ids", "type": "set<bigint>" },
{ "name": "animal_specifics", "type": "set<string>" },
{ "name": "animal_specific_ids", "type": "set<bigint>" },
{ "name": "animal_generals", "type": "set<string>" },
{ "name": "animal_general_ids", "type": "set<bigint>" },
{ "name": "crop_densities", "type": "string" },
{ "name": "crop_density_ids", "type": "bigint" },
{ "name": "burnings", "type": "string" },
{ "name": "burning_ids", "type": "bigint" },
{ "name": "value_status_id", "type": "bigint" },
{ "name": "continent_ids", "type": "set<bigint>" },
{ "name": "published_dimension_unit_id", "type": "bigint" },
{ "name": "climate_group_id", "type": "bigint" },
{ "name": "country_ids", "type": "set<bigint>" },
{ "name": "percent_approved", "type": "bigint" },
{ "name": "valuation_methodology_general", "type": "string" },
{ "name": "valuation_methodology_specific_id", "type": "bigint" },
{ "name": "transcription_id", "type": "bigint" },
{ "name": "ecosystem_descriptor_specific_id", "type": "bigint" },
{ "name": "ecosystem_service_specific", "type": "string" },
{ "name": "ecosystem_descriptor_general", "type": "string" },
{ "name": "climate_group", "type": "string" },
{ "name": "area", "type": "double" },
{ "name": "dataset_ids", "type": "set<bigint>" },
{ "name": "ecosystem_descriptor_sub_specific_id", "type": "bigint" },
{ "name": "published_dimension_unit", "type": "string" },
{ "name": "ecosystem_service_general", "type": "string" },
{ "name": "ecosystem_service_category_id", "type": "bigint" },
{ "name": "ecosystem_service_sub_specific_id", "type": "bigint" },
{ "name": "countries", "type": "set<string>" },
{ "name": "ecosystem_service_specific_id", "type": "bigint" },
{ "name": "publication_type", "type": "string" },
{ "name": "withins", "type": "set<string>" },
{ "name": "continents", "type": "set<string>" },
{ "name": "ecosystem_service_id", "type": "bigint" },
{ "name": "ecosystem_descriptor_id", "type": "bigint" },
{ "name": "proximities", "type": "set<string>" },
{ "name": "bibliographic_reference", "type": "string" },
{ "name": "calculation_type_id", "type": "bigint" },
{ "name": "ecosystem_descriptor_sub_specific", "type": "string" },
{ "name": "name", "type": "string" },
{ "name": "site_id", "type": "bigint" },
{ "name": "calculated_high", "type": "double" },
{ "name": "calculated_low", "type": "double" },
{ "name": "inflated_low", "type": "double" },
{ "name": "valuation_methodology_general_id", "type": "bigint" },
{ "name": "authors", "type": "string" },
{ "name": "is_usd_acre_yr", "type": "string" },
{ "name": "transcription_state_id", "type": "bigint" },
{ "name": "subcountry_ids", "type": "set<bigint>" },
{ "name": "within_ids", "type": "set<bigint>" },
{ "name": "ecosystem_descriptor_specific", "type": "string" },
{ "name": "title", "type": "string" },
{ "name": "currency_code", "type": "string" },
{ "name": "published_low", "type": "double" },
{ "name": "calculation_type", "type": "string" },
{ "name": "proximity_ids", "type": "set<bigint>" },
{ "name": "flagged", "type": "string" },
{ "name": "primary_authors", "type": "string" },
{ "name": "valuation_methodology_specific", "type": "string" },
{ "name": "climate_type_id", "type": "bigint" },
{ "name": "ecosystem_service_category", "type": "string" },
{ "name": "published_currency_year", "type": "bigint" },
{ "name": "currency", "type": "string" },
{ "name": "published_time_unit", "type": "string" },
{ "name": "value_id", "type": "bigint" },
{ "name": "ecosystem_service_sub_specific", "type": "string" },
{ "name": "valuation_methodology_comments", "type": "string" },
{ "name": "site_scale", "type": "string" },
{ "name": "year_published", "type": "bigint" },
{ "name": "value_status", "type": "string" },
{ "name": "valuation_methodology_id", "type": "bigint" },
{ "name": "inflated_high", "type": "double" },
{ "name": "published_time_unit_id", "type": "bigint" },
{ "name": "published_high", "type": "double" },
{ "name": "ecosystem_service_general_id", "type": "bigint" },
{ "name": "climate_type", "type": "string" },
{ "name": "inflated_currency_year", "type": "bigint" },
{ "name": "ecosystem_descriptor_general_id", "type": "bigint" },
{ "name": "bibliography_url", "type": "string" },
{ "name": "subcountries", "type": "set<string>" },
{ "name": "currency_id", "type": "bigint" },
{ "name": "greenhouse_gasses", "type": "set<string>" },
{ "name": "water_type", "type": "string" },
{ "name": "water_type_id", "type": "bigint" },
{ "name": "percent_rejected", "type": "bigint" },
{ "name": "site_comments", "type": "string" },
{ "name": "other_bibliographic_information", "type": "string" },
{ "name": "bibliography_notes", "type": "string" },
{ "name": "land_cover_nlcd_general", "type": "string" },
{ "name": "converted_time_unit_id", "type": "bigint" },
{ "name": "converted_high", "type": "double" },
{ "name": "converted_time_unit", "type": "string" },
{ "name": "converted_dimension_unit_id", "type": "bigint" },
{ "name": "converted_dimension_unit", "type": "string" },
{ "name": "converted_low", "type": "double" },
{ "name": "ecosystem_descriptor_comments", "type": "string" },
{ "name": "air_pollutants", "type": "set<string>" },
{ "name": "nutrients_and_contaminants", "type": "set<string>" },
{ "name": "provisioning_use", "type": "string" },
{ "name": "calculated_time_unit", "type": "string" },
{ "name": "calculated_dimension_unit", "type": "string" },
{ "name": "valuation_methodology_sub_specific_id", "type": "bigint" },
{ "name": "valuation_methodology_sub_specific", "type": "string" },
{ "name": "ecosystem_service_comments", "type": "string" },
{ "name": "recreation_activities", "type": "set<string>" },
{ "name": "percent_pending", "type": "bigint" },
{ "name": "sub_sub_country", "type": "string" },
{ "name": "ag_mgmt_irrigation_specific_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_ids", "type": "set<string>" },
{ "name": "ag_mgmt_irrigation_specifics", "type": "set<string>" },
{ "name": "ag_mgmt_irrigation_generals", "type": "set<string>" },
{ "name": "ag_mgmt_irrigation_general_ids", "type": "set<bigint>" },
{ "name": "municipality", "type": "string" },
{ "name": "value_type_id", "type": "bigint" },
{ "name": "ag_mgmt_crop_uses", "type": "set<string>" },
{ "name": "ag_mgmt_crops", "type": "set<string>" },
{ "name": "ag_mgmt_crop_ids", "type": "set<bigint>" },
{ "name": "value_type", "type": "string" },
{ "name": "ag_mgmt_crop_use_ids", "type": "set<bigint>" },
{ "name": "elevation_id", "type": "bigint" },
{ "name": "elevation", "type": "string" },
{ "name": "publication_comments", "type": "string" },
{ "name": "year_accepted", "type": "bigint" },
{ "name": "material_valued_general", "type": "string" },
{ "name": "soil_type", "type": "string" },
{ "name": "provisioning_use_other", "type": "string" },
{ "name": "ag_mgmt_animal_specific_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_animal_specifics", "type": "set<string>" },
{ "name": "ag_mgmt_animal_generals", "type": "set<string>" },
{ "name": "ag_mgmt_animal_general_ids", "type": "set<bigint>" },
{ "name": "non_monetary_dimension_unit", "type": "string" },
{ "name": "non_monetary_dimension_type", "type": "string" },
{ "name": "material_valued_sub_specific", "type": "string" },
{ "name": "material_valued_specific", "type": "string" },
{ "name": "material_valued_general_other", "type": "string" },
{ "name": "certifications", "type": "set<string>" },
{ "name": "ag_mgmt_crop_longevity_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_soil_amendment_types", "type": "set<string>" },
{ "name": "ag_mgmt_crop_longevities", "type": "set<string>" },
{ "name": "ag_mgmt_tillage_ids", "type": "set<bigint>" },
{ "name": "material_valued_specific_other", "type": "string" },
{ "name": "ag_mgmt_tillages", "type": "set<string>" },
{ "name": "ag_mgmt_soil_amendment_type_ids", "type": "set<bigint>" },
{ "name": "non_monetary_dimension_unit_other", "type": "string" },
{ "name": "ag_mgmt_weed_pest_control_generals", "type": "set<string>" },
{ "name": "ag_mgmt_weed_pest_control_general_ids", "type": "set<bigint>" },
{ "name": "material_valued_sub_specific_other", "type": "string" },
{ "name": "agricultural_design_strategies", "type": "set<string>" },
{ "name": "ag_mgmt_soil_amendment_activity_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_crop_density_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_crop_densities", "type": "set<string>" },
{ "name": "ag_mgmt_soil_amendment_activities", "type": "set<string>" },
{ "name": "agricultural_design_strategy_other", "type": "string" },
{ "name": "ag_mgmt_cropping_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_croppings", "type": "set<string>" },
{ "name": "ag_mgmt_rotational", "type": "boolean" },
{ "name": "farm_scale_general", "type": "string" },
{ "name": "ag_mgmt_weed_pest_control_specific_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_weed_pest_control_specifics", "type": "set<string>" },
{ "name": "ag_mgmt_certification_management_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_certification_managements", "type": "set<string>" },
{ "name": "farm_scale_specific", "type": "string" },
{ "name": "crop_intensity", "type": "string" },
{ "name": "non_monetary_dimension_type_other", "type": "string" },
{ "name": "ag_mgmt_burning_ids", "type": "set<bigint>" },
{ "name": "ag_mgmt_burnings", "type": "set<string>" },
{ "name": "federal_program_generals", "type": "set<string>" }
]
}
1616770807107 "START RequestId: fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 Version: $LATEST"
1616770807112 "2021-03-26 15:00:07 fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 INFO MetadataHandler:232 - doHandleRequest: request[GetTableRequest{queryId=463e7f8a-d762-43e2-b697-ade1e1176131, tableName=TableName{schemaName=default, tableName=evt_development}}]"
1616770807265 "2021-03-26 15:00:07 fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 INFO GlueMetadataHandler:360 - Column crop_longevity_ids with registered type bigint"
1616770807303 "2021-03-26 15:00:07 fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 INFO GlueMetadataHandler:360 - Column crop_ids with registered type set<bigint>"
1616770807311 "2021-03-26 15:00:07 fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 WARN DynamoDBMetadataHandler:233 - doGetTable: Unable to retrieve table evt_development from AWSGlue in database/schema default. Falling back to schema inference. If inferred schema is incorrect, create a matching table in Glue to define schema (see README)
java.lang.RuntimeException: Unexpected start type set
at com.amazonaws.athena.connector.lambda.metadata.glue.GlueFieldLexer.lexComplex(GlueFieldLexer.java:108) ~[task/:?]
at com.amazonaws.athena.connector.lambda.metadata.glue.GlueFieldLexer.lex(GlueFieldLexer.java:68) ~[task/:?]
at com.amazonaws.athena.connectors.dynamodb.DynamoDBMetadataHandler.convertField(DynamoDBMetadataHandler.java:465) ~[task/:?]
at com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler.doGetTable(GlueMetadataHandler.java:361) ~[task/:?]
at com.amazonaws.athena.connector.lambda.handlers.GlueMetadataHandler.doGetTable(GlueMetadataHandler.java:308) ~[task/:?]
at com.amazonaws.athena.connectors.dynamodb.DynamoDBMetadataHandler.doGetTable(DynamoDBMetadataHandler.java:230) [task/:?]
at com.amazonaws.athena.connector.lambda.handlers.MetadataHandler.doHandleRequest(MetadataHandler.java:250) [task/:?]
at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:132) [task/:?]
at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:100) [task/:?]
at lambdainternal.EventHandlerLoader$2.call(EventHandlerLoader.java:902) [LambdaSandboxJava-1.0.jar:?]
at lambdainternal.AWSLambda.startRuntime(AWSLambda.java:340) [LambdaSandboxJava-1.0.jar:?]
at lambdainternal.AWSLambda.<clinit>(AWSLambda.java:63) [LambdaSandboxJava-1.0.jar:?]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_201]
at java.lang.Class.forName(Class.java:348) [?:1.8.0_201]
at lambdainternal.LambdaRTEntry.main(LambdaRTEntry.java:150) [LambdaJavaRTEntry-1.0.jar:?]"
1616770807392 "2021-03-26 15:00:07 fcefc5f7-a62e-42d6-9a86-5c2a7b15d222 INFO MetadataHandler:251 - doHandleRequest: response[GetTableResponse{tableName=TableName{schemaName=default, tableName=evt_development}, schema=Schema<value_status_id: Decimal(38, 9), continent_ids: List<continent_ids.element: Decimal(38, 9)>, published_dimension_unit_id: Decimal(38, 9), climate_group_id: Decimal(38, 9), country_ids: List<country_ids.element: Decimal(38, 9)>, percent_approved: Decimal(38, 9), valuation_methodology_general: Utf8, valuation_methodology_specific_id: Decimal(38, 9), transcription_id: Decimal(38, 9), ecosystem_descriptor_specific_id: Decimal(38, 9), pk2: Utf8, ecosystem_service_specific: Utf8, pk1: Utf8, ecosystem_descriptor_general: Utf8, climate_group: Utf8, area: Decimal(38, 9), dataset_ids: List<dataset_ids.element: Decimal(38, 9)>, ecosystem_descriptor_sub_specific_id: Decimal(38, 9), published_dimension_unit: Utf8, ecosystem_service_general: Utf8, ecosystem_service_category_id: Decimal(38, 9), ecosystem_service_sub_specific_id: Decimal(38, 9), countries: List<countries.element: Utf8>, ecosystem_service_specific_id: Decimal(38, 9), publication_type: Utf8, withins: List<withins.element: Utf8>, continents: List<continents.element: Utf8>, ecosystem_service_id: Decimal(38, 9), ecosystem_descriptor_id: Decimal(38, 9), proximities: List<proximities.element: Utf8>, bibliographic_reference: Utf8, calculation_type_id: Decimal(38, 9), ecosystem_descriptor_sub_specific: Utf8, name: Utf8, site_id: Decimal(38, 9), calculated_high: Decimal(38, 9), calculated_low: Decimal(38, 9), inflated_low: Decimal(38, 9), valuation_methodology_general_id: Decimal(38, 9), authors: Utf8, is_usd_acre_yr: Utf8, transcription_state_id: Decimal(38, 9), subcountry_ids: List<subcountry_ids.element: Decimal(38, 9)>, within_ids: List<within_ids.element: Decimal(38, 9)>, ecosystem_descriptor_specific: Utf8, title: Utf8, currency_code: Utf8, published_low: Decimal(38, 9), calculation_type: Utf8, proximity_ids: List<proximity_ids.element: Decimal(38, 9)>, flagged: Utf8, primary_authors: Utf8, valuation_methodology_specific: Utf8, climate_type_id: Decimal(38, 9), ecosystem_service_category: Utf8, published_currency_year: Decimal(38, 9), sk1: Utf8, currency: Utf8, published_time_unit: Utf8, value_id: Decimal(38, 9), sk2: Utf8, ecosystem_service_sub_specific: Utf8, valuation_methodology_comments: Utf8, site_scale: Utf8, year_published: Decimal(38, 9), value_status: Utf8, valuation_methodology_id: Decimal(38, 9), inflated_high: Decimal(38, 9), published_time_unit_id: Decimal(38, 9), published_high: Decimal(38, 9), ecosystem_service_general_id: Decimal(38, 9), climate_type: Utf8, inflated_currency_year: Decimal(38, 9), ecosystem_descriptor_general_id: Decimal(38, 9), bibliography_url: Utf8, subcountries: List<subcountries.element: Utf8>, currency_id: Decimal(38, 9), greenhouse_gasses: List<greenhouse_gasses.element: Utf8>>, partitionColumns=[], requestType=GET_TABLE, catalogName=dynamo-db-catalog}]"
1616770807411 "END RequestId: fcefc5f7-a62e-42d6-9a86-5c2a7b15d222"
I do notice that it looks like it might be choking on the schema type "set", although that was generated directly from the Glue Crawler. Do those need to be manually to change to "array"?
@beaksandclaws yes please try changing them to array. I don't remember the Glue Crawler defining set columns as sets. It tended to use Hive compatible types in its definition, so perhaps something was changed recently on the crawler side.
Are there any plans to support set
? Because otherwise, every time the crawler runs I would have to manually go in and update the schema, which could end up being a lot of overhead.
Did a little bit more research, and it looks like the Glue Crawler uses a built-in classifier specifically for DynamoDB, which is why it makes schemas with the set
type (because set
is a valid type in DynamoDB). Given that, I think this might actually need to be logged as a bug, although possibly with the DynamoDB connector.
For posterity's sake, I've create a bug report here: https://github.com/awslabs/aws-athena-query-federation/issues/378
I'm glad I found this otherwise I'd never suspect the "Set" type... I was reading the "set" as a verb... like "Unexpected start type IS set" and had no clue to what was happening... Thank you!
We are facing exactly the same issue, but in our case we did not define any crawler or to that extent did not even touch Glue service directly.
We took the DDB connector and installed it as an application in the AWS account. Then went to Athena and defined a DataSource. That is it, and started using Athena to query DynamoDB. Now the schema for DynamoDB table has changes and Athena does not understand the newly added column and hence throws an error Column 'xxxyyyzzz' cannot be resolved
.
Is there a way to refresh the connector or tell the connector and to refresh the DynamoDB schema that it possess ?
@fmcastro
This pull request has been merged and should address your set type request: https://github.com/awslabs/aws-athena-query-federation/pull/728
We don't currently have plans to cut a new release with this yet, but just wanted to give you a heads up in case you wanted to build the connector with this update yourselves.
@fmcastro
We finally cut a release (2022.30.2) with this fix. Try it out and let us know if you have any problems.
Thanks for letting me know!
@RajasGujarathi - the connector automatically infers the schema by scanning a subset of items in the DDB table. You will need to follow the advice provided by Stuart in https://dev.to/aws-builders/finally-dynamodb-support-in-aws-quicksight-sort-of-2lbl
Thanks, @gfang-work for the reference, I will give it a try.
For better/easy reachability
Stuart advice is present in the comments of the article
Hi! I have a question that's very similar to https://github.com/awslabs/aws-athena-query-federation/issues/182, but that one was closed due to it being a separate issue.
Basically, I have a DynamoDB table called
evt_development
that contains different types of records. I've set up the DynamoDB connector and I can, for the most part, successfully query the DynamoDB table using Athena.However, there are certain fields that, when I try to query them, return messages like
Column 'converted_low' cannot be resolved
. I know this has to do with schema inference, so I've created a Glue crawler and used it to get the metadata for the DynamoDB table. I've confirmed that the metadata contains the fieldconverted_low
.So this is my setup:
evt_development
dynamo-db-catalog
default
evt_development
default
evt_development
After getting all that set up, I'm still receiving the same
Column 'converted_low' cannot be resolved
error message. My question is, how do I get the Athena query to use the Glue table metadata when querying DynamoDB?For reference, this is the query that I'm running:
Thank you!