awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

ANTLR tool version mismatch error when updating Glue Catalog #63

Open PaulBurridge opened 4 years ago

PaulBurridge commented 4 years ago

I'm trying to update the Glue catalog from a local Glue dev environment, using the example code from https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html

sink = glueContext.getSink(connection_type="s3", path="<S3_output_path>",
                           enableUpdateCatalog=True,
                           partitionKeys=["region", "year", "month", "day"])
sink.setFormat("json")
sink.setCatalogInfo(catalogDatabase="<target_db_name>", catalogTableName="<target_table_name>")
sink.writeFrame(last_transform)

I get the following error:

ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.7.2
20/07/25 19:09:21 INFO DataSink: Table input_mixed_types already exists in database glue_lab with catalogId of
20/07/25 19:09:21 INFO DataSink: Failed to retrieve created table input_mixed_types in database glue_lab after job run with catalogId 
20/07/25 19:09:21 INFO DataSink: org.antlr.v4.runtime.misc.ParseCancellationException: line 1:0 no viable alternative at input 'long'
org.antlr.v4.runtime.misc.ParseCancellationException: line 1:0 no viable alternative at input 'long'
        at com.amazonaws.services.glue.schema.io.ThrowingErrorListener.syntaxError(ThrowingErrorListener.java:15)
        at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
        at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
        at com.amazonaws.services.glue.schema.io.grammar.HiveSchemaParser.dataType(HiveSchemaParser.java:186)
        at com.amazonaws.services.glue.schema.io.HiveFormatDeserializer.deserializeDataType(HiveFormatDeserializer.java:52)
        at com.amazonaws.services.glue.schema.io.HiveFormatDeserializer.deserializeDataTypeFromString(HiveFormatDeserializer.java:63)
        at com.amazonaws.services.glue.util.DataCatalogWrapperUtils$$anonfun$getFieldsFromColumns$1.apply(DataCatalogWrapper.scala:241)
        at com.amazonaws.services.glue.util.DataCatalogWrapperUtils$$anonfun$getFieldsFromColumns$1.apply(DataCatalogWrapper.scala:240)
        at scala.collection.immutable.List.map(List.scala:278)
        at com.amazonaws.services.glue.util.DataCatalogWrapperUtils$class.getFieldsFromColumns(DataCatalogWrapper.scala:240)
        at com.amazonaws.services.glue.util.DataCatalogWrapper.getFieldsFromColumns(DataCatalogWrapper.scala:94)
        at com.amazonaws.services.glue.util.DataCatalogWrapperUtils$class.getSchema(DataCatalogWrapper.scala:245)
        at com.amazonaws.services.glue.util.DataCatalogWrapper.getSchema(DataCatalogWrapper.scala:94)
        at com.amazonaws.services.glue.util.DataCatalogWrapperUtils$class.catalogTableFromGlueTable(DataCatalogWrapper.scala:482)
        at com.amazonaws.services.glue.util.DataCatalogWrapper.catalogTableFromGlueTable(DataCatalogWrapper.scala:94)
        at com.amazonaws.services.glue.util.DataCatalogWrapper$$anonfun$1.apply(DataCatalogWrapper.scala:102)
        at com.amazonaws.services.glue.util.DataCatalogWrapper$$anonfun$1.apply(DataCatalogWrapper.scala:97)
        at scala.util.Try$.apply(Try.scala:191)
        at com.amazonaws.services.glue.util.DataCatalogWrapper.getTable(DataCatalogWrapper.scala:97)
        at com.amazonaws.services.glue.DataSink$$anonfun$1.apply$mcV$sp(DataSink.scala:412)
        at com.amazonaws.services.glue.DataSink$$anonfun$1.apply(DataSink.scala:412)
        at com.amazonaws.services.glue.DataSink$$anonfun$1.apply(DataSink.scala:412)
        at scala.util.Try$.apply(Try.scala:191)
        at com.amazonaws.services.glue.DataSink$.getCatalogTableWithSinkElseCreateTable(DataSink.scala:411)
        at com.amazonaws.services.glue.DataSink$.forwardPotentialDynamicFrameToCatalog(DataSink.scala:207)
        at com.amazonaws.services.glue.DataSink$.forwardPotentialDynamicFrameToCatalog(DataSink.scala:167)
        at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:237)
        at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:141)
        at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:57)
        at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:140)
        at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
  File "/Users/paulburridge/Projects/Glue/glue_script_ingestion.py", line 115, in <module>
    sink.writeFrame(subset_df)
  File "/Users/paulburridge/Projects/Glue/aws-glue-libs/PyGlue.zip/awsglue/data_sink.py", line 31, in writeFrame
  File "/Users/paulburridge/Projects/Glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/paulburridge/Projects/Glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/Users/paulburridge/Projects/Glue/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o48.pyWriteDynamicFrame.
: scala.MatchError: (null,false) (of class scala.Tuple2)
        at com.amazonaws.services.glue.DataSink$.forwardPotentialDynamicFrameToCatalog(DataSink.scala:207)
        at com.amazonaws.services.glue.DataSink$.forwardPotentialDynamicFrameToCatalog(DataSink.scala:167)
        at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:237)
        at com.amazonaws.services.glue.sinks.HadoopDataSink$$anonfun$writeDynamicFrame$1.apply(HadoopDataSink.scala:141)
        at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.util.FileSchemeWrapper$$anonfun$executeWithQualifiedScheme$1.apply(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWith(FileSchemeWrapper.scala:57)
        at com.amazonaws.services.glue.util.FileSchemeWrapper.executeWithQualifiedScheme(FileSchemeWrapper.scala:63)
        at com.amazonaws.services.glue.sinks.HadoopDataSink.writeDynamicFrame(HadoopDataSink.scala:140)
        at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

calleo commented 4 years ago

I think this is a known bug (I ran into it myself). If you make sure not to use the type long in your schema, it works. I reported this to AWS some weeks ago, and supposedly a fix is being worked on.
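
One way to apply the "no long in the schema" workaround is to remap long columns to another type before writing to the sink. The sketch below is only an illustration: the helper name `avoid_long_mappings` and the choice of `int` as the replacement type are assumptions, not something from this thread, and narrowing to `int` can overflow for large values. It builds the 4-tuples that Glue's `ApplyMapping` transform accepts.

```python
from typing import List, Tuple

# (source_name, source_type, target_name, target_type), the tuple shape
# accepted by awsglue.transforms.ApplyMapping.
Mapping = Tuple[str, str, str, str]


def avoid_long_mappings(
    fields: List[Tuple[str, str]], replacement: str = "int"
) -> List[Mapping]:
    """Hypothetical helper: keep every column, but map 'long' to `replacement`.

    `fields` is a list of (column_name, type_name) pairs describing the frame.
    The default replacement "int" is an assumption; pick a type that can hold
    your values (for example "double" or "string").
    """
    mappings = []
    for name, type_name in fields:
        target_type = replacement if type_name == "long" else type_name
        mappings.append((name, type_name, name, target_type))
    return mappings


if __name__ == "__main__":
    # Made-up example schema, for illustration only.
    example_fields = [("region", "string"), ("count", "long")]
    for mapping in avoid_long_mappings(example_fields):
        print(mapping)
```

In a Glue script the result could be passed as `ApplyMapping.apply(frame=last_transform, mappings=avoid_long_mappings(fields))` before calling `sink.writeFrame(...)`. If the catalog table already exists with a long column, it may also need to be changed or recreated; that has not been verified here.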