aws-samples / aws-glue-samples

AWS Glue code samples
MIT No Attribution

Issue with Direct Migration from Glue Catalog to Hive Metastore #43

Open sbellary-chc opened 5 years ago

sbellary-chc commented 5 years ago

We have a Glue Catalog in our dev AWS account, and I am now trying to migrate it to the Hive metastore of an EMR cluster. I need to do this to replace the Hive metastore contents with the Glue Catalog metadata, so that I can track our Glue Catalog data lineage using Apache Atlas, which is installed on the EMR cluster.

I followed all the steps in the procedure for directly migrating the Glue Catalog to the Hive metastore, but I get a `Duplicate entry 'default' for key 'UNIQUE_DATABASE'` error every time I run the Glue ETL job, and I keep getting the same error across several different attempts. See the full error message below:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o1025.jdbc.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 77.0 failed 4 times, most recent failure: Lost task 1.3 in stage 77.0 (TID 1298, ip-00-00-000-000.ec2.internal, executor 32): java.sql.BatchUpdateException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
	at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
	at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:642)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
	at com.mysql.jdbc.Util.getInstance(Util.java:360)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
	at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2141)
	at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
```
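For context on what the error means: `UNIQUE_DATABASE` is the unique index on the `NAME` column of the `DBS` table in the Hive metastore's MySQL schema, and every Hive metastore ships with a pre-created `default` database, so a migration that inserts all Glue databases will collide with that existing row. Below is a minimal sketch for confirming the conflict before migrating; it assumes `pymysql` is installed, and the region, host, credentials, and `hive` schema name are placeholders, not values from this issue:

```python
# Sketch: find Glue database names that already exist in the target Hive
# metastore. Adjust the connection details for your metastore's MySQL
# instance; the values below are placeholders.
import boto3
import pymysql

glue = boto3.client("glue", region_name="us-east-1")  # region is an assumption

# Collect all database names from the Glue Data Catalog.
glue_dbs = set()
paginator = glue.get_paginator("get_databases")
for page in paginator.paginate():
    glue_dbs.update(db["Name"] for db in page["DatabaseList"])

# Collect database names already present in the Hive metastore. DBS.NAME
# carries the UNIQUE_DATABASE index that the failed batch insert violated.
conn = pymysql.connect(host="metastore-host", user="hive",
                       password="...", database="hive")
with conn.cursor() as cur:
    cur.execute("SELECT NAME FROM DBS")
    hive_dbs = {row[0] for row in cur.fetchall()}
conn.close()

# Any overlap here (typically at least "default") will trigger the
# "Duplicate entry ... for key 'UNIQUE_DATABASE'" error during migration.
print("conflicting databases:", sorted(glue_dbs & hive_dbs))
```

If the only overlap is `default`, one thing worth trying is leaving `default` out of `--database-names` (or clearing the conflicting rows on the target side first), so the job never attempts to re-insert a row the metastore already contains.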

As per the migration procedure, job parameters are supplied as key-value pairs, with "--database-names" taking a semicolon-separated (;) list of database names as its value. I first tried specifying the full list of databases separated by semicolons, and then tried specifying just a single database ("default"), but neither works. The error above was thrown when I used only the default database.
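For reference, here is a minimal sketch of how those key-value job parameters can be passed when starting the migration job programmatically with boto3. Only `--database-names` comes from this issue; the job name, the example database names, and the other parameter names (`--mode`, `--connection-name`, `--region`) are assumptions based on the Hive_metastore_migration utility in this repo, so verify them against its README before use:

```python
# Sketch: start the direct-migration Glue job with parameters passed as
# key/value pairs. Job name and most parameter names are assumptions --
# check them against the Hive_metastore_migration README.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

response = glue.start_job_run(
    JobName="glue-to-hive-metastore-migration",  # hypothetical job name
    Arguments={
        # Semicolon-separated list of catalog databases (example names).
        "--database-names": "default;sales_db;marketing_db",
        # Assumed parameters for the direct (JDBC) migration path:
        "--mode": "to-jdbc",
        "--connection-name": "my-hive-metastore-connection",
        "--region": "us-east-1",
    },
)
print("started run:", response["JobRunId"])
```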

Is anyone familiar with this error? Is there a workaround for this issue? Please let me know if I am missing something; any help is appreciated.

har5havardhan commented 5 years ago

Hi, I am facing the same issue. Did you manage to find a workaround for this?

jainanuj07 commented 4 years ago

Hi, I am facing the same issue. Has anyone found a solution and managed to successfully migrate from Glue to a Hive MySQL metastore?