Closed: skamalj closed this issue 10 months ago
@JCZuurmond would you mind taking a look at this one? I don't have an easy way to try to reproduce :/
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a new bug in dbt-spark?
Current Behavior
When using the 'merge' incremental strategy with the 'hudi' file format, the initial load (the first run) works fine. Subsequent incremental runs fail with the error below.
org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: cannot resolve _hoodie_commit_time in INSERT clause given columns DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.firstname, DBT_INTERNAL_SOURCE.lastname, DBT_INTERNAL_SOURCE.phone, DBT_INTERNAL_SOURCE.email, DBT_INTERNAL_SOURCE.pincode, DBT_INTERNAL_SOURCE.joiningdate, DBT_INTERNAL_SOURCE.eventtime, DBT_INTERNAL_SOURCE.dept; line 15 pos 2
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:43)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:230)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79)
  at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:230)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:225)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
  at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:239)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve _hoodie_commit_time in INSERT clause given columns DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.firstname, DBT_INTERNAL_SOURCE.lastname, DBT_INTERNAL_SOURCE.phone, DBT_INTERNAL_SOURCE.email, DBT_INTERNAL_SOURCE.pincode, DBT_INTERNAL_SOURCE.joiningdate, DBT_INTERNAL_SOURCE.eventtime, DBT_INTERNAL_SOURCE.dept; line 15 pos 2
My source format is 'json', so the source data does not include the additional Hudi metadata columns (such as _hoodie_commit_time).
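For context, a minimal model matching the shape of my setup would look like the sketch below (the model name, source name, and incremental filter are illustrative; the column names are the ones from the error above):

```sql
-- models/customers.sql -- hypothetical model name
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        file_format='hudi',
        unique_key='id'
    )
}}

select
    id, firstname, lastname, phone, email, pincode, joiningdate, eventtime, dept
from {{ source('raw', 'customers_json') }}  -- assumed json-backed source

{% if is_incremental() %}
  -- assumed incremental filter; any filter takes the same merge path on run 2
  where eventtime > (select max(eventtime) from {{ this }})
{% endif %}
```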
Expected Behavior
Subsequent incremental loads where the source is not 'hudi' and the destination is 'hudi' should execute without error.
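In other words, the MERGE generated on the second run should reference only the columns that exist in the source relation; the Hudi metadata columns (_hoodie_commit_time and the rest) exist only on the target and should be left for Hudi to populate. Roughly, the expected statement would look like this sketch (relation names are illustrative, not the exact SQL dbt emits):

```sql
merge into analytics.customers as DBT_INTERNAL_DEST        -- hypothetical hudi target
using analytics.customers__dbt_tmp as DBT_INTERNAL_SOURCE  -- dbt's staging relation
on DBT_INTERNAL_DEST.id = DBT_INTERNAL_SOURCE.id
when matched then update set
    firstname   = DBT_INTERNAL_SOURCE.firstname,
    lastname    = DBT_INTERNAL_SOURCE.lastname,
    phone       = DBT_INTERNAL_SOURCE.phone,
    email       = DBT_INTERNAL_SOURCE.email,
    pincode     = DBT_INTERNAL_SOURCE.pincode,
    joiningdate = DBT_INTERNAL_SOURCE.joiningdate,
    eventtime   = DBT_INTERNAL_SOURCE.eventtime,
    dept        = DBT_INTERNAL_SOURCE.dept
-- insert list restricted to the source's columns; no _hoodie_* columns
when not matched then insert
    (id, firstname, lastname, phone, email, pincode, joiningdate, eventtime, dept)
    values
    (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.firstname, DBT_INTERNAL_SOURCE.lastname,
     DBT_INTERNAL_SOURCE.phone, DBT_INTERNAL_SOURCE.email, DBT_INTERNAL_SOURCE.pincode,
     DBT_INTERNAL_SOURCE.joiningdate, DBT_INTERNAL_SOURCE.eventtime, DBT_INTERNAL_SOURCE.dept);
```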
Steps To Reproduce
The failure occurs at step 4.
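With the model configured as in the sketch above, the failure reproduces by running `dbt run` twice against the same Hudi target: the first run (initial full load) succeeds, and the second run, which takes the incremental MERGE path, fails with the AnalysisException shown under Current Behavior.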
Relevant log output
Environment
Additional Context
I am running this on AWS, with S3 for storage and EMR for compute.