apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Integ tests are failing for HUDI #4621

Closed: data-storyteller closed this issue 2 years ago

data-storyteller commented 2 years ago


Environment Description

Spark version: 2.4.7 (per the discussion below)

Additional context

Running the integ tests on the Docker setup; the tests fail with the following stacktrace. Command:

docker exec -i adhoc-2 /bin/bash spark-submit \
    --packages org.apache.spark:spark-avro_2.11:2.4.0 \
    --conf spark.task.cpus=1 \
    --conf spark.executor.cores=1 \
    --conf spark.task.maxFailures=100 \
    --conf spark.memory.fraction=0.4 \
    --conf spark.rdd.compress=true \
    --conf spark.kryoserializer.buffer.max=2000m \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.memory.storageFraction=0.1 \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.sql.hive.convertMetastoreParquet=false \
    --conf spark.driver.maxResultSize=12g \
    --conf spark.executor.heartbeatInterval=120s \
    --conf spark.network.timeout=600s \
    --conf spark.yarn.max.executor.failures=10 \
    --conf spark.sql.catalogImplementation=hive \
    --conf spark.driver.extraClassPath=/var/demo/jars/* \
    --conf spark.executor.extraClassPath=/var/demo/jars/* \
    --class org.apache.hudi.integ.testsuite.HoodieTestSuiteJob /opt/$HUDI_JAR_NAME \
    --source-ordering-field test_suite_source_ordering_field \
    --target-base-path /user/hive/warehouse/hudi-integ-test-suite/output \
    --input-base-path /user/hive/warehouse/hudi-integ-test-suite/input \
    --target-table table1 \
    --props file:/var/hoodie/ws/docker/demo/config/test-suite/$PROP_FILE \
    --schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider \
    --source-class org.apache.hudi.utilities.sources.AvroDFSSource \
    --input-file-size 125829120 \
    --workload-yaml-path file:/var/hoodie/ws/docker/demo/config/test-suite/$YAML_NAME \
    --workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator \
    --table-type $TABLE_TYPE \
    --compact-scheduling-minshare 1 \
    $EXTRA_SPARK_ARGS \
    --clean-input \
    --clean-output

Stacktrace


22/01/17 06:36:13 INFO DagNode: Configs : {"name":"a89cea37-7224-4f36-8c00-90306ddf6172","record_size":1000,"repeat_count":1,"num_partitions_insert":1,"num_records_insert":300,"config":"third_insert"}
22/01/17 06:36:13 INFO DagNode: Inserting input data a89cea37-7224-4f36-8c00-90306ddf6172
22/01/17 06:36:13 INFO HoodieTestSuiteJob: Using DFSTestSuitePathSelector, checkpoint: Option{val=2} sourceLimit: 9223372036854775807 lastBatchId: 2 nextBatchId: 3
00:09  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
00:10  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
00:12  WARN: Timeline-server-based markers are configured as the marker type but embedded timeline server is not enabled.  Falling back to direct markers.
22/01/17 06:36:16 INFO DagScheduler: Finished executing a89cea37-7224-4f36-8c00-90306ddf6172
22/01/17 06:36:16 WARN DagScheduler: Executing node "first_hive_sync" :: {"queue_name":"adhoc","engine":"mr","name":"994a5035-0362-4c9a-a7d7-e47397f2b113","config":"first_hive_sync"}
22/01/17 06:36:16 INFO DagNode: Executing hive sync node
22/01/17 06:36:19 INFO DagScheduler: Finished executing 994a5035-0362-4c9a-a7d7-e47397f2b113
22/01/17 06:36:19 WARN DagScheduler: Executing node "first_validate" :: {"name":"3f562e32-b7d8-4d96-a977-44b6b876c333","validate_hive":false,"config":"first_validate"}
22/01/17 06:36:19 WARN DagNode: Validation using data from input path /user/hive/warehouse/hudi-integ-test-suite/input/*/*
22/01/17 06:36:21 INFO ValidateDatasetNode: Validate data in target hudi path /user/hive/warehouse/hudi-integ-test-suite/output/*/*/*
22/01/17 06:36:21 ERROR DagScheduler: Exception executing node
java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more
22/01/17 06:36:21 INFO DagScheduler: Forcing shutdown of executor service, this might kill running tasks
22/01/17 06:36:21 ERROR HoodieTestSuiteJob: Failed to run Test Suite
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more
Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to run Test Suite
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:208)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.execute(DagScheduler.java:113)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.schedule(DagScheduler.java:68)
    at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:203)
    ... 13 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:146)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.lambda$execute$0(DagScheduler.java:105)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.hudi.integ.testsuite.dag.nodes.ValidateDatasetNode.getDatasetToValidate(ValidateDatasetNode.java:52)
    at org.apache.hudi.integ.testsuite.dag.nodes.BaseValidateDatasetNode.execute(BaseValidateDatasetNode.java:99)
    at org.apache.hudi.integ.testsuite.dag.scheduler.DagScheduler.executeNode(DagScheduler.java:139)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
    ... 11 more

[Container] 2022/01/17 06:36:22 Command did not exit successfully sh run-intig-test.sh 2022-01-17 MERGE_ON_READ cow-long-running-example.yaml exit status 1
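
For context, the failure originates in the "first_validate" step: ValidateDatasetNode reads the target table back through the Spark datasource API, and resolving the short name "hudi" requires a Hudi Spark bundle (which ships the DataSourceRegister service entry) on the classpath. A minimal sketch of the call that fails, not the test suite's exact code; the path is taken from the log above and the app name is made up:

import org.apache.spark.sql.SparkSession

// Build a session the way a standalone reproduction might (hypothetical app name).
val spark = SparkSession.builder()
  .appName("hudi-read-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Spark resolves the short name "hudi" via the DataSourceRegister service
// entry in the Hudi Spark bundle. With no bundle on the classpath, Spark
// falls back to loading the class "hudi.DefaultSource", which produces the
// ClassNotFoundException seen in the stacktrace.
val target = spark.read
  .format("hudi")
  .load("/user/hive/warehouse/hudi-integ-test-suite/output/*/*/*")

println(s"records in target table: ${target.count()}")  // materialize to force the read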
xushiyan commented 2 years ago

moving to https://issues.apache.org/jira/browse/HUDI-3262 for work tracking

nsivabalan commented 2 years ago

@data-storyteller: I tested the integ test bundle on the latest master and it is all good. I have attached logs in the linked JIRA. Is your env Spark 2 or Spark 3?
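
(As a quick check, running spark-submit --version inside the adhoc container prints the Spark and Scala versions; this is a standard Spark CLI flag, not anything specific to the test suite.)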

xushiyan commented 2 years ago

@data-storyteller @nsivabalan I see the param --packages org.apache.spark:spark-avro_2.11:2.4.0, while the Spark version mentioned is 2.4.7. @data-storyteller can you align these versions? They should both be 2.4.7, as shown below.
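
For illustration, the aligned flag would read as follows, assuming the cluster really is on Spark 2.4.7 (the artifact keeps the same Scala 2.11 suffix; spark-avro_2.11 is published for the 2.4.x line on Maven Central):

--packages org.apache.spark:spark-avro_2.11:2.4.7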

nsivabalan commented 2 years ago

I was able to reproduce with the latest master. Will investigate further.

data-storyteller commented 2 years ago

Thanks @nsivabalan for the fix. This issue is resolved now.