apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Compaction - Could not find - /opt/demo/config/schema.avsc - schema file #11892

Closed alberttwong closed 1 week ago

alberttwong commented 1 month ago

The compaction run cannot find the schema file: Could not find - file:///opt/demo/config/schema.avsc - schema file

To Reproduce

Steps to reproduce the behavior:

compaction run --compactionInstant  20240905045740967 --parallelism 2 --sparkMemory 1G  --schemaFilePath /opt/demo/config/schema.avsc --retry 1
392 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20240905045740967__compaction__REQUESTED__20240905045745721]}
565395 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_write_DOT_lock_DOT_provider
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_endpoint
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_FS_ATOMIC_CREATION_SUPPORT
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_access_DOT_key
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_fs_DOT_s3a_DOT_secret_DOT_key
565399 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_hoodie_DOT_fs_DOT_atomic_creation_DOT_support
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HadoopFSUtils: Picking up value for hoodie env var : HOODIE_ENV_HOODIE_FS_ATOMIC_CREATION_SUPPORT
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HoodieCompactor: HoodieCompactorConfig {
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --base-path s3a://warehouse/stock_ticks_mor, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --table-name stock_ticks_mor, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --instant-time 20240905045740967, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --parallelism 2, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --schema-file file:///opt/demo/config/schema.avsc, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --spark-master null, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --spark-memory null, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --retry 0, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --schedule false, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --mode execute, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --strategy org.apache.hudi.table.action.compact.strategy.UnBoundedCompactionStrategy, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --props null, 
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -    --hoodie-conf []
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - }
565400 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO HoodieCompactor: Running Mode: [execute]; Do compaction
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 ERROR UtilHelpers: Compact failed
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - java.lang.Exception: Could not find - file:///opt/demo/config/schema.avsc - schema file.
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.UtilHelpers.parseSchema(UtilHelpers.java:301)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:251)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:209)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:621)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.compact(HoodieCompactor.java:192)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:366)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:168)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at java.lang.reflect.Method.invoke(Method.java:498)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 ERROR SparkMain: Fail to execute commandString
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - java.lang.RuntimeException: Failed in retry
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:625)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.compact(HoodieCompactor.java:192)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.cli.commands.SparkMain.compact(SparkMain.java:366)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.cli.commands.SparkMain.main(SparkMain.java:168)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at java.lang.reflect.Method.invoke(Method.java:498)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Caused by: java.lang.Exception: Could not find - file:///opt/demo/config/schema.avsc - schema file.
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.UtilHelpers.parseSchema(UtilHelpers.java:301)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.doCompact(HoodieCompactor.java:251)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.HoodieCompactor.lambda$compact$0(HoodieCompactor.java:209)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     at org.apache.hudi.utilities.UtilHelpers.retry(UtilHelpers.java:621)
565404 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] -     ... 15 more
565405 [Thread-17] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - 24/09/05 05:06:50 INFO SparkContext: SparkContext is stopping with exitCode 0.

Expected behavior

It should just work.

Environment Description

alberttwong commented 1 month ago

I also tried passing file:///opt/demo/config/schema.avsc as the schema file path, and it didn't work either.

alberttwong commented 1 month ago

The only way I could get it working was to upload the avsc file to S3:

compaction run --compactionInstant  20240906213724132 --parallelism 2 --sparkMemory 1G  --schemaFilePath s3://warehouse/schema.avsc --retry 1

ad1happy2go commented 1 week ago

@alberttwong Closing this, since the schema file is normally expected to live on an object store or HDFS rather than on the local filesystem.
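
For anyone scripting this, the outcome of the thread can be sketched as a small pre-flight check: reject a local schema path before building the `compaction run` command. This is only an illustration, not part of Hudi. The helper names (`is_shared_schema_path`, `build_compaction_command`) and the `SHARED_SCHEMES` list are hypothetical, and the list of schemes is an assumption based on the comment above, not a documented Hudi rule. The CLI flags are the ones used in this issue.

```python
from urllib.parse import urlparse

# Assumption: schemes treated as "object store or HDFS" for this sketch.
# This list is illustrative and not taken from Hudi itself.
SHARED_SCHEMES = {"s3", "s3a", "hdfs", "gs", "abfs", "abfss"}


def is_shared_schema_path(path: str) -> bool:
    """Return True if the path's URI scheme is in SHARED_SCHEMES.

    A bare path such as /opt/demo/config/schema.avsc has an empty scheme,
    and file:///... has the scheme "file", so both return False.
    """
    return urlparse(path).scheme.lower() in SHARED_SCHEMES


def build_compaction_command(instant: str, schema_path: str,
                             parallelism: int = 2,
                             spark_memory: str = "1G",
                             retry: int = 1) -> str:
    """Build the hudi-cli command line used in this issue (hypothetical helper)."""
    if not is_shared_schema_path(schema_path):
        raise ValueError(
            f"{schema_path} looks like a local path; "
            "upload the schema to an object store or HDFS first"
        )
    return (
        f"compaction run --compactionInstant {instant} "
        f"--parallelism {parallelism} --sparkMemory {spark_memory} "
        f"--schemaFilePath {schema_path} --retry {retry}"
    )


if __name__ == "__main__":
    # The path that worked in this thread.
    print(build_compaction_command("20240906213724132",
                                   "s3://warehouse/schema.avsc"))
```

The check only inspects the URI scheme; it does not verify that the file actually exists at that location.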