MichaelMt66 / open-source-lakehouse


issue running spark submit #8

Closed by alberttwong 10 months ago

alberttwong commented 10 months ago
root@111daf9116db:/spark-app# /opt/spark/bin/spark-submit \
>     --master spark://spark-master:7077 \
>     --driver-memory 1G \
>     --executor-memory 1G \
>     --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
>     --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
>     --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
>     --conf spark.hadoop.fs.s3a.buffer.dir=/tmp/spark-data-tmp \
>     --conf spark.hadoop.fs.s3a.fast.upload.buffer=bytebuffer \
>     --conf spark.hive.metastore.uris=thrift://hive-metastore:9083 \
>     --jars postgresql-42.5.4.jar,hadoop-aws-3.2.0.jar,aws-java-sdk-bundle-1.11.375.jar,hudi-spark3.1-bundle_2.12-0.12.2.jar,hudi-utilities-slim-bundle_2.12-0.12.2.jar,hudi-hive-sync-bundle-0.12.2.jar \
>     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer hudi-utilities-slim-bundle_2.12-0.12.2.jar \
>     --source-class org.apache.hudi.utilities.sources.JdbcSource \
>     --target-base-path s3a://$HUDI_S3_BUCKET/data/${DATABASE}/${TABLE} \
>     --target-table ${TABLE}  \
>     --source-ordering-field ${PRE_COMBINE} \
>     --table-type COPY_ON_WRITE \
>     --hoodie-conf hoodie.deltastreamer.jdbc.url=jdbc:postgresql://postgres-database:5432/dev_database \
>     --hoodie-conf hoodie.deltastreamer.jdbc.user=deuser \
>     --hoodie-conf hoodie.deltastreamer.jdbc.password=depasswd \
>     --hoodie-conf hoodie.deltastreamer.jdbc.driver.class=org.postgresql.Driver \
>     --hoodie-conf hoodie.deltastreamer.jdbc.table.name=${DATABASE}.${TABLE}\
>     --hoodie-conf hoodie.deltastreamer.jdbc.incr.pull=TRUE \
>     --hoodie-conf hoodie.deltastreamer.jdbc.table.incr.column.name=${PRE_COMBINE} \
>     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
>     --hoodie-conf hoodie.datasource.write.recordkey.field=${PRIMARY_KEY} \
>     --enable-hive-sync \
>     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
>     --hoodie-conf hoodie.datasource.hive_sync.mode=hms \
>     --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \
>     --hoodie-conf hoodie.datasource.hive_sync.database=${DATABASE} \
>     --hoodie-conf hoodie.datasource.hive_sync.table=${TABLE} \
>     --op UPSERT
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
24/02/02 01:18:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
24/02/02 01:18:22 INFO SparkContext: Running Spark version 3.1.2
24/02/02 01:18:22 INFO ResourceUtils: ==============================================================
24/02/02 01:18:22 INFO ResourceUtils: No custom resources configured for spark.driver.
24/02/02 01:18:22 INFO ResourceUtils: ==============================================================
24/02/02 01:18:22 INFO SparkContext: Submitted application: delta-streamer---source-ordering-field
24/02/02 01:18:22 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/02/02 01:18:22 INFO ResourceProfile: Limiting resource is cpu
24/02/02 01:18:22 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/02/02 01:18:22 INFO SecurityManager: Changing view acls to: root
24/02/02 01:18:22 INFO SecurityManager: Changing modify acls to: root
24/02/02 01:18:22 INFO SecurityManager: Changing view acls groups to: 
24/02/02 01:18:22 INFO SecurityManager: Changing modify acls groups to: 
24/02/02 01:18:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
24/02/02 01:18:22 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
24/02/02 01:18:22 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
24/02/02 01:18:22 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
24/02/02 01:18:23 INFO Utils: Successfully started service 'sparkDriver' on port 43061.
24/02/02 01:18:23 INFO SparkEnv: Registering MapOutputTracker
24/02/02 01:18:23 INFO SparkEnv: Registering BlockManagerMaster
24/02/02 01:18:23 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/02/02 01:18:23 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/02/02 01:18:23 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
24/02/02 01:18:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7efb07be-78e1-4520-9278-9a63af13534b
24/02/02 01:18:23 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
24/02/02 01:18:23 INFO SparkEnv: Registering OutputCommitCoordinator
24/02/02 01:18:23 INFO Utils: Successfully started service 'SparkUI' on port 8090.
24/02/02 01:18:23 INFO SparkUI: Bound SparkUI to spark-master, and started at http://111daf9116db:8090
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/postgresql-42.5.4.jar at spark://111daf9116db:43061/jars/postgresql-42.5.4.jar with timestamp 1706836702696
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/hadoop-aws-3.2.0.jar at spark://111daf9116db:43061/jars/hadoop-aws-3.2.0.jar with timestamp 1706836702696
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/aws-java-sdk-bundle-1.11.375.jar at spark://111daf9116db:43061/jars/aws-java-sdk-bundle-1.11.375.jar with timestamp 1706836702696
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/hudi-spark3.1-bundle_2.12-0.12.2.jar at spark://111daf9116db:43061/jars/hudi-spark3.1-bundle_2.12-0.12.2.jar with timestamp 1706836702696
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/hudi-utilities-slim-bundle_2.12-0.12.2.jar at spark://111daf9116db:43061/jars/hudi-utilities-slim-bundle_2.12-0.12.2.jar with timestamp 1706836702696
24/02/02 01:18:23 INFO SparkContext: Added JAR file:///spark-app/hudi-hive-sync-bundle-0.12.2.jar at spark://111daf9116db:43061/jars/hudi-hive-sync-bundle-0.12.2.jar with timestamp 1706836702696
24/02/02 01:18:23 WARN SparkContext: The jar file:/spark-app/hudi-utilities-slim-bundle_2.12-0.12.2.jar has been added already. Overwriting of added jars is not supported in the current version.
24/02/02 01:18:23 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
24/02/02 01:18:23 INFO TransportClientFactory: Successfully created connection to spark-master/172.20.0.4:7077 after 28 ms (0 ms spent in bootstraps)
24/02/02 01:18:23 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20240202011823-0001
24/02/02 01:18:23 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240202011823-0001/0 on worker-20240202005215-172.20.0.5-7000 (172.20.0.5:7000) with 1 core(s)
24/02/02 01:18:23 INFO StandaloneSchedulerBackend: Granted executor ID app-20240202011823-0001/0 on hostPort 172.20.0.5:7000 with 1 core(s), 1024.0 MiB RAM
24/02/02 01:18:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33079.
24/02/02 01:18:23 INFO NettyBlockTransferService: Server created on 111daf9116db:33079
24/02/02 01:18:23 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/02/02 01:18:23 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 111daf9116db, 33079, None)
24/02/02 01:18:23 INFO BlockManagerMasterEndpoint: Registering block manager 111daf9116db:33079 with 434.4 MiB RAM, BlockManagerId(driver, 111daf9116db, 33079, None)
24/02/02 01:18:23 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 111daf9116db, 33079, None)
24/02/02 01:18:23 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 111daf9116db, 33079, None)
24/02/02 01:18:23 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240202011823-0001/0 is now RUNNING
24/02/02 01:18:24 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
24/02/02 01:18:24 WARN HoodieDeltaStreamer: --enable-hive-sync will be deprecated in a future release; please use --enable-sync instead for Hive syncing
24/02/02 01:18:24 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
24/02/02 01:18:24 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
24/02/02 01:18:24 INFO MetricsSystemImpl: s3a-file-system metrics system started
24/02/02 01:18:26 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/02/02 01:18:26 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
24/02/02 01:18:26 INFO UtilHelpers: Adding overridden properties to file properties.
24/02/02 01:18:26 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
24/02/02 01:18:26 INFO HoodieDeltaStreamer: Creating delta streamer with configs:
hoodie.auto.adjust.lock.configs: true
hoodie.datasource.hive_sync.database: 
hoodie.datasource.hive_sync.mode: hms
hoodie.datasource.hive_sync.partition_extractor_class: org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.table: 
hoodie.datasource.write.hive_style_partitioning: true
hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: 
hoodie.deltastreamer.jdbc.driver.class: org.postgresql.Driver
hoodie.deltastreamer.jdbc.incr.pull: TRUE
hoodie.deltastreamer.jdbc.password: depasswd
hoodie.deltastreamer.jdbc.table.incr.column.name: 
hoodie.deltastreamer.jdbc.table.name: .
hoodie.deltastreamer.jdbc.url: jdbc:postgresql://postgres-database:5432/dev_database
hoodie.deltastreamer.jdbc.user: deuser

24/02/02 01:18:26 INFO HoodieTableMetaClient: Initializing s3a://albertatstarrocks/data// as hoodie table s3a://albertatstarrocks/data//
24/02/02 01:18:26 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.20.0.5:44114) with ID 0,  ResourceProfileId 0
24/02/02 01:18:26 INFO BlockManagerMasterEndpoint: Registering block manager 172.20.0.5:43239 with 434.4 MiB RAM, BlockManagerId(0, 172.20.0.5, 43239, None)
24/02/02 01:18:33 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://albertatstarrocks/data//
24/02/02 01:18:34 INFO HoodieTableConfig: Loading table properties from s3a://albertatstarrocks/data/.hoodie/hoodie.properties
24/02/02 01:18:34 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://albertatstarrocks/data//
24/02/02 01:18:34 INFO HoodieTableMetaClient: Finished initializing Table of type COPY_ON_WRITE from s3a://albertatstarrocks/data//
24/02/02 01:18:34 INFO HoodieDeltaStreamer: Delta Streamer running only single round
24/02/02 01:18:34 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://albertatstarrocks/data//
24/02/02 01:18:34 INFO HoodieTableConfig: Loading table properties from s3a://albertatstarrocks/data/.hoodie/hoodie.properties
24/02/02 01:18:34 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://albertatstarrocks/data//
24/02/02 01:18:34 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/02/02 01:18:34 INFO DeltaSync: Checkpoint to resume from : Optional.empty
24/02/02 01:18:34 INFO JdbcSource: No checkpoint references found. Doing a full rdbms table fetch
24/02/02 01:18:35 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-app/spark-warehouse').
24/02/02 01:18:35 INFO SharedState: Warehouse path is 'file:/spark-app/spark-warehouse'.
24/02/02 01:18:35 INFO JdbcSource: Reading JDBC password from properties file....
24/02/02 01:18:35 ERROR JdbcSource: Exception while running JDBCSource 
org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
  Position: 30
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
        at org.apache.hudi.utilities.sources.JdbcSource.fullFetch(JdbcSource.java:239)
        at org.apache.hudi.utilities.sources.JdbcSource.fetch(JdbcSource.java:176)
        at org.apache.hudi.utilities.sources.JdbcSource.fetchNextBatch(JdbcSource.java:152)
        at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:498)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:431)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:335)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/02/02 01:18:35 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieException: Error fetching next batch from JDBC source. Last checkpoint: null
        at org.apache.hudi.utilities.sources.JdbcSource.fetchNextBatch(JdbcSource.java:158)
        at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:498)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:431)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:335)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
  Position: 30
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
        at org.apache.hudi.utilities.sources.JdbcSource.fullFetch(JdbcSource.java:239)
        at org.apache.hudi.utilities.sources.JdbcSource.fetch(JdbcSource.java:176)
        at org.apache.hudi.utilities.sources.JdbcSource.fetchNextBatch(JdbcSource.java:152)
        ... 22 more
24/02/02 01:18:35 INFO DeltaSync: Shutting down embedded timeline server
24/02/02 01:18:35 INFO HoodieDeltaStreamer: Shut down delta streamer
24/02/02 01:18:35 INFO SparkUI: Stopped Spark web UI at http://111daf9116db:8090
24/02/02 01:18:35 INFO StandaloneSchedulerBackend: Shutting down all executors
24/02/02 01:18:35 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
24/02/02 01:18:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/02/02 01:18:35 INFO MemoryStore: MemoryStore cleared
24/02/02 01:18:35 INFO BlockManager: BlockManager stopped
24/02/02 01:18:35 INFO BlockManagerMaster: BlockManagerMaster stopped
24/02/02 01:18:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/02/02 01:18:35 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieException: Error fetching next batch from JDBC source. Last checkpoint: null
        at org.apache.hudi.utilities.sources.JdbcSource.fetchNextBatch(JdbcSource.java:158)
        at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
        at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
        at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:69)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:498)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:431)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:335)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "."
  Position: 30
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
        at org.apache.hudi.utilities.sources.JdbcSource.fullFetch(JdbcSource.java:239)
        at org.apache.hudi.utilities.sources.JdbcSource.fetch(JdbcSource.java:176)
        at org.apache.hudi.utilities.sources.JdbcSource.fetchNextBatch(JdbcSource.java:152)
        ... 22 more
24/02/02 01:18:35 INFO ShutdownHookManager: Shutdown hook called
24/02/02 01:18:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-f2d5a40a-878b-454e-bf0d-7b25ab496598
24/02/02 01:18:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-a544d190-d4af-480e-aa9c-0d7df29bfabb
24/02/02 01:18:35 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
24/02/02 01:18:35 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
24/02/02 01:18:35 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
alberttwong commented 10 months ago

So I found out that you shouldn't run spark-submit yourself, even though it seemed to be implied. You just need to run load_sources.sh.
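
For anyone who lands on the same stack trace: the config dump above (hoodie.deltastreamer.jdbc.table.name: . and the s3a://albertatstarrocks/data// target path) shows that ${DATABASE}, ${TABLE}, ${PRE_COMBINE} and ${PRIMARY_KEY} expanded to empty strings in the shell where spark-submit was run, which is why Postgres reports a syntax error at or near ".". Below is a minimal sketch of exporting those variables before calling load_sources.sh; the values are placeholders for illustration, and the assumption that load_sources.sh reads these exact environment variables is mine, not something the repo confirms.

# Hypothetical example values; substitute the real schema/table/columns for your source.
export DATABASE=public                     # Postgres schema holding the source table (placeholder)
export TABLE=orders                        # source table name (placeholder)
export PRE_COMBINE=updated_at              # ordering / incremental-pull column (placeholder)
export PRIMARY_KEY=id                      # Hudi record key field (placeholder)
export HUDI_S3_BUCKET=albertatstarrocks    # bucket name seen in the log above
export AWS_ACCESS_KEY_ID=<your access key>
export AWS_SECRET_ACCESS_KEY=<your secret key>

# Run the wrapper script instead of invoking spark-submit by hand.
./load_sources.sh

If the config dump still prints hoodie.deltastreamer.jdbc.table.name: . after this, the variables are not reaching the spark-submit process and the same PSQLException will recur.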