apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Enable Direct Writing to OneTable Format from Hudi Delta Streamer #351

soumilshah1995 closed this issue 6 months ago

soumilshah1995 commented 6 months ago

I opened this on both the Hudi side and the OneTable side, as I am not sure which is the best place for this issue.

Hudi GitHub issue: https://github.com/apache/hudi/issues/10784

Description

As a user of Hudi Delta Streamer, I appreciate the concept of OneTable, but I find it cumbersome to run a separate sync job. I'm seeking guidance on whether there's a way to write data from Delta Streamer directly into OneTable format without the need for additional jobs.

Details

I've explored the available documentation but haven't found a comprehensive guide on achieving this. I'm eager to experiment with this functionality and share my findings with the community through videos and blogs.

Questions

Is there a method to configure Hudi Delta Streamer to write data directly into OneTable format? Could you provide guidance or point me to relevant documentation for implementing this functionality? Additionally, I'm looking for the hudi-extensions-0.1.0-SNAPSHOT-bundled JAR. Is this available on Maven repositories, and if so, under which coordinates?

Spark Submit Configuration (Working with Hudi)

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --op UPSERT \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --props hudi_tbl.props

Desired Changes

I'm looking for guidance on what modifications are necessary in the above Spark Submit configuration to enable writing into OneTable format directly.

References

Syncing to OneTable Format - Hudi Documentation

the-other-tim-brown commented 6 months ago

@soumilshah1995 Just to clarify a common misconception: OneTable (now XTable) is not a format, so no, you cannot write to it as a format, but you can sync your Hudi table to Delta and Iceberg with this tool. You can follow the instructions here for adding a meta sync to your HoodieStreamer job. For now, you'll need to package the jar and put it on your classpath to get the latest updates. Then you can add this class to your --sync-tool-classes and update your props file with the required args for the sync tool (item 3 in the readme linked).

After we go through some renaming and interface improvements, we plan on publishing these jars.

soumilshah1995 commented 6 months ago

Thanks for the clarification. I understand now that I can't write to OneTable directly as a table format. What I meant is that I'd like to synchronize data into all three formats (Hudi, Delta, and Iceberg) simultaneously during write operations from the Hudi streamer.

I'm not familiar with Java or Maven, so I'm unsure about packaging the JAR. Could you assist me with creating the JAR file for Spark 3.4?

To clarify further, my goal is to sync all three tables (Hudi, Delta, and Iceberg) with every consecutive write operation via the delta streamer. I'm hoping to avoid running separate processes for each format.

Regarding the JAR execution, would the following command suffice?

java -jar ./jar/utilities-0.1.0-beta1-bundled.jar --dataset ./my_config.yaml

Please confirm whether this synchronization can be achieved within the Hudi Streamer and, if so, how I should pass the JAR and what changes are required to the spark-submit job below:

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --op UPSERT \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --props hudi_tbl.props

the-other-tim-brown commented 6 months ago

Here are some steps to follow:

  1. Build with mvn install -DskipTests. This will output a jar for you under hudi-support/extensions/target/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar
  2. Add that jar to your spark submit command with --jars
  3. Update the command with --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool
  4. Update hudi_tbl.props to include hoodie.onetable.formats=ICEBERG,DELTA

Now each time the HoodieStreamer commits, it should sync to the other two formats for you so you don't have to run that other java -jar command.
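Here is a sketch of the spark-submit command from the original post with the four steps above applied. It rests on several assumptions. `XTABLE_HOME` is a hypothetical variable that points at the checkout where `mvn install -DskipTests` was run. `--enable-sync` is assumed to be needed to turn on meta sync in HoodieStreamer, as it is for Hudi's other sync tools. The props file is assumed to already contain `hoodie.onetable.formats=ICEBERG,DELTA` in `key=value` form. The arguments are kept in a bash array so that each flag can carry a comment, which is not possible with backslash line continuations.

```bash
#!/usr/bin/env bash
# Sketch: HoodieStreamer job that also syncs the Hudi table to Iceberg and Delta.
# Assumption: XTABLE_HOME (hypothetical) points at the incubator-xtable checkout
# where `mvn install -DskipTests` was run.
set -euo pipefail

XTABLE_HOME="${XTABLE_HOME:-$HOME/incubator-xtable}"
EXT_JAR="$XTABLE_HOME/hudi-support/extensions/target/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar"
UTILITIES_JAR=/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar

ARGS=(
  # Spark options (must come before the application jar)
  --class org.apache.hudi.utilities.streamer.HoodieStreamer
  --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0'
  --jars "$EXT_JAR"                 # step 2: put the sync tool on the classpath
  --properties-file spark-config.properties
  --master 'local[*]'
  --executor-memory 1g
  "$UTILITIES_JAR"
  # HoodieStreamer options
  --table-type COPY_ON_WRITE
  --op UPSERT
  --source-limit 4000000
  --source-ordering-field ts
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource
  --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'
  --target-table bronze_orders
  --props hudi_tbl.props            # step 4: must contain hoodie.onetable.formats=ICEBERG,DELTA
  --enable-sync                     # assumption: switches on meta sync after each commit
  --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool   # step 3
)

if command -v spark-submit >/dev/null 2>&1; then
  spark-submit "${ARGS[@]}"
else
  # Spark is not installed here: print the assembled command instead.
  printf 'spark-submit'; printf ' %q' "${ARGS[@]}"; printf '\n'
fi
```

If other meta syncs (for example Hive) are also wanted, `--sync-tool-classes` takes a comma-separated list of classes.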

If you want to query the data from BigQuery or Snowflake via the Iceberg format, you'll need to add these properties as well:

hoodie.avro.write.support.class=io.onetable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds
hoodie.client.init.callback.classes=io.onetable.hudi.extensions.AddFieldIdsClientInitCallback
hoodie.datasource.write.row.writer.enable=false

These properties assign an ID to each field in the underlying Parquet schema. Iceberg uses these IDs to track fields instead of relying strictly on names, so columns can be renamed safely.
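The following minimal sketch writes the step-4 property and the three field-ID properties into `hudi_tbl.props`. It uses `key=value` lines, which is the usual properties-file syntax. The helper `set_prop` is hypothetical and is not part of Hudi or XTable. It replaces an existing line for a key instead of appending a duplicate, so you can re-run the script safely. The field-ID settings are only needed for the BigQuery/Snowflake case described above.

```bash
#!/usr/bin/env bash
# Sketch: add the XTable (OneTable) sync settings to the HoodieStreamer props file.
# set_prop is a hypothetical helper, not part of Hudi or XTable.
set -euo pipefail

PROPS_FILE="${PROPS_FILE:-hudi_tbl.props}"
touch "$PROPS_FILE"

# Set KEY=VALUE in the props file, replacing any existing line for KEY.
set_prop() {
  local key="$1" value="$2" tmp
  tmp="$(mktemp)"
  # Keep every line whose key differs (comments and blank lines included).
  awk -v k="$key" -F'=' '$1 != k' "$PROPS_FILE" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$PROPS_FILE"
}

# Step 4: formats to sync to after each commit.
set_prop hoodie.onetable.formats ICEBERG,DELTA

# Only for querying via Iceberg from BigQuery or Snowflake:
# write Parquet field IDs so Iceberg can track columns by ID rather than name.
set_prop hoodie.avro.write.support.class io.onetable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds
set_prop hoodie.client.init.callback.classes io.onetable.hudi.extensions.AddFieldIdsClientInitCallback
set_prop hoodie.datasource.write.row.writer.enable false
```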

soumilshah1995 commented 6 months ago

I will try it out. In the past I never got the build working on my Mac M2:

https://github.com/apache/incubator-xtable/issues/241

I can certainly give it another try.

By the way, which Java version is required?

the-other-tim-brown commented 6 months ago

Java 11

soumilshah1995 commented 6 months ago

I'm getting a build failure:

soumilshah@Soumils-MBP incubator-xtable % mvn install -DskipTests
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for io.onetable:onetable-core:jar:0.1.0-SNAPSHOT
[WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must be unique: org.apache.hadoop:hadoop-common:jar -> duplicate declaration of version (?) @ line 118, column 21
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] onetable                                                           [pom]
[INFO] api                                                                [jar]
[INFO] hudi-support                                                       [pom]
[INFO] hudi-utils                                                         [jar]
[INFO] core                                                               [jar]
[INFO] utilities                                                          [jar]
[INFO] hudi-extensions                                                    [jar]
[INFO] 
[INFO] ------------------------< io.onetable:onetable >------------------------
[INFO] Building onetable 0.1.0-SNAPSHOT                                   [1/7]
[INFO]   from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- enforcer:3.1.0:enforce (enforce-logging) @ onetable ---
[INFO] 
[INFO] --- failsafe:2.22.2:integration-test (default) @ onetable ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- failsafe:2.22.2:verify (default) @ onetable ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- spotless:2.27.2:check (default) @ onetable ---
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for onetable 0.1.0-SNAPSHOT:
[INFO] 
[INFO] onetable ........................................... FAILURE [  5.698 s]
[INFO] api ................................................ SKIPPED
[INFO] hudi-support ....................................... SKIPPED
[INFO] hudi-utils ......................................... SKIPPED
[INFO] core ............................................... SKIPPED
[INFO] utilities .......................................... SKIPPED
[INFO] hudi-extensions .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  5.963 s
[INFO] Finished at: 2024-02-29T18:37:58-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.27.2:check (default) on project onetable: Execution default of goal com.diffplug.spotless:spotless-maven-plugin:2.27.2:check failed: You are running Spotless on JVM 8, which limits you to google-java-format 1.7.
[ERROR] Upgrade your JVM or try google-java-format alternatives:
[ERROR] - Version 1.7 requires JVM 8+
[ERROR] - Version 1.15.0 requires JVM 11+
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
soumilshah@Soumils-MBP incubator-xtable % 
soumilshah1995 commented 6 months ago

Honestly, I have never done Java work before, so I'm not really sure what's going on.

the-other-tim-brown commented 6 months ago

The error message indicates that you are running on Java 8, while the google-java-format version used by the build's Spotless check requires JVM 11+. You can run java -version to see which version you are on.
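A minimal sketch of that version check, assuming a bash shell (the prompt in the log suggests macOS). It parses the output of java -version and reports whether the active JDK is at least 11, which is what the Spotless error asks for. The function names are hypothetical helpers, not part of XTable or Maven.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from `java -version` output.
# Handles the legacy scheme ("1.8.0_392" -> 8) and the modern one ("11.0.22" -> 11).
java_major_from_output() {
  local ver
  # The first line looks like: openjdk version "11.0.22" 2024-01-16
  ver=$(printf '%s\n' "$1" | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;  # Java 8 and older report "1.<major>..."
  esac
  # Keep only the leading digits, e.g. "8.0_392" -> "8", "21-ea" -> "21".
  printf '%s\n' "${ver%%[!0-9]*}"
}

# Hypothetical helper: fail early if the active JDK is older than 11
# (the Spotless error says google-java-format 1.15.0 requires JVM 11+).
check_jdk_for_build() {
  local major
  # `java -version` writes to stderr, so merge it into stdout.
  major=$(java_major_from_output "$(java -version 2>&1)")
  if [ -z "$major" ] || [ "$major" -lt 11 ]; then
    echo "Java ${major:-unknown} is active; a JDK 11+ is needed for this build." >&2
    return 1
  fi
  echo "Java $major is active; OK to build."
}
```

On macOS, a JDK 11 that is already installed can usually be selected for the current shell with export JAVA_HOME=$(/usr/libexec/java_home -v 11) before calling check_jdk_for_build and re-running the Maven build.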

soumilshah1995 commented 6 months ago

This presents a considerable inconvenience for individuals unfamiliar with Java development. Is there a more straightforward method to acquire this JAR file, such as through Maven or another accessible link? I'm encountering difficulties in getting it to function properly.

soumilshah1995 commented 6 months ago

Simplifying how users obtain the JAR, especially newcomers to Java, would greatly improve their experience. Both the community and users would appreciate it if the JAR were published on a platform like Maven, so they could download it directly and use it without hassle. Thank you for considering this suggestion.

the-other-tim-brown commented 6 months ago

This one is older but should still work for what you're trying: link

https://github.com/apache/incubator-xtable/packages/1986831 - packages page

soumilshah1995 commented 6 months ago

@the-other-tim-brown I followed the steps mentioned above.

Here is the spark-submit command:


spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-beta1.jar \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --op UPSERT \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats=DELTA,ICEBERG' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

The job works fine, but I do not see any Delta log files. Am I missing something here?
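One way to check whether the sync ran at all is to look for target-format metadata next to the Hudi table. A minimal sketch, assuming the sync writes Delta metadata to a _delta_log directory and Iceberg metadata to a metadata directory under the same base path as Hudi's .hoodie directory; list_table_formats is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which table-format metadata directories exist under
# a table base path. Assumes Hudi -> .hoodie, Delta -> _delta_log, Iceberg -> metadata.
list_table_formats() {
  local base="$1" entry dir fmt found=0
  for entry in ".hoodie:HUDI" "_delta_log:DELTA" "metadata:ICEBERG"; do
    dir=${entry%%:*}   # directory name before the colon
    fmt=${entry##*:}   # format label after the colon
    if [ -d "$base/$dir" ]; then
      echo "$fmt"
      found=1
    fi
  done
  # Non-zero exit status when no metadata directory exists at all.
  [ "$found" -eq 1 ]
}
```

For example, list_table_formats /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders printing only HUDI would suggest the sync tool never ran.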

soumilshah1995 commented 6 months ago

@the-other-tim-brown if I add both of the JARs provided on GitHub:


spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-beta1.jar,/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/utilities-0.1.0-beta1-bundled.jar' \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats=DELTA,ICEBERG' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

I get the following error:

24/03/01 08:22:21 INFO Javalin: 
       __                      __ _            __ __
      / /____ _ _   __ ____ _ / /(_)____      / // /
 __  / // __ `/| | / // __ `// // // __ \    / // /_
/ /_/ // /_/ / | |/ // /_/ // // // / / /   /__  __/
\____/ \__,_/  |___/ \__,_//_//_//_/ /_/      /_/

          https://javalin.io/documentation

24/03/01 08:22:21 INFO StreamSync: Shutting down embedded timeline server
24/03/01 08:22:21 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/03/01 08:22:21 INFO SparkUI: Stopped Spark web UI at http://soumils-mbp:8090
24/03/01 08:22:21 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/03/01 08:22:21 INFO MemoryStore: MemoryStore cleared
24/03/01 08:22:21 INFO BlockManager: BlockManager stopped
24/03/01 08:22:21 INFO BlockManagerMaster: BlockManagerMaster stopped
24/03/01 08:22:21 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/01 08:22:21 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.logging.slf4j.Log4jLoggerFactory: method 'void <init>()' not found
    at org.slf4j.impl.StaticLoggerBinder.<init>(StaticLoggerBinder.java:53)
    at org.slf4j.impl.StaticLoggerBinder.<clinit>(StaticLoggerBinder.java:41)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at io.javalin.core.util.Util.classExists(Util.kt:32)
    at io.javalin.core.util.Util.loggingLibraryExists(Util.kt:108)
    at io.javalin.core.util.Util.printHelpfulMessageIfLoggerIsMissing(Util.kt:93)
    at io.javalin.Javalin.start(Javalin.java:169)
    at io.javalin.Javalin.start(Javalin.java:148)
    at org.apache.hudi.timeline.service.TimelineService.startServiceOnPort(TimelineService.java:325)
    at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:362)
    at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:105)
    at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:71)
    at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:58)
    at org.apache.hudi.utilities.streamer.StreamSync.reInitWriteClient(StreamSync.java:978)
    at org.apache.hudi.utilities.streamer.StreamSync.setupWriteClient(StreamSync.java:961)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:414)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/03/01 08:22:26 INFO ShutdownHookManager: Shutdown hook called
24/03/01 08:22:26 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-6d32517c-5034-46e4-8154-c0ac1d21e465
24/03/01 08:22:26 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-5a18d567-a4d0-4679-a7fe-63e456adf57b
soumilshah@Soumils-MBP DeltaStreamer % 

But if I remove utilities-0.1.0-beta1-bundled.jar and keep only hudi-extensions-0.1.0-beta1.jar, the job works fine, but I don't see any Delta log files.
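The NoSuchMethodError above is raised inside org.slf4j.impl.StaticLoggerBinder, which typically points to logging binding classes from different, incompatible versions being picked up from different JARs on the same classpath. A minimal sketch for finding which JARs contain a given class, assuming the unzip tool is installed; class_to_entry and jars_containing_class are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a fully-qualified class name into its JAR entry path,
# e.g. org.slf4j.impl.StaticLoggerBinder -> org/slf4j/impl/StaticLoggerBinder.class
class_to_entry() {
  printf '%s.class\n' "${1//.//}"
}

# Hypothetical helper: print every JAR (among the remaining arguments) that
# contains the class. Requires the `unzip` command-line tool.
jars_containing_class() {
  local cls="$1" entry jar
  shift
  entry=$(class_to_entry "$cls")
  for jar in "$@"; do
    # `unzip -Z1` lists archive entries one per line (zipinfo mode).
    if unzip -Z1 "$jar" 2>/dev/null | grep -qxF "$entry"; then
      echo "$jar"
    fi
  done
}
```

For example, passing org.slf4j.impl.StaticLoggerBinder together with the two beta1 JARs, /Users/soumilshah/.ivy2/jars/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar, and (if SPARK_HOME is set) "$SPARK_HOME"/jars/*.jar shows which archives ship that class; more than one hit is a likely source of the conflict.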

the-other-tim-brown commented 6 months ago

The utilities bundle should not be required; you can use just the extensions JAR. Are there any logs from the job that worked? I think you may need to include --enable-sync in the job parameters to trigger the sync.
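Combining that suggestion with the earlier command, a sketch of the invocation with only the extensions JAR and --enable-sync added might look like the following. It reuses the paths, sync tool class, and hoodie.onetable.* settings from this thread and has not been verified here. The arrays only assemble the arguments, and the last lines print the shell-quoted command instead of running it.

```bash
#!/usr/bin/env bash
# Paths copied from this thread; adjust them for your machine.
PROJECT=/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer
JAR_DIR="$PROJECT/jar"

# Options for spark-submit itself. Only the extensions JAR is added;
# the utilities bundle is left out, as suggested above.
spark_opts=(
  --class org.apache.hudi.utilities.streamer.HoodieStreamer
  --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0'
  --properties-file spark-config.properties
  --master 'local[*]'
  --executor-memory 1g
  --jars "$JAR_DIR/hudi-extensions-0.1.0-beta1.jar"
)

# Options for HoodieStreamer. --enable-sync is expected to make the streamer
# run the classes listed in --sync-tool-classes after a successful commit.
streamer_opts=(
  --table-type COPY_ON_WRITE
  --op UPSERT
  --source-limit 4000000
  --source-ordering-field ts
  --source-class org.apache.hudi.utilities.sources.CsvDFSSource
  --target-base-path "file://$PROJECT/hudi/bronze_orders"
  --target-table bronze_orders
  --enable-sync
  --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool
  --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id'
  --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date'
  --hoodie-conf 'hoodie.datasource.write.precombine.field=ts'
  --hoodie-conf "hoodie.streamer.source.dfs.root=file://$PROJECT/sampledata/orders"
  --hoodie-conf 'hoodie.deltastreamer.csv.header=true'
  --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t'
  --hoodie-conf 'hoodie.onetable.formats=DELTA,ICEBERG'
  --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'
)

# Print the assembled command. To run it instead, use:
#   spark-submit "${spark_opts[@]}" "$JAR_DIR/hudi-utilities-slim-bundle_2.12-0.14.0.jar" "${streamer_opts[@]}"
printf '%q ' spark-submit "${spark_opts[@]}" \
  "$JAR_DIR/hudi-utilities-slim-bundle_2.12-0.14.0.jar" "${streamer_opts[@]}"
echo
```

Keeping the spark-submit options and the HoodieStreamer options in separate arrays mirrors the command's structure: everything before the utilities slim bundle JAR is read by spark-submit, everything after it is passed to HoodieStreamer.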

soumilshah1995 commented 6 months ago

I ran:

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --op UPSERT \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t'

Logs


Ivy Default Cache set to: /Users/soumilshah/.ivy2/cache
The jars for the packages stored in: /Users/soumilshah/.ivy2/jars
org.apache.hudi#hudi-spark3.4-bundle_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-47201a32-9dd2-4961-bdd5-c832d0251530;1.0
    confs: [default]
    found org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 in spark-list
:: resolution report :: resolve 62ms :: artifacts dl 1ms
    :: modules in use:
    org.apache.hudi#hudi-spark3.4-bundle_2.12;0.14.0 from spark-list in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-47201a32-9dd2-4961-bdd5-c832d0251530
    confs: [default]
    0 artifacts copied, 1 already retrieved (0kB/4ms)
24/03/01 11:21:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/03/01 11:21:01 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
24/03/01 11:21:01 INFO SparkContext: Running Spark version 3.4.0
24/03/01 11:21:01 INFO ResourceUtils: ==============================================================
24/03/01 11:21:01 INFO ResourceUtils: No custom resources configured for spark.driver.
24/03/01 11:21:01 INFO ResourceUtils: ==============================================================
24/03/01 11:21:01 INFO SparkContext: Submitted application: streamer-bronze_orders
24/03/01 11:21:01 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/03/01 11:21:01 INFO ResourceProfile: Limiting resource is cpu
24/03/01 11:21:01 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/03/01 11:21:01 INFO SecurityManager: Changing view acls to: soumilshah
24/03/01 11:21:01 INFO SecurityManager: Changing modify acls to: soumilshah
24/03/01 11:21:01 INFO SecurityManager: Changing view acls groups to: 
24/03/01 11:21:01 INFO SecurityManager: Changing modify acls groups to: 
24/03/01 11:21:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: soumilshah; groups with view permissions: EMPTY; users with modify permissions: soumilshah; groups with modify permissions: EMPTY
24/03/01 11:21:01 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
24/03/01 11:21:01 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
24/03/01 11:21:01 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
24/03/01 11:21:01 INFO Utils: Successfully started service 'sparkDriver' on port 49383.
24/03/01 11:21:01 INFO SparkEnv: Registering MapOutputTracker
24/03/01 11:21:01 INFO SparkEnv: Registering BlockManagerMaster
24/03/01 11:21:01 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/03/01 11:21:01 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/03/01 11:21:01 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
24/03/01 11:21:01 INFO DiskBlockManager: Created local directory at /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/blockmgr-34057d78-1e52-41ee-af66-4d2c413cd8f9
24/03/01 11:21:01 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
24/03/01 11:21:01 INFO SparkEnv: Registering OutputCommitCoordinator
24/03/01 11:21:01 INFO JettyUtils: Start Jetty 0.0.0.0:8090 for SparkUI
24/03/01 11:21:01 INFO Utils: Successfully started service 'SparkUI' on port 8090.
24/03/01 11:21:01 INFO SparkContext: Added JAR file:///Users/soumilshah/.ivy2/jars/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar at spark://soumils-mbp:49383/jars/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar with timestamp 1709310061534
24/03/01 11:21:01 INFO SparkContext: Added JAR file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar at spark://soumils-mbp:49383/jars/hudi-utilities-slim-bundle_2.12-0.14.0.jar with timestamp 1709310061534
24/03/01 11:21:01 INFO Executor: Starting executor ID driver on host soumils-mbp
24/03/01 11:21:01 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
24/03/01 11:21:01 INFO Executor: Fetching spark://soumils-mbp:49383/jars/hudi-utilities-slim-bundle_2.12-0.14.0.jar with timestamp 1709310061534
24/03/01 11:21:01 INFO TransportClientFactory: Successfully created connection to soumils-mbp/192.168.1.31:49383 after 13 ms (0 ms spent in bootstraps)
24/03/01 11:21:01 INFO Utils: Fetching spark://soumils-mbp:49383/jars/hudi-utilities-slim-bundle_2.12-0.14.0.jar to /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-d6be7d0c-a2d9-4bbf-9a80-1e2ad7e990b7/userFiles-8ce7b2c0-2042-41a8-8ce9-07363dfe496a/fetchFileTemp10282544856653676230.tmp
24/03/01 11:21:02 INFO Executor: Adding file:/private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-d6be7d0c-a2d9-4bbf-9a80-1e2ad7e990b7/userFiles-8ce7b2c0-2042-41a8-8ce9-07363dfe496a/hudi-utilities-slim-bundle_2.12-0.14.0.jar to class loader
24/03/01 11:21:02 INFO Executor: Fetching spark://soumils-mbp:49383/jars/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar with timestamp 1709310061534
24/03/01 11:21:02 INFO Utils: Fetching spark://soumils-mbp:49383/jars/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar to /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-d6be7d0c-a2d9-4bbf-9a80-1e2ad7e990b7/userFiles-8ce7b2c0-2042-41a8-8ce9-07363dfe496a/fetchFileTemp9476611464365469489.tmp
24/03/01 11:21:02 INFO Executor: Adding file:/private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-d6be7d0c-a2d9-4bbf-9a80-1e2ad7e990b7/userFiles-8ce7b2c0-2042-41a8-8ce9-07363dfe496a/org.apache.hudi_hudi-spark3.4-bundle_2.12-0.14.0.jar to class loader
24/03/01 11:21:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49385.
24/03/01 11:21:02 INFO NettyBlockTransferService: Server created on soumils-mbp:49385
24/03/01 11:21:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/03/01 11:21:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, soumils-mbp, 49385, None)
24/03/01 11:21:02 INFO BlockManagerMasterEndpoint: Registering block manager soumils-mbp:49385 with 434.4 MiB RAM, BlockManagerId(driver, soumils-mbp, 49385, None)
24/03/01 11:21:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, soumils-mbp, 49385, None)
24/03/01 11:21:02 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, soumils-mbp, 49385, None)
24/03/01 11:21:02 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/03/01 11:21:02 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
24/03/01 11:21:02 INFO UtilHelpers: Adding overridden properties to file properties.
24/03/01 11:21:02 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
24/03/01 11:21:02 INFO SharedState: Warehouse path is 'file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/spark-warehouse'.
24/03/01 11:21:02 INFO HoodieStreamer: Creating Hudi Streamer with configs:
hoodie.auto.adjust.lock.configs: true
hoodie.datasource.write.partitionpath.field: order_date
hoodie.datasource.write.precombine.field: ts
hoodie.datasource.write.reconcile.schema: false
hoodie.datasource.write.recordkey.field: order_id
hoodie.deltastreamer.csv.header: true
hoodie.deltastreamer.csv.sep: \t
hoodie.streamer.source.dfs.root: file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders

24/03/01 11:21:02 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:02 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:02 INFO HoodieTableMetaClient: Initializing file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders as hoodie table file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:02 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO HoodieTableMetaClient: Finished initializing Table of type COPY_ON_WRITE from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO DFSPathSelector: Using path selector org.apache.hudi.utilities.sources.helpers.DFSPathSelector
24/03/01 11:21:02 INFO HoodieIngestionService: Ingestion service starts running in run-once mode
24/03/01 11:21:02 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:02 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:02 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:02 INFO StreamSync: Checkpoint to resume from : Optional.empty
24/03/01 11:21:02 INFO DFSPathSelector: Root path => file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders source limit => 4000000
24/03/01 11:21:03 INFO InMemoryFileIndex: It took 20 ms to list leaf files for 8 paths.
24/03/01 11:21:03 INFO InMemoryFileIndex: It took 3 ms to list leaf files for 8 paths.
24/03/01 11:21:04 INFO FileSourceStrategy: Pushed Filters: 
24/03/01 11:21:04 INFO FileSourceStrategy: Post-Scan Filters: (length(trim(value#0, None)) > 0)
24/03/01 11:21:04 INFO CodeGenerator: Code generated in 86.516042 ms
24/03/01 11:21:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 202.6 KiB, free 434.2 MiB)
24/03/01 11:21:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 35.4 KiB, free 434.2 MiB)
24/03/01 11:21:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on soumils-mbp:49385 (size: 35.4 KiB, free: 434.4 MiB)
24/03/01 11:21:04 INFO SparkContext: Created broadcast 0 from load at CsvDFSSource.java:128
24/03/01 11:21:04 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
24/03/01 11:21:04 INFO SparkContext: Starting job: load at CsvDFSSource.java:128
24/03/01 11:21:04 INFO DAGScheduler: Got job 0 (load at CsvDFSSource.java:128) with 1 output partitions
24/03/01 11:21:04 INFO DAGScheduler: Final stage: ResultStage 0 (load at CsvDFSSource.java:128)
24/03/01 11:21:04 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:04 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:04 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at load at CsvDFSSource.java:128), which has no missing parents
24/03/01 11:21:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KiB, free 434.2 MiB)
24/03/01 11:21:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.0 KiB, free 434.1 MiB)
24/03/01 11:21:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on soumils-mbp:49385 (size: 6.0 KiB, free: 434.4 MiB)
24/03/01 11:21:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at load at CsvDFSSource.java:128) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
24/03/01 11:21:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/03/01 11:21:04 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/d75b3ff2-aee7-4cd5-9da1-93d05b85b86b_orders.csv, range: 0-13248, partition values: [empty row]
24/03/01 11:21:05 INFO CodeGenerator: Code generated in 14.443792 ms
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1467 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 113 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 0 (load at CsvDFSSource.java:128) finished in 0.168 s
24/03/01 11:21:05 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 0 finished: load at CsvDFSSource.java:128, took 0.186067 s
24/03/01 11:21:05 INFO CodeGenerator: Code generated in 3.987083 ms
24/03/01 11:21:05 INFO FileSourceStrategy: Pushed Filters: 
24/03/01 11:21:05 INFO FileSourceStrategy: Post-Scan Filters: 
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 202.6 KiB, free 434.0 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 35.4 KiB, free 433.9 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on soumils-mbp:49385 (size: 35.4 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 2 from load at CsvDFSSource.java:128
24/03/01 11:21:05 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
24/03/01 11:21:05 INFO SparkContext: Starting job: load at CsvDFSSource.java:128
24/03/01 11:21:05 INFO DAGScheduler: Got job 1 (load at CsvDFSSource.java:128) with 8 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 1 (load at CsvDFSSource.java:128)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at load at CsvDFSSource.java:128), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 27.2 KiB, free 433.9 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 12.7 KiB, free 433.9 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on soumils-mbp:49385 (size: 12.7 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at load at CsvDFSSource.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 1.0 with 8 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2) (soumils-mbp, executor driver, partition 1, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 3) (soumils-mbp, executor driver, partition 2, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 4) (soumils-mbp, executor driver, partition 3, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 5) (soumils-mbp, executor driver, partition 4, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 6) (soumils-mbp, executor driver, partition 5, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 7) (soumils-mbp, executor driver, partition 6, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 8) (soumils-mbp, executor driver, partition 7, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
24/03/01 11:21:05 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
24/03/01 11:21:05 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
24/03/01 11:21:05 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
24/03/01 11:21:05 INFO Executor: Running task 5.0 in stage 1.0 (TID 6)
24/03/01 11:21:05 INFO Executor: Running task 4.0 in stage 1.0 (TID 5)
24/03/01 11:21:05 INFO Executor: Running task 6.0 in stage 1.0 (TID 7)
24/03/01 11:21:05 INFO Executor: Running task 7.0 in stage 1.0 (TID 8)
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/ea34b272-7f0f-4d95-aa7c-34f396bf146a_orders.csv, range: 0-13127, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/bd14f873-5e9e-41b4-a767-830823f49e97_orders.csv, range: 0-13076, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/d75b3ff2-aee7-4cd5-9da1-93d05b85b86b_orders.csv, range: 0-13248, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/77a2bb4b-494b-4f69-a277-37bc49be8b29_orders.csv, range: 0-13169, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/7fc6265c-04b1-4b4c-8ddb-62cc4455c028_orders.csv, range: 0-13171, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/ad3608d1-28f6-418b-85d8-08272ec3a9b4_orders.csv, range: 0-13199, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/42042af4-b4e3-49e0-8441-15f82f94fa50_orders.csv, range: 0-13188, partition values: [empty row]
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/aac46b74-8dbc-4ce4-8cc8-5925717c8018_orders.csv, range: 0-13135, partition values: [empty row]
24/03/01 11:21:05 INFO Executor: Finished task 3.0 in stage 1.0 (TID 4). 1457 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 4) in 64 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:05 INFO Executor: Finished task 7.0 in stage 1.0 (TID 8). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO Executor: Finished task 2.0 in stage 1.0 (TID 3). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO Executor: Finished task 4.0 in stage 1.0 (TID 5). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO Executor: Finished task 5.0 in stage 1.0 (TID 6). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 8) in 76 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:05 INFO Executor: Finished task 6.0 in stage 1.0 (TID 7). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1500 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 5) in 76 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:05 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 76 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:05 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 3) in 77 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 79 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:05 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 6) in 78 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:05 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 7) in 79 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 1 (load at CsvDFSSource.java:128) finished in 0.105 s
24/03/01 11:21:05 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 1 finished: load at CsvDFSSource.java:128, took 0.106382 s
24/03/01 11:21:05 INFO FileSourceStrategy: Pushed Filters: 
24/03/01 11:21:05 INFO FileSourceStrategy: Post-Scan Filters: 
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 202.5 KiB, free 433.7 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 35.4 KiB, free 433.6 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on soumils-mbp:49385 (size: 35.4 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 4 from toRdd at HoodieSparkUtils.scala:107
24/03/01 11:21:05 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
24/03/01 11:21:05 INFO SparkContext: Starting job: isEmpty at StreamSync.java:602
24/03/01 11:21:05 INFO DAGScheduler: Got job 2 (isEmpty at StreamSync.java:602) with 1 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 2 (isEmpty at StreamSync.java:602)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 2 (SQLConfInjectingRDD[14] at SQLConfInjectingRDD at HoodieSparkUtils.scala:129), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.8 KiB, free 433.6 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 8.1 KiB, free 433.6 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on soumils-mbp:49385 (size: 8.1 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (SQLConfInjectingRDD[14] at SQLConfInjectingRDD at HoodieSparkUtils.scala:129) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 9) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 2.0 (TID 9)
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/d75b3ff2-aee7-4cd5-9da1-93d05b85b86b_orders.csv, range: 0-13248, partition values: [empty row]
24/03/01 11:21:05 INFO CodeGenerator: Code generated in 7.129583 ms
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 2.0 (TID 9). 1634 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 9) in 81 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 2 (isEmpty at StreamSync.java:602) finished in 0.084 s
24/03/01 11:21:05 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 2 finished: isEmpty at StreamSync.java:602, took 0.085798 s
24/03/01 11:21:05 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:05 INFO StreamSync: Setting up new Hoodie Write Client
24/03/01 11:21:05 INFO EmbeddedTimelineService: Starting Timeline service !!
24/03/01 11:21:05 INFO EmbeddedTimelineService: Overriding hostIp to (soumils-mbp) found in spark-conf. It was null
24/03/01 11:21:05 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:05 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:05 INFO BlockManagerInfo: Removed broadcast_3_piece0 on soumils-mbp:49385 in memory (size: 12.7 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Removed broadcast_0_piece0 on soumils-mbp:49385 in memory (size: 35.4 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Removed broadcast_2_piece0 on soumils-mbp:49385 in memory (size: 35.4 KiB, free: 434.4 MiB)
24/03/01 11:21:05 INFO log: Logging initialized @4935ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
24/03/01 11:21:05 INFO BlockManagerInfo: Removed broadcast_1_piece0 on soumils-mbp:49385 in memory (size: 6.0 KiB, free: 434.4 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Removed broadcast_5_piece0 on soumils-mbp:49385 in memory (size: 8.1 KiB, free: 434.4 MiB)
24/03/01 11:21:05 INFO Javalin: 
       __                      __ _            __ __
      / /____ _ _   __ ____ _ / /(_)____      / // /
 __  / // __ `/| | / // __ `// // // __ \    / // /_
/ /_/ // /_/ / | |/ // /_/ // // // / / /   /__  __/
\____/ \__,_/  |___/ \__,_//_//_//_/ /_/      /_/

          https://javalin.io/documentation

24/03/01 11:21:05 INFO Javalin: Starting Javalin ...
24/03/01 11:21:05 INFO Javalin: You are running Javalin 4.6.7 (released October 24, 2022. Your Javalin version is 494 days old. Consider checking for a newer version.).
24/03/01 11:21:05 INFO Server: jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 11.0.22+0
24/03/01 11:21:05 INFO Server: Started @5056ms
24/03/01 11:21:05 INFO Javalin: Listening on http://localhost:49387/
24/03/01 11:21:05 INFO Javalin: Javalin started in 62ms \o/
24/03/01 11:21:05 INFO TimelineService: Starting Timeline server on port :49387
24/03/01 11:21:05 INFO EmbeddedTimelineService: Started embedded timeline server at soumils-mbp:49387
24/03/01 11:21:05 INFO BaseHoodieClient: Timeline Server already running. Not restarting the service
24/03/01 11:21:05 INFO BaseHoodieClient: Timeline Server already running. Not restarting the service
24/03/01 11:21:05 INFO SparkContext: Starting job: isEmpty at StreamSync.java:767
24/03/01 11:21:05 INFO DAGScheduler: Got job 3 (isEmpty at StreamSync.java:767) with 1 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 3 (isEmpty at StreamSync.java:767)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[15] at mapPartitions at StreamSync.java:615), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 21.8 KiB, free 434.1 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 10.4 KiB, free 434.1 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on soumils-mbp:49385 (size: 10.4 KiB, free: 434.4 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[15] at mapPartitions at StreamSync.java:615) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 10) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 8010 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 3.0 (TID 10)
24/03/01 11:21:05 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/d75b3ff2-aee7-4cd5-9da1-93d05b85b86b_orders.csv, range: 0-13248, partition values: [empty row]
24/03/01 11:21:05 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 3.0 (TID 10). 1346 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 10) in 36 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 3 (isEmpty at StreamSync.java:767) finished in 0.045 s
24/03/01 11:21:05 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 3 finished: isEmpty at StreamSync.java:767, took 0.046626 s
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO CleanerUtils: Cleaned failed attempts if any
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
24/03/01 11:21:05 INFO FileSystemViewManager: Creating remote first table view
24/03/01 11:21:05 INFO BaseHoodieWriteClient: Generate a new instant time: 20240301112102951 action: commit
24/03/01 11:21:05 INFO HoodieActiveTimeline: Creating a new instant [==>20240301112102951__commit__REQUESTED]
24/03/01 11:21:05 INFO StreamSync: Starting commit  : 20240301112102951
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20240301112102951__commit__REQUESTED__20240301112105636]}
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: Async metadata indexing disabled and following partitions already initialized: []
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20240301112102951__commit__REQUESTED__20240301112105636]}
24/03/01 11:21:05 INFO HoodieTableMetaClient: Initializing file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata as hoodie table file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:116
24/03/01 11:21:05 INFO DAGScheduler: Got job 4 (collect at HoodieSparkEngineContext.java:116) with 1 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:116)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[17] at map at HoodieSparkEngineContext.java:116), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 101.2 KiB, free 434.0 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 36.2 KiB, free 434.0 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on soumils-mbp:49385 (size: 36.2 KiB, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[17] at map at HoodieSparkEngineContext.java:116) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 11) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7410 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 4.0 (TID 11)
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 4.0 (TID 11). 934 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 11) in 18 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:116) finished in 0.028 s
24/03/01 11:21:05 INFO DAGScheduler: Job 4 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 4 finished: collect at HoodieSparkEngineContext.java:116, took 0.030746 s
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: Initializing MDT partition FILES at instant 00000000000000010
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: Committing total 0 partitions and 0 files to metadata
24/03/01 11:21:05 INFO SparkContext: Starting job: count at HoodieJavaRDD.java:115
24/03/01 11:21:05 INFO DAGScheduler: Got job 5 (count at HoodieJavaRDD.java:115) with 1 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 5 (count at HoodieJavaRDD.java:115)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 5 (ParallelCollectionRDD[18] at parallelize at HoodieSparkEngineContext.java:111), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.0 KiB, free 434.0 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 1866.0 B, free 434.0 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on soumils-mbp:49385 (size: 1866.0 B, free: 434.3 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (ParallelCollectionRDD[18] at parallelize at HoodieSparkEngineContext.java:111) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 5.0 with 1 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 12) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7324 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 5.0 (TID 12)
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 5.0 (TID 12). 771 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 12) in 3 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 5 (count at HoodieJavaRDD.java:115) finished in 0.007 s
24/03/01 11:21:05 INFO DAGScheduler: Job 5 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 5: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 5 finished: count at HoodieJavaRDD.java:115, took 0.008649 s
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: Initializing FILES index with 1 mappings and 1 file groups.
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: Creating 1 file groups for partition files with base fileId files- at instant time 00000000000000010
24/03/01 11:21:05 INFO SparkContext: Starting job: foreach at HoodieSparkEngineContext.java:155
24/03/01 11:21:05 INFO DAGScheduler: Got job 6 (foreach at HoodieSparkEngineContext.java:155) with 1 output partitions
24/03/01 11:21:05 INFO DAGScheduler: Final stage: ResultStage 6 (foreach at HoodieSparkEngineContext.java:155)
24/03/01 11:21:05 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:05 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:05 INFO DAGScheduler: Submitting ResultStage 6 (ParallelCollectionRDD[19] at parallelize at HoodieSparkEngineContext.java:155), which has no missing parents
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 380.0 KiB, free 433.6 MiB)
24/03/01 11:21:05 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 126.6 KiB, free 433.5 MiB)
24/03/01 11:21:05 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on soumils-mbp:49385 (size: 126.6 KiB, free: 434.2 MiB)
24/03/01 11:21:05 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:05 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 6 (ParallelCollectionRDD[19] at parallelize at HoodieSparkEngineContext.java:155) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:05 INFO TaskSchedulerImpl: Adding task set 6.0 with 1 tasks resource profile 0
24/03/01 11:21:05 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 13) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7256 bytes) 
24/03/01 11:21:05 INFO Executor: Running task 0.0 in stage 6.0 (TID 13)
24/03/01 11:21:05 INFO HoodieLogFormat$WriterBuilder: Building HoodieLogFormat Writer
24/03/01 11:21:05 INFO HoodieLogFormat$WriterBuilder: HoodieLogFile on path file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0
24/03/01 11:21:05 INFO HoodieLogFormatWriter: HoodieLogFile{pathStr='file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0', fileLen=0} does not exist. Create a new file
24/03/01 11:21:05 INFO Executor: Finished task 0.0 in stage 6.0 (TID 13). 808 bytes result sent to driver
24/03/01 11:21:05 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 13) in 34 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:05 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
24/03/01 11:21:05 INFO DAGScheduler: ResultStage 6 (foreach at HoodieSparkEngineContext.java:155) finished in 0.053 s
24/03/01 11:21:05 INFO DAGScheduler: Job 6 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 6: Stage finished
24/03/01 11:21:05 INFO DAGScheduler: Job 6 finished: foreach at HoodieSparkEngineContext.java:155, took 0.054310 s
24/03/01 11:21:05 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:05 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:05 INFO HoodieTableMetadataUtil: Loading latest file slices for metadata table partition files
24/03/01 11:21:05 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:05 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:05 INFO AbstractTableFileSystemView: Building file system view for partition (files)
24/03/01 11:21:05 INFO BaseHoodieClient: Embedded Timeline Server is disabled. Not starting timeline service
24/03/01 11:21:05 INFO BaseHoodieClient: Embedded Timeline Server is disabled. Not starting timeline service
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:05 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:05 INFO HoodieBackedTableMetadataWriter: New commit at 00000000000000010 being applied to MDT.
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO CleanerUtils: Cleaned failed attempts if any
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:05 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:05 INFO BaseHoodieWriteClient: Generate a new instant time: 00000000000000010 action: deltacommit
24/03/01 11:21:05 INFO HoodieActiveTimeline: Creating a new instant [==>00000000000000010__deltacommit__REQUESTED]
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:05 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>00000000000000010__deltacommit__REQUESTED__20240301112105987]}
24/03/01 11:21:05 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:05 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:05 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO AsyncCleanerService: The HoodieWriteClient is not configured to auto & async clean. Async clean service will not start.
24/03/01 11:21:06 INFO AsyncArchiveService: The HoodieWriteClient is not configured to auto & async archive. Async archive service will not start.
24/03/01 11:21:06 INFO HoodieActiveTimeline: Checking for file exists ?file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/00000000000000010.deltacommit.requested
24/03/01 11:21:06 INFO HoodieActiveTimeline: Create new file for toInstant ?file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/00000000000000010.deltacommit.inflight
24/03/01 11:21:06 INFO SparkContext: Starting job: collect at SparkHoodieMetadataBulkInsertPartitioner.java:95
24/03/01 11:21:06 INFO DAGScheduler: Registering RDD 23 (keyBy at SparkHoodieMetadataBulkInsertPartitioner.java:74) as input to shuffle 0
24/03/01 11:21:06 INFO DAGScheduler: Got job 7 (collect at SparkHoodieMetadataBulkInsertPartitioner.java:95) with 1 output partitions
24/03/01 11:21:06 INFO DAGScheduler: Final stage: ResultStage 8 (collect at SparkHoodieMetadataBulkInsertPartitioner.java:95)
24/03/01 11:21:06 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 7)
24/03/01 11:21:06 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 7)
24/03/01 11:21:06 INFO DAGScheduler: Submitting ShuffleMapStage 7 (MapPartitionsRDD[23] at keyBy at SparkHoodieMetadataBulkInsertPartitioner.java:74), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 8.6 KiB, free 433.5 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 4.8 KiB, free 433.5 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on soumils-mbp:49385 (size: 4.8 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 7 (MapPartitionsRDD[23] at keyBy at SparkHoodieMetadataBulkInsertPartitioner.java:74) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 14) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7422 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 7.0 (TID 14)
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 7.0 (TID 14). 975 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 14) in 21 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ShuffleMapStage 7 (keyBy at SparkHoodieMetadataBulkInsertPartitioner.java:74) finished in 0.036 s
24/03/01 11:21:06 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:06 INFO DAGScheduler: running: Set()
24/03/01 11:21:06 INFO DAGScheduler: waiting: Set(ResultStage 8)
24/03/01 11:21:06 INFO DAGScheduler: failed: Set()
24/03/01 11:21:06 INFO DAGScheduler: Submitting ResultStage 8 (MapPartitionsRDD[26] at mapPartitions at SparkHoodieMetadataBulkInsertPartitioner.java:81), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 7.4 KiB, free 433.5 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 3.9 KiB, free 433.5 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on soumils-mbp:49385 (size: 3.9 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 8 (MapPartitionsRDD[26] at mapPartitions at SparkHoodieMetadataBulkInsertPartitioner.java:81) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 8.0 with 1 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 15) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 8.0 (TID 15)
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 1 (142.0 B) non-empty blocks including 1 (142.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 8.0 (TID 15). 1681 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 15) in 37 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ResultStage 8 (collect at SparkHoodieMetadataBulkInsertPartitioner.java:95) finished in 0.042 s
24/03/01 11:21:06 INFO DAGScheduler: Job 7 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 8: Stage finished
24/03/01 11:21:06 INFO DAGScheduler: Job 7 finished: collect at SparkHoodieMetadataBulkInsertPartitioner.java:95, took 0.107504 s
24/03/01 11:21:06 INFO BaseSparkCommitActionExecutor: no validators configured.
24/03/01 11:21:06 INFO BaseCommitActionExecutor: Auto commit enabled: Committing 00000000000000010
24/03/01 11:21:06 INFO SparkContext: Starting job: collect at HoodieJavaRDD.java:177
24/03/01 11:21:06 INFO DAGScheduler: Got job 8 (collect at HoodieJavaRDD.java:177) with 1 output partitions
24/03/01 11:21:06 INFO DAGScheduler: Final stage: ResultStage 10 (collect at HoodieJavaRDD.java:177)
24/03/01 11:21:06 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9)
24/03/01 11:21:06 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:06 INFO DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[29] at map at HoodieJavaRDD.java:125), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 358.0 KiB, free 433.1 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 123.8 KiB, free 433.0 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on soumils-mbp:49385 (size: 123.8 KiB, free: 434.1 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[29] at map at HoodieJavaRDD.java:125) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 10.0 with 1 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID 16) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 10.0 (TID 16)
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 1 (142.0 B) non-empty blocks including 1 (142.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO SimpleExecutor: Starting consumer, consuming records from the records iterator directly
24/03/01 11:21:06 INFO DirectWriteMarkers: Creating Marker Path=file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/.temp/00000000000000010/files/files-0000-0_0-10-16_00000000000000010.hfile.marker.CREATE
24/03/01 11:21:06 INFO DirectWriteMarkers: [direct] Created marker file file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/.temp/00000000000000010/files/files-0000-0_0-10-16_00000000000000010.hfile.marker.CREATE in 19 ms
24/03/01 11:21:06 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
24/03/01 11:21:06 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
24/03/01 11:21:06 INFO MetricsSystemImpl: HBase metrics system started
24/03/01 11:21:06 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hudi.org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
24/03/01 11:21:06 INFO CodecPool: Got brand-new compressor [.gz]
24/03/01 11:21:06 INFO CodecPool: Got brand-new compressor [.gz]
24/03/01 11:21:06 INFO HoodieCreateHandle: New CreateHandle for partition :files with fileId files-0000-0
24/03/01 11:21:06 INFO HoodieCreateHandle: Closing the file files-0000-0 as we are done with all the records 1
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_7_piece0 on soumils-mbp:49385 in memory (size: 36.2 KiB, free: 434.1 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_6_piece0 on soumils-mbp:49385 in memory (size: 10.4 KiB, free: 434.1 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_8_piece0 on soumils-mbp:49385 in memory (size: 1866.0 B, free: 434.1 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_9_piece0 on soumils-mbp:49385 in memory (size: 126.6 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_10_piece0 on soumils-mbp:49385 in memory (size: 4.8 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Removed broadcast_11_piece0 on soumils-mbp:49385 in memory (size: 3.9 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO HoodieCreateHandle: CreateHandle for partitionPath files fileID files-0000-0, took 269 ms.
24/03/01 11:21:06 INFO MemoryStore: Block rdd_28_0 stored as values in memory (estimated size 454.0 B, free 433.7 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added rdd_28_0 in memory on soumils-mbp:49385 (size: 454.0 B, free: 434.2 MiB)
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 10.0 (TID 16). 2037 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID 16) in 294 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ResultStage 10 (collect at HoodieJavaRDD.java:177) finished in 0.327 s
24/03/01 11:21:06 INFO DAGScheduler: Job 8 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 10: Stage finished
24/03/01 11:21:06 INFO DAGScheduler: Job 8 finished: collect at HoodieJavaRDD.java:177, took 0.329325 s
24/03/01 11:21:06 INFO CommitUtils: Creating  metadata for BULK_INSERT numWriteStats:1 numReplaceFileIds:0
24/03/01 11:21:06 INFO BaseSparkCommitActionExecutor: Committing 00000000000000010, action Type deltacommit, operation Type BULK_INSERT
24/03/01 11:21:06 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:150
24/03/01 11:21:06 INFO DAGScheduler: Got job 9 (collect at HoodieSparkEngineContext.java:150) with 1 output partitions
24/03/01 11:21:06 INFO DAGScheduler: Final stage: ResultStage 11 (collect at HoodieSparkEngineContext.java:150)
24/03/01 11:21:06 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:06 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:06 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[31] at flatMap at HoodieSparkEngineContext.java:150), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 100.6 KiB, free 433.6 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 35.8 KiB, free 433.6 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on soumils-mbp:49385 (size: 35.8 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[31] at flatMap at HoodieSparkEngineContext.java:150) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 17) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7382 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 11.0 (TID 17)
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 11.0 (TID 17). 861 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 17) in 34 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ResultStage 11 (collect at HoodieSparkEngineContext.java:150) finished in 0.048 s
24/03/01 11:21:06 INFO DAGScheduler: Job 9 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 11: Stage finished
24/03/01 11:21:06 INFO DAGScheduler: Job 9 finished: collect at HoodieSparkEngineContext.java:150, took 0.049345 s
24/03/01 11:21:06 INFO HoodieActiveTimeline: Marking instant complete [==>00000000000000010__deltacommit__INFLIGHT]
24/03/01 11:21:06 INFO HoodieActiveTimeline: Checking for file exists ?file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/00000000000000010.deltacommit.inflight
24/03/01 11:21:06 INFO HoodieActiveTimeline: Create new file for toInstant ?file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/00000000000000010.deltacommit
24/03/01 11:21:06 INFO HoodieActiveTimeline: Completed [==>00000000000000010__deltacommit__INFLIGHT]
24/03/01 11:21:06 INFO BaseSparkCommitActionExecutor: Committed 00000000000000010
24/03/01 11:21:06 INFO SparkContext: Starting job: collectAsMap at HoodieSparkEngineContext.java:164
24/03/01 11:21:06 INFO DAGScheduler: Got job 10 (collectAsMap at HoodieSparkEngineContext.java:164) with 1 output partitions
24/03/01 11:21:06 INFO DAGScheduler: Final stage: ResultStage 12 (collectAsMap at HoodieSparkEngineContext.java:164)
24/03/01 11:21:06 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:06 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:06 INFO DAGScheduler: Submitting ResultStage 12 (MapPartitionsRDD[33] at mapToPair at HoodieSparkEngineContext.java:161), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 100.7 KiB, free 433.5 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 35.8 KiB, free 433.4 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on soumils-mbp:49385 (size: 35.8 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 12 (MapPartitionsRDD[33] at mapToPair at HoodieSparkEngineContext.java:161) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 12.0 with 1 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 18) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7382 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 12.0 (TID 18)
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 12.0 (TID 18). 910 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 12.0 (TID 18) in 5 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ResultStage 12 (collectAsMap at HoodieSparkEngineContext.java:164) finished in 0.014 s
24/03/01 11:21:06 INFO DAGScheduler: Job 10 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:06 INFO TaskSchedulerImpl: Killing all running tasks in stage 12: Stage finished
24/03/01 11:21:06 INFO DAGScheduler: Job 10 finished: collectAsMap at HoodieSparkEngineContext.java:164, took 0.015285 s
24/03/01 11:21:06 INFO FSUtils: Removed directory at file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/.temp/00000000000000010
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: MDT file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders partition FILES has been enabled
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:06 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:06 INFO HoodieBackedTableMetadataWriter: Initializing FILES index in metadata table took 972 in ms
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieBackedTableMetadataWriter: Latest deltacommit time found is 00000000000000010, running clean operations.
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO BaseHoodieWriteClient: Cleaner started
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO BaseHoodieWriteClient: Scheduling cleaning at instant time :00000000000000010002
24/03/01 11:21:06 INFO FileSystemViewManager: Creating InMemory based view for basePath file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:06 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:06 INFO CleanPlanner: No earliest commit to retain. No need to scan partitions !!
24/03/01 11:21:06 INFO CleanPlanActionExecutor: Nothing to clean here. It is already clean
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20240301112102951__commit__REQUESTED__20240301112105636]}
24/03/01 11:21:06 INFO BaseHoodieWriteClient: Scheduling table service COMPACT
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO BaseHoodieWriteClient: Scheduling compaction at instant time :00000000000000010001
24/03/01 11:21:06 INFO ScheduleCompactionActionExecutor: Checking if compaction needs to be run on file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading Active commit timeline for file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :MEMORY
24/03/01 11:21:06 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[==>20240301112102951__commit__REQUESTED__20240301112105636]}
24/03/01 11:21:06 INFO HoodieTimelineArchiver: No Instants to archive
24/03/01 11:21:06 INFO HoodieBackedTableMetadataWriter: All the table services operations on MDT completed successfully
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders
24/03/01 11:21:06 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieTableConfig: Loading table properties from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata/.hoodie/hoodie.properties
24/03/01 11:21:06 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from file:/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders/.hoodie/metadata
24/03/01 11:21:06 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[00000000000000010__deltacommit__COMPLETED__20240301112106710]}
24/03/01 11:21:06 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:06 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:06 INFO FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
24/03/01 11:21:06 INFO FileSystemViewManager: Creating remote first table view
24/03/01 11:21:06 INFO AsyncCleanerService: The HoodieWriteClient is not configured to auto & async clean. Async clean service will not start.
24/03/01 11:21:06 INFO AsyncArchiveService: The HoodieWriteClient is not configured to auto & async archive. Async archive service will not start.
24/03/01 11:21:06 INFO SparkContext: Starting job: collect at HoodieJavaRDD.java:177
24/03/01 11:21:06 INFO DAGScheduler: Registering RDD 34 (mapToPair at HoodieJavaRDD.java:149) as input to shuffle 2
24/03/01 11:21:06 INFO DAGScheduler: Registering RDD 40 (distinct at HoodieJavaRDD.java:157) as input to shuffle 1
24/03/01 11:21:06 INFO DAGScheduler: Got job 11 (collect at HoodieJavaRDD.java:177) with 8 output partitions
24/03/01 11:21:06 INFO DAGScheduler: Final stage: ResultStage 15 (collect at HoodieJavaRDD.java:177)
24/03/01 11:21:06 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 14)
24/03/01 11:21:06 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 14)
24/03/01 11:21:06 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[34] at mapToPair at HoodieJavaRDD.java:149), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 41.2 KiB, free 433.4 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 19.9 KiB, free 433.4 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on soumils-mbp:49385 (size: 19.9 KiB, free: 434.2 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 8 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[34] at mapToPair at HoodieJavaRDD.java:149) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 13.0 with 8 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 19) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID 20) (soumils-mbp, executor driver, partition 1, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 2.0 in stage 13.0 (TID 21) (soumils-mbp, executor driver, partition 2, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 3.0 in stage 13.0 (TID 22) (soumils-mbp, executor driver, partition 3, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 4.0 in stage 13.0 (TID 23) (soumils-mbp, executor driver, partition 4, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 5.0 in stage 13.0 (TID 24) (soumils-mbp, executor driver, partition 5, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 6.0 in stage 13.0 (TID 25) (soumils-mbp, executor driver, partition 6, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 7.0 in stage 13.0 (TID 26) (soumils-mbp, executor driver, partition 7, PROCESS_LOCAL, 7999 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 13.0 (TID 19)
24/03/01 11:21:06 INFO Executor: Running task 1.0 in stage 13.0 (TID 20)
24/03/01 11:21:06 INFO Executor: Running task 6.0 in stage 13.0 (TID 25)
24/03/01 11:21:06 INFO Executor: Running task 3.0 in stage 13.0 (TID 22)
24/03/01 11:21:06 INFO Executor: Running task 7.0 in stage 13.0 (TID 26)
24/03/01 11:21:06 INFO Executor: Running task 4.0 in stage 13.0 (TID 23)
24/03/01 11:21:06 INFO Executor: Running task 2.0 in stage 13.0 (TID 21)
24/03/01 11:21:06 INFO Executor: Running task 5.0 in stage 13.0 (TID 24)
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/ea34b272-7f0f-4d95-aa7c-34f396bf146a_orders.csv, range: 0-13127, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/42042af4-b4e3-49e0-8441-15f82f94fa50_orders.csv, range: 0-13188, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/ad3608d1-28f6-418b-85d8-08272ec3a9b4_orders.csv, range: 0-13199, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/7fc6265c-04b1-4b4c-8ddb-62cc4455c028_orders.csv, range: 0-13171, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/77a2bb4b-494b-4f69-a277-37bc49be8b29_orders.csv, range: 0-13169, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/aac46b74-8dbc-4ce4-8cc8-5925717c8018_orders.csv, range: 0-13135, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/d75b3ff2-aee7-4cd5-9da1-93d05b85b86b_orders.csv, range: 0-13248, partition values: [empty row]
24/03/01 11:21:06 INFO FileScanRDD: Reading File path: file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders/bd14f873-5e9e-41b4-a767-830823f49e97_orders.csv, range: 0-13076, partition values: [empty row]
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO HoodieSparkKeyGeneratorFactory: The value of hoodie.datasource.write.keygenerator.type is empty; inferred to be SIMPLE
24/03/01 11:21:06 INFO Executor: Finished task 3.0 in stage 13.0 (TID 22). 1463 bytes result sent to driver
24/03/01 11:21:06 INFO Executor: Finished task 1.0 in stage 13.0 (TID 20). 1463 bytes result sent to driver
24/03/01 11:21:06 INFO Executor: Finished task 4.0 in stage 13.0 (TID 23). 1463 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 3.0 in stage 13.0 (TID 22) in 128 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:06 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID 20) in 130 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:06 INFO TaskSetManager: Finished task 4.0 in stage 13.0 (TID 23) in 130 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:06 INFO Executor: Finished task 6.0 in stage 13.0 (TID 25). 1506 bytes result sent to driver
24/03/01 11:21:06 INFO Executor: Finished task 5.0 in stage 13.0 (TID 24). 1463 bytes result sent to driver
24/03/01 11:21:06 INFO Executor: Finished task 7.0 in stage 13.0 (TID 26). 1506 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 6.0 in stage 13.0 (TID 25) in 132 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:06 INFO Executor: Finished task 0.0 in stage 13.0 (TID 19). 1463 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 7.0 in stage 13.0 (TID 26) in 132 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:06 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 19) in 134 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:06 INFO TaskSetManager: Finished task 5.0 in stage 13.0 (TID 24) in 134 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:06 INFO Executor: Finished task 2.0 in stage 13.0 (TID 21). 1506 bytes result sent to driver
24/03/01 11:21:06 INFO TaskSetManager: Finished task 2.0 in stage 13.0 (TID 21) in 134 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:06 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 
24/03/01 11:21:06 INFO DAGScheduler: ShuffleMapStage 13 (mapToPair at HoodieJavaRDD.java:149) finished in 0.139 s
24/03/01 11:21:06 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:06 INFO DAGScheduler: running: Set()
24/03/01 11:21:06 INFO DAGScheduler: waiting: Set(ResultStage 15, ShuffleMapStage 14)
24/03/01 11:21:06 INFO DAGScheduler: failed: Set()
24/03/01 11:21:06 INFO DAGScheduler: Submitting ShuffleMapStage 14 (MapPartitionsRDD[40] at distinct at HoodieJavaRDD.java:157), which has no missing parents
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 26.5 KiB, free 433.3 MiB)
24/03/01 11:21:06 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 12.5 KiB, free 433.3 MiB)
24/03/01 11:21:06 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on soumils-mbp:49385 (size: 12.5 KiB, free: 434.1 MiB)
24/03/01 11:21:06 INFO SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:06 INFO DAGScheduler: Submitting 8 missing tasks from ShuffleMapStage 14 (MapPartitionsRDD[40] at distinct at HoodieJavaRDD.java:157) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:06 INFO TaskSchedulerImpl: Adding task set 14.0 with 8 tasks resource profile 0
24/03/01 11:21:06 INFO TaskSetManager: Starting task 0.0 in stage 14.0 (TID 27) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 1.0 in stage 14.0 (TID 28) (soumils-mbp, executor driver, partition 1, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 2.0 in stage 14.0 (TID 29) (soumils-mbp, executor driver, partition 2, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 3.0 in stage 14.0 (TID 30) (soumils-mbp, executor driver, partition 3, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 4.0 in stage 14.0 (TID 31) (soumils-mbp, executor driver, partition 4, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 5.0 in stage 14.0 (TID 32) (soumils-mbp, executor driver, partition 5, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 6.0 in stage 14.0 (TID 33) (soumils-mbp, executor driver, partition 6, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO TaskSetManager: Starting task 7.0 in stage 14.0 (TID 34) (soumils-mbp, executor driver, partition 7, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:06 INFO Executor: Running task 1.0 in stage 14.0 (TID 28)
24/03/01 11:21:06 INFO Executor: Running task 0.0 in stage 14.0 (TID 27)
24/03/01 11:21:06 INFO Executor: Running task 2.0 in stage 14.0 (TID 29)
24/03/01 11:21:06 INFO Executor: Running task 4.0 in stage 14.0 (TID 31)
24/03/01 11:21:06 INFO Executor: Running task 3.0 in stage 14.0 (TID 30)
24/03/01 11:21:06 INFO Executor: Running task 5.0 in stage 14.0 (TID 32)
24/03/01 11:21:06 INFO Executor: Running task 7.0 in stage 14.0 (TID 34)
24/03/01 11:21:06 INFO Executor: Running task 6.0 in stage 14.0 (TID 33)
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (14.3 KiB) non-empty blocks including 8 (14.3 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (13.7 KiB) non-empty blocks including 8 (13.7 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (13.8 KiB) non-empty blocks including 8 (13.8 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (14.0 KiB) non-empty blocks including 8 (14.0 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (13.9 KiB) non-empty blocks including 8 (13.9 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (15.7 KiB) non-empty blocks including 8 (15.7 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (13.6 KiB) non-empty blocks including 8 (13.6 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Getting 8 (13.6 KiB) non-empty blocks including 8 (13.6 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_2 stored as values in memory (estimated size 18.1 KiB, free 433.3 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_2 in memory on soumils-mbp:49385 (size: 18.1 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_5 stored as values in memory (estimated size 17.4 KiB, free 433.3 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_1 stored as values in memory (estimated size 17.7 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_5 in memory on soumils-mbp:49385 (size: 17.4 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_0 stored as values in memory (estimated size 17.7 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_6 stored as values in memory (estimated size 17.2 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_7 stored as values in memory (estimated size 20.0 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_1 in memory on soumils-mbp:49385 (size: 17.7 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_4 stored as values in memory (estimated size 17.9 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_36_3 stored as values in memory (estimated size 17.2 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_6 in memory on soumils-mbp:49385 (size: 17.2 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_0 in memory on soumils-mbp:49385 (size: 17.7 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_7 in memory on soumils-mbp:49385 (size: 20.0 KiB, free: 434.0 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_4 in memory on soumils-mbp:49385 (size: 17.9 KiB, free: 434.0 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_36_3 in memory on soumils-mbp:49385 (size: 17.2 KiB, free: 434.0 MiB)
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 14.0 (TID 29). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 14.0 (TID 29) in 98 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 14.0 (TID 28). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 14.0 (TID 32). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 14.0 (TID 28) in 100 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 14.0 (TID 32) in 100 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 14.0 (TID 27). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 14.0 (TID 27) in 102 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 14.0 (TID 33). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 14.0 (TID 33) in 102 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 14.0 (TID 34). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 14.0 (TID 34) in 102 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 14.0 (TID 30). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 14.0 (TID 30) in 103 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 14.0 (TID 31). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 14.0 (TID 31) in 103 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ShuffleMapStage 14 (distinct at HoodieJavaRDD.java:157) finished in 0.109 s
24/03/01 11:21:07 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:07 INFO DAGScheduler: running: Set()
24/03/01 11:21:07 INFO DAGScheduler: waiting: Set(ResultStage 15)
24/03/01 11:21:07 INFO DAGScheduler: failed: Set()
24/03/01 11:21:07 INFO DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[42] at distinct at HoodieJavaRDD.java:157), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 6.3 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 3.5 KiB, free 433.2 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on soumils-mbp:49385 (size: 3.5 KiB, free: 434.0 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 15 (MapPartitionsRDD[42] at distinct at HoodieJavaRDD.java:157) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 15.0 with 8 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 35) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 36) (soumils-mbp, executor driver, partition 1, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 15.0 (TID 37) (soumils-mbp, executor driver, partition 2, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 15.0 (TID 38) (soumils-mbp, executor driver, partition 3, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 15.0 (TID 39) (soumils-mbp, executor driver, partition 4, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 15.0 (TID 40) (soumils-mbp, executor driver, partition 5, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 15.0 (TID 41) (soumils-mbp, executor driver, partition 6, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 15.0 (TID 42) (soumils-mbp, executor driver, partition 7, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 15.0 (TID 36)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 15.0 (TID 35)
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 15.0 (TID 37)
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 15.0 (TID 38)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 15.0 (TID 41)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 15.0 (TID 40)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 15.0 (TID 39)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 15.0 (TID 42)
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (640.0 B) non-empty blocks including 8 (640.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (821.0 B) non-empty blocks including 8 (821.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (656.0 B) non-empty blocks including 8 (656.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (749.0 B) non-empty blocks including 8 (749.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (803.0 B) non-empty blocks including 8 (803.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (648.0 B) non-empty blocks including 8 (648.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (664.0 B) non-empty blocks including 8 (664.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (750.0 B) non-empty blocks including 8 (750.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 15.0 (TID 40). 1789 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 15.0 (TID 36). 1729 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 15.0 (TID 42). 1765 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 15.0 (TID 37). 1729 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 15.0 (TID 40) in 7 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 36) in 7 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 15.0 (TID 39). 1765 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 15.0 (TID 35). 1784 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 15.0 (TID 37) in 8 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 15.0 (TID 39) in 8 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 15.0 (TID 41). 1820 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 15.0 (TID 38). 1772 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 15.0 (TID 42) in 9 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 35) in 11 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 15.0 (TID 41) in 10 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 15.0 (TID 38) in 10 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ResultStage 15 (collect at HoodieJavaRDD.java:177) finished in 0.014 s
24/03/01 11:21:07 INFO DAGScheduler: Job 11 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:07 INFO TaskSchedulerImpl: Killing all running tasks in stage 15: Stage finished
24/03/01 11:21:07 INFO DAGScheduler: Job 11 finished: collect at HoodieJavaRDD.java:177, took 0.267255 s
24/03/01 11:21:07 INFO SparkContext: Starting job: collect at HoodieSparkEngineContext.java:150
24/03/01 11:21:07 INFO DAGScheduler: Got job 12 (collect at HoodieSparkEngineContext.java:150) with 56 output partitions
24/03/01 11:21:07 INFO DAGScheduler: Final stage: ResultStage 16 (collect at HoodieSparkEngineContext.java:150)
24/03/01 11:21:07 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:07 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:07 INFO DAGScheduler: Submitting ResultStage 16 (MapPartitionsRDD[44] at flatMap at HoodieSparkEngineContext.java:150), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_18 stored as values in memory (estimated size 449.8 KiB, free 432.7 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 157.2 KiB, free 432.6 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_18_piece0 in memory on soumils-mbp:49385 (size: 157.2 KiB, free: 433.8 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 56 missing tasks from ResultStage 16 (MapPartitionsRDD[44] at flatMap at HoodieSparkEngineContext.java:150) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 16.0 with 56 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 16.0 (TID 43) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 16.0 (TID 44) (soumils-mbp, executor driver, partition 1, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 16.0 (TID 45) (soumils-mbp, executor driver, partition 2, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 16.0 (TID 46) (soumils-mbp, executor driver, partition 3, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 16.0 (TID 47) (soumils-mbp, executor driver, partition 4, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 16.0 (TID 48) (soumils-mbp, executor driver, partition 5, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 16.0 (TID 49) (soumils-mbp, executor driver, partition 6, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 16.0 (TID 50) (soumils-mbp, executor driver, partition 7, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 8.0 in stage 16.0 (TID 51) (soumils-mbp, executor driver, partition 8, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 9.0 in stage 16.0 (TID 52) (soumils-mbp, executor driver, partition 9, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 10.0 in stage 16.0 (TID 53) (soumils-mbp, executor driver, partition 10, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 11.0 in stage 16.0 (TID 54) (soumils-mbp, executor driver, partition 11, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 16.0 (TID 44)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 16.0 (TID 47)
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 16.0 (TID 46)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 16.0 (TID 43)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 16.0 (TID 48)
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 16.0 (TID 45)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 16.0 (TID 50)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 16.0 (TID 49)
24/03/01 11:21:07 INFO Executor: Running task 8.0 in stage 16.0 (TID 51)
24/03/01 11:21:07 INFO Executor: Running task 9.0 in stage 16.0 (TID 52)
24/03/01 11:21:07 INFO Executor: Running task 11.0 in stage 16.0 (TID 54)
24/03/01 11:21:07 INFO Executor: Running task 10.0 in stage 16.0 (TID 53)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_13_piece0 on soumils-mbp:49385 in memory (size: 35.8 KiB, free: 433.9 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_17_piece0 on soumils-mbp:49385 in memory (size: 3.5 KiB, free: 433.9 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_14_piece0 on soumils-mbp:49385 in memory (size: 35.8 KiB, free: 433.9 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_15_piece0 on soumils-mbp:49385 in memory (size: 19.9 KiB, free: 433.9 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_12_piece0 on soumils-mbp:49385 in memory (size: 123.8 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_16_piece0 on soumils-mbp:49385 in memory (size: 12.5 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManager: Removing RDD 28
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 16.0 (TID 48). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 16.0 (TID 44). 852 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 16.0 (TID 46). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 12.0 in stage 16.0 (TID 55) (soumils-mbp, executor driver, partition 12, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 13.0 in stage 16.0 (TID 56) (soumils-mbp, executor driver, partition 13, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 12.0 in stage 16.0 (TID 55)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 14.0 in stage 16.0 (TID 57) (soumils-mbp, executor driver, partition 14, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 13.0 in stage 16.0 (TID 56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 16.0 (TID 44) in 93 ms on soumils-mbp (executor driver) (1/56)
24/03/01 11:21:07 INFO Executor: Running task 14.0 in stage 16.0 (TID 57)
24/03/01 11:21:07 INFO Executor: Finished task 8.0 in stage 16.0 (TID 51). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 10.0 in stage 16.0 (TID 53). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 16.0 (TID 48) in 93 ms on soumils-mbp (executor driver) (2/56)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 15.0 in stage 16.0 (TID 58) (soumils-mbp, executor driver, partition 15, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 16.0 in stage 16.0 (TID 59) (soumils-mbp, executor driver, partition 16, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 15.0 in stage 16.0 (TID 58)
24/03/01 11:21:07 INFO Executor: Running task 16.0 in stage 16.0 (TID 59)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 10.0 in stage 16.0 (TID 53) in 93 ms on soumils-mbp (executor driver) (3/56)
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 16.0 (TID 49). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 16.0 (TID 43). 895 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 11.0 in stage 16.0 (TID 54). 852 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 9.0 in stage 16.0 (TID 52). 852 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 16.0 (TID 47). 895 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 16.0 (TID 45). 852 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 17.0 in stage 16.0 (TID 60) (soumils-mbp, executor driver, partition 17, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 17.0 in stage 16.0 (TID 60)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 18.0 in stage 16.0 (TID 61) (soumils-mbp, executor driver, partition 18, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 16.0 (TID 50). 852 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 16.0 (TID 46) in 98 ms on soumils-mbp (executor driver) (4/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 8.0 in stage 16.0 (TID 51) in 98 ms on soumils-mbp (executor driver) (5/56)
24/03/01 11:21:07 INFO Executor: Running task 18.0 in stage 16.0 (TID 61)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 16.0 (TID 43) in 99 ms on soumils-mbp (executor driver) (6/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 16.0 (TID 49) in 98 ms on soumils-mbp (executor driver) (7/56)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 19.0 in stage 16.0 (TID 62) (soumils-mbp, executor driver, partition 19, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 19.0 in stage 16.0 (TID 62)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 20.0 in stage 16.0 (TID 63) (soumils-mbp, executor driver, partition 20, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 11.0 in stage 16.0 (TID 54) in 98 ms on soumils-mbp (executor driver) (8/56)
24/03/01 11:21:07 INFO Executor: Running task 20.0 in stage 16.0 (TID 63)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 21.0 in stage 16.0 (TID 64) (soumils-mbp, executor driver, partition 21, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 16.0 (TID 47) in 99 ms on soumils-mbp (executor driver) (9/56)
24/03/01 11:21:07 INFO Executor: Running task 21.0 in stage 16.0 (TID 64)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 9.0 in stage 16.0 (TID 52) in 99 ms on soumils-mbp (executor driver) (10/56)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 22.0 in stage 16.0 (TID 65) (soumils-mbp, executor driver, partition 22, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 22.0 in stage 16.0 (TID 65)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 23.0 in stage 16.0 (TID 66) (soumils-mbp, executor driver, partition 23, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 16.0 (TID 50) in 100 ms on soumils-mbp (executor driver) (11/56)
24/03/01 11:21:07 INFO Executor: Running task 23.0 in stage 16.0 (TID 66)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 16.0 (TID 45) in 101 ms on soumils-mbp (executor driver) (12/56)
24/03/01 11:21:07 INFO Executor: Finished task 12.0 in stage 16.0 (TID 55). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 14.0 in stage 16.0 (TID 57). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 24.0 in stage 16.0 (TID 67) (soumils-mbp, executor driver, partition 24, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 24.0 in stage 16.0 (TID 67)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 25.0 in stage 16.0 (TID 68) (soumils-mbp, executor driver, partition 25, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 14.0 in stage 16.0 (TID 57) in 37 ms on soumils-mbp (executor driver) (13/56)
24/03/01 11:21:07 INFO Executor: Running task 25.0 in stage 16.0 (TID 68)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 12.0 in stage 16.0 (TID 55) in 37 ms on soumils-mbp (executor driver) (14/56)
24/03/01 11:21:07 INFO Executor: Finished task 15.0 in stage 16.0 (TID 58). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 26.0 in stage 16.0 (TID 69) (soumils-mbp, executor driver, partition 26, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 15.0 in stage 16.0 (TID 58) in 48 ms on soumils-mbp (executor driver) (15/56)
24/03/01 11:21:07 INFO Executor: Running task 26.0 in stage 16.0 (TID 69)
24/03/01 11:21:07 INFO Executor: Finished task 19.0 in stage 16.0 (TID 62). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 27.0 in stage 16.0 (TID 70) (soumils-mbp, executor driver, partition 27, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 27.0 in stage 16.0 (TID 70)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 19.0 in stage 16.0 (TID 62) in 47 ms on soumils-mbp (executor driver) (16/56)
24/03/01 11:21:07 INFO Executor: Finished task 18.0 in stage 16.0 (TID 61). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 28.0 in stage 16.0 (TID 71) (soumils-mbp, executor driver, partition 28, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 18.0 in stage 16.0 (TID 61) in 50 ms on soumils-mbp (executor driver) (17/56)
24/03/01 11:21:07 INFO Executor: Running task 28.0 in stage 16.0 (TID 71)
24/03/01 11:21:07 INFO Executor: Finished task 16.0 in stage 16.0 (TID 59). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 29.0 in stage 16.0 (TID 72) (soumils-mbp, executor driver, partition 29, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 16.0 in stage 16.0 (TID 59) in 54 ms on soumils-mbp (executor driver) (18/56)
24/03/01 11:21:07 INFO Executor: Running task 29.0 in stage 16.0 (TID 72)
24/03/01 11:21:07 INFO Executor: Finished task 22.0 in stage 16.0 (TID 65). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 30.0 in stage 16.0 (TID 73) (soumils-mbp, executor driver, partition 30, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 22.0 in stage 16.0 (TID 65) in 49 ms on soumils-mbp (executor driver) (19/56)
24/03/01 11:21:07 INFO Executor: Running task 30.0 in stage 16.0 (TID 73)
24/03/01 11:21:07 INFO Executor: Finished task 13.0 in stage 16.0 (TID 56). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 17.0 in stage 16.0 (TID 60). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 20.0 in stage 16.0 (TID 63). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 31.0 in stage 16.0 (TID 74) (soumils-mbp, executor driver, partition 31, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 31.0 in stage 16.0 (TID 74)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 32.0 in stage 16.0 (TID 75) (soumils-mbp, executor driver, partition 32, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 23.0 in stage 16.0 (TID 66). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 33.0 in stage 16.0 (TID 76) (soumils-mbp, executor driver, partition 33, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 32.0 in stage 16.0 (TID 75)
24/03/01 11:21:07 INFO Executor: Running task 33.0 in stage 16.0 (TID 76)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 34.0 in stage 16.0 (TID 77) (soumils-mbp, executor driver, partition 34, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 17.0 in stage 16.0 (TID 60) in 58 ms on soumils-mbp (executor driver) (20/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 20.0 in stage 16.0 (TID 63) in 54 ms on soumils-mbp (executor driver) (21/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 13.0 in stage 16.0 (TID 56) in 60 ms on soumils-mbp (executor driver) (22/56)
24/03/01 11:21:07 INFO Executor: Running task 34.0 in stage 16.0 (TID 77)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 23.0 in stage 16.0 (TID 66) in 53 ms on soumils-mbp (executor driver) (23/56)
24/03/01 11:21:07 INFO Executor: Finished task 21.0 in stage 16.0 (TID 64). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 35.0 in stage 16.0 (TID 78) (soumils-mbp, executor driver, partition 35, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 21.0 in stage 16.0 (TID 64) in 59 ms on soumils-mbp (executor driver) (24/56)
24/03/01 11:21:07 INFO Executor: Running task 35.0 in stage 16.0 (TID 78)
24/03/01 11:21:07 INFO Executor: Finished task 25.0 in stage 16.0 (TID 68). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 36.0 in stage 16.0 (TID 79) (soumils-mbp, executor driver, partition 36, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 36.0 in stage 16.0 (TID 79)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 25.0 in stage 16.0 (TID 68) in 56 ms on soumils-mbp (executor driver) (25/56)
24/03/01 11:21:07 INFO Executor: Finished task 24.0 in stage 16.0 (TID 67). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 37.0 in stage 16.0 (TID 80) (soumils-mbp, executor driver, partition 37, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 24.0 in stage 16.0 (TID 67) in 60 ms on soumils-mbp (executor driver) (26/56)
24/03/01 11:21:07 INFO Executor: Running task 37.0 in stage 16.0 (TID 80)
24/03/01 11:21:07 INFO Executor: Finished task 26.0 in stage 16.0 (TID 69). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 28.0 in stage 16.0 (TID 71). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 38.0 in stage 16.0 (TID 81) (soumils-mbp, executor driver, partition 38, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 39.0 in stage 16.0 (TID 82) (soumils-mbp, executor driver, partition 39, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 27.0 in stage 16.0 (TID 70). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Running task 38.0 in stage 16.0 (TID 81)
24/03/01 11:21:07 INFO Executor: Running task 39.0 in stage 16.0 (TID 82)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 26.0 in stage 16.0 (TID 69) in 65 ms on soumils-mbp (executor driver) (27/56)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 40.0 in stage 16.0 (TID 83) (soumils-mbp, executor driver, partition 40, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 28.0 in stage 16.0 (TID 71) in 62 ms on soumils-mbp (executor driver) (28/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 27.0 in stage 16.0 (TID 70) in 64 ms on soumils-mbp (executor driver) (29/56)
24/03/01 11:21:07 INFO Executor: Running task 40.0 in stage 16.0 (TID 83)
24/03/01 11:21:07 INFO Executor: Finished task 29.0 in stage 16.0 (TID 72). 809 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 30.0 in stage 16.0 (TID 73). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 41.0 in stage 16.0 (TID 84) (soumils-mbp, executor driver, partition 41, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 42.0 in stage 16.0 (TID 85) (soumils-mbp, executor driver, partition 42, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 41.0 in stage 16.0 (TID 84)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 30.0 in stage 16.0 (TID 73) in 61 ms on soumils-mbp (executor driver) (30/56)
24/03/01 11:21:07 INFO Executor: Running task 42.0 in stage 16.0 (TID 85)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 29.0 in stage 16.0 (TID 72) in 63 ms on soumils-mbp (executor driver) (31/56)
24/03/01 11:21:07 INFO Executor: Finished task 33.0 in stage 16.0 (TID 76). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 43.0 in stage 16.0 (TID 86) (soumils-mbp, executor driver, partition 43, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 43.0 in stage 16.0 (TID 86)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 33.0 in stage 16.0 (TID 76) in 60 ms on soumils-mbp (executor driver) (32/56)
24/03/01 11:21:07 INFO Executor: Finished task 32.0 in stage 16.0 (TID 75). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 44.0 in stage 16.0 (TID 87) (soumils-mbp, executor driver, partition 44, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 32.0 in stage 16.0 (TID 75) in 65 ms on soumils-mbp (executor driver) (33/56)
24/03/01 11:21:07 INFO Executor: Running task 44.0 in stage 16.0 (TID 87)
24/03/01 11:21:07 INFO Executor: Finished task 31.0 in stage 16.0 (TID 74). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 45.0 in stage 16.0 (TID 88) (soumils-mbp, executor driver, partition 45, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 31.0 in stage 16.0 (TID 74) in 68 ms on soumils-mbp (executor driver) (34/56)
24/03/01 11:21:07 INFO Executor: Running task 45.0 in stage 16.0 (TID 88)
24/03/01 11:21:07 INFO Executor: Finished task 35.0 in stage 16.0 (TID 78). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 46.0 in stage 16.0 (TID 89) (soumils-mbp, executor driver, partition 46, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 46.0 in stage 16.0 (TID 89)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 35.0 in stage 16.0 (TID 78) in 66 ms on soumils-mbp (executor driver) (35/56)
24/03/01 11:21:07 INFO Executor: Finished task 34.0 in stage 16.0 (TID 77). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 47.0 in stage 16.0 (TID 90) (soumils-mbp, executor driver, partition 47, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 34.0 in stage 16.0 (TID 77) in 75 ms on soumils-mbp (executor driver) (36/56)
24/03/01 11:21:07 INFO Executor: Running task 47.0 in stage 16.0 (TID 90)
24/03/01 11:21:07 INFO Executor: Finished task 36.0 in stage 16.0 (TID 79). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 48.0 in stage 16.0 (TID 91) (soumils-mbp, executor driver, partition 48, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 36.0 in stage 16.0 (TID 79) in 58 ms on soumils-mbp (executor driver) (37/56)
24/03/01 11:21:07 INFO Executor: Running task 48.0 in stage 16.0 (TID 91)
24/03/01 11:21:07 INFO Executor: Finished task 37.0 in stage 16.0 (TID 80). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 49.0 in stage 16.0 (TID 92) (soumils-mbp, executor driver, partition 49, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 49.0 in stage 16.0 (TID 92)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 37.0 in stage 16.0 (TID 80) in 60 ms on soumils-mbp (executor driver) (38/56)
24/03/01 11:21:07 INFO Executor: Finished task 38.0 in stage 16.0 (TID 81). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 50.0 in stage 16.0 (TID 93) (soumils-mbp, executor driver, partition 50, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 42.0 in stage 16.0 (TID 85). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 38.0 in stage 16.0 (TID 81) in 47 ms on soumils-mbp (executor driver) (39/56)
24/03/01 11:21:07 INFO Executor: Running task 50.0 in stage 16.0 (TID 93)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 51.0 in stage 16.0 (TID 94) (soumils-mbp, executor driver, partition 51, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 42.0 in stage 16.0 (TID 85) in 45 ms on soumils-mbp (executor driver) (40/56)
24/03/01 11:21:07 INFO Executor: Running task 51.0 in stage 16.0 (TID 94)
24/03/01 11:21:07 INFO Executor: Finished task 40.0 in stage 16.0 (TID 83). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 52.0 in stage 16.0 (TID 95) (soumils-mbp, executor driver, partition 52, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 52.0 in stage 16.0 (TID 95)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 40.0 in stage 16.0 (TID 83) in 49 ms on soumils-mbp (executor driver) (41/56)
24/03/01 11:21:07 INFO Executor: Finished task 39.0 in stage 16.0 (TID 82). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 53.0 in stage 16.0 (TID 96) (soumils-mbp, executor driver, partition 53, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 39.0 in stage 16.0 (TID 82) in 53 ms on soumils-mbp (executor driver) (42/56)
24/03/01 11:21:07 INFO Executor: Running task 53.0 in stage 16.0 (TID 96)
24/03/01 11:21:07 INFO Executor: Finished task 41.0 in stage 16.0 (TID 84). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 54.0 in stage 16.0 (TID 97) (soumils-mbp, executor driver, partition 54, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 41.0 in stage 16.0 (TID 84) in 51 ms on soumils-mbp (executor driver) (43/56)
24/03/01 11:21:07 INFO Executor: Running task 54.0 in stage 16.0 (TID 97)
24/03/01 11:21:07 INFO Executor: Finished task 45.0 in stage 16.0 (TID 88). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 43.0 in stage 16.0 (TID 86). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 55.0 in stage 16.0 (TID 98) (soumils-mbp, executor driver, partition 55, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 45.0 in stage 16.0 (TID 88) in 43 ms on soumils-mbp (executor driver) (44/56)
24/03/01 11:21:07 INFO Executor: Running task 55.0 in stage 16.0 (TID 98)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 43.0 in stage 16.0 (TID 86) in 51 ms on soumils-mbp (executor driver) (45/56)
24/03/01 11:21:07 INFO Executor: Finished task 44.0 in stage 16.0 (TID 87). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 44.0 in stage 16.0 (TID 87) in 47 ms on soumils-mbp (executor driver) (46/56)
24/03/01 11:21:07 INFO Executor: Finished task 46.0 in stage 16.0 (TID 89). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 46.0 in stage 16.0 (TID 89) in 46 ms on soumils-mbp (executor driver) (47/56)
24/03/01 11:21:07 INFO Executor: Finished task 47.0 in stage 16.0 (TID 90). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 47.0 in stage 16.0 (TID 90) in 49 ms on soumils-mbp (executor driver) (48/56)
24/03/01 11:21:07 INFO Executor: Finished task 48.0 in stage 16.0 (TID 91). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 48.0 in stage 16.0 (TID 91) in 40 ms on soumils-mbp (executor driver) (49/56)
24/03/01 11:21:07 INFO Executor: Finished task 49.0 in stage 16.0 (TID 92). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 49.0 in stage 16.0 (TID 92) in 38 ms on soumils-mbp (executor driver) (50/56)
24/03/01 11:21:07 INFO Executor: Finished task 52.0 in stage 16.0 (TID 95). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 50.0 in stage 16.0 (TID 93). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 52.0 in stage 16.0 (TID 95) in 35 ms on soumils-mbp (executor driver) (51/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 50.0 in stage 16.0 (TID 93) in 37 ms on soumils-mbp (executor driver) (52/56)
24/03/01 11:21:07 INFO Executor: Finished task 51.0 in stage 16.0 (TID 94). 809 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 51.0 in stage 16.0 (TID 94) in 38 ms on soumils-mbp (executor driver) (53/56)
24/03/01 11:21:07 INFO Executor: Finished task 55.0 in stage 16.0 (TID 98). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 55.0 in stage 16.0 (TID 98) in 30 ms on soumils-mbp (executor driver) (54/56)
24/03/01 11:21:07 INFO Executor: Finished task 54.0 in stage 16.0 (TID 97). 766 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 53.0 in stage 16.0 (TID 96). 766 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 54.0 in stage 16.0 (TID 97) in 32 ms on soumils-mbp (executor driver) (55/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 53.0 in stage 16.0 (TID 96) in 33 ms on soumils-mbp (executor driver) (56/56)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 16.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ResultStage 16 (collect at HoodieSparkEngineContext.java:150) finished in 0.345 s
24/03/01 11:21:07 INFO DAGScheduler: Job 12 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:07 INFO TaskSchedulerImpl: Killing all running tasks in stage 16: Stage finished
24/03/01 11:21:07 INFO DAGScheduler: Job 12 finished: collect at HoodieSparkEngineContext.java:150, took 0.346630 s
24/03/01 11:21:07 INFO MapPartitionsRDD: Removing RDD 36 from persistence list
24/03/01 11:21:07 INFO BlockManager: Removing RDD 36
24/03/01 11:21:07 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:21:07 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:21:07 INFO SparkContext: Starting job: countByKey at HoodieJavaPairRDD.java:105
24/03/01 11:21:07 INFO DAGScheduler: Registering RDD 37 (mapToPair at HoodieJavaRDD.java:149) as input to shuffle 3
24/03/01 11:21:07 INFO DAGScheduler: Registering RDD 47 (mapToPair at HoodieJavaRDD.java:149) as input to shuffle 4
24/03/01 11:21:07 INFO DAGScheduler: Registering RDD 55 (countByKey at HoodieJavaPairRDD.java:105) as input to shuffle 5
24/03/01 11:21:07 INFO DAGScheduler: Got job 13 (countByKey at HoodieJavaPairRDD.java:105) with 8 output partitions
24/03/01 11:21:07 INFO DAGScheduler: Final stage: ResultStage 21 (countByKey at HoodieJavaPairRDD.java:105)
24/03/01 11:21:07 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 20)
24/03/01 11:21:07 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 20)
24/03/01 11:21:07 INFO DAGScheduler: Submitting ShuffleMapStage 18 (MapPartitionsRDD[37] at mapToPair at HoodieJavaRDD.java:149), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_19 stored as values in memory (estimated size 25.4 KiB, free 433.6 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 12.1 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_19_piece0 in memory on soumils-mbp:49385 (size: 12.1 KiB, free: 434.2 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 19 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 8 missing tasks from ShuffleMapStage 18 (MapPartitionsRDD[37] at mapToPair at HoodieJavaRDD.java:149) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 18.0 with 8 tasks resource profile 0
24/03/01 11:21:07 INFO DAGScheduler: Submitting ShuffleMapStage 19 (MapPartitionsRDD[47] at mapToPair at HoodieJavaRDD.java:149), which has no missing parents
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 18.0 (TID 99) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 18.0 (TID 100) (soumils-mbp, executor driver, partition 1, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 18.0 (TID 101) (soumils-mbp, executor driver, partition 2, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 18.0 (TID 102) (soumils-mbp, executor driver, partition 3, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 18.0 (TID 103) (soumils-mbp, executor driver, partition 4, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 18.0 (TID 104) (soumils-mbp, executor driver, partition 5, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 18.0 (TID 105) (soumils-mbp, executor driver, partition 6, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 18.0 (TID 106) (soumils-mbp, executor driver, partition 7, NODE_LOCAL, 7170 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 18.0 (TID 100)
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 18.0 (TID 101)
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 18.0 (TID 102)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 18.0 (TID 106)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 18.0 (TID 104)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 18.0 (TID 99)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 18.0 (TID 105)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 18.0 (TID 103)
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (13.9 KiB) non-empty blocks including 8 (13.9 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (14.3 KiB) non-empty blocks including 8 (14.3 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (13.6 KiB) non-empty blocks including 8 (13.6 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (13.7 KiB) non-empty blocks including 8 (13.7 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (13.8 KiB) non-empty blocks including 8 (13.8 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (14.0 KiB) non-empty blocks including 8 (14.0 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (13.6 KiB) non-empty blocks including 8 (13.6 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (15.7 KiB) non-empty blocks including 8 (15.7 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 18.0 (TID 101). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 18.0 (TID 101) in 14 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 18.0 (TID 102). 1885 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 18.0 (TID 106). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 18.0 (TID 104). 1799 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 18.0 (TID 102) in 17 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 18.0 (TID 106) in 17 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 18.0 (TID 104) in 20 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_20 stored as values in memory (estimated size 451.8 KiB, free 433.1 MiB)
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 18.0 (TID 105). 1885 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 18.0 (TID 99). 1928 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 18.0 (TID 103). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 18.0 (TID 100). 1885 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 18.0 (TID 105) in 22 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_18_piece0 on soumils-mbp:49385 in memory (size: 157.2 KiB, free: 434.4 MiB)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 18.0 (TID 103) in 22 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 18.0 (TID 100) in 22 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 158.4 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory on soumils-mbp:49385 (size: 158.4 KiB, free: 434.2 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 18.0 (TID 99) in 23 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 18.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 19 (MapPartitionsRDD[47] at mapToPair at HoodieJavaRDD.java:149) (first 15 tasks are for partitions Vector(0))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 19.0 with 1 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 107) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7231 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 19.0 (TID 107)
24/03/01 11:21:07 INFO DAGScheduler: ShuffleMapStage 18 (mapToPair at HoodieJavaRDD.java:149) finished in 0.029 s
24/03/01 11:21:07 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:07 INFO DAGScheduler: running: Set(ShuffleMapStage 19)
24/03/01 11:21:07 INFO DAGScheduler: waiting: Set(ShuffleMapStage 20, ResultStage 21)
24/03/01 11:21:07 INFO DAGScheduler: failed: Set()
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 19.0 (TID 107). 810 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID 107) in 11 ms on soumils-mbp (executor driver) (1/1)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ShuffleMapStage 19 (mapToPair at HoodieJavaRDD.java:149) finished in 0.034 s
24/03/01 11:21:07 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:07 INFO DAGScheduler: running: Set()
24/03/01 11:21:07 INFO DAGScheduler: waiting: Set(ShuffleMapStage 20, ResultStage 21)
24/03/01 11:21:07 INFO DAGScheduler: failed: Set()
24/03/01 11:21:07 INFO DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[55] at countByKey at HoodieJavaPairRDD.java:105), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_21 stored as values in memory (estimated size 9.9 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 5.1 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_21_piece0 in memory on soumils-mbp:49385 (size: 5.1 KiB, free: 434.2 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 8 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[55] at countByKey at HoodieJavaPairRDD.java:105) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 20.0 with 8 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 108) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 20.0 (TID 109) (soumils-mbp, executor driver, partition 1, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 20.0 (TID 110) (soumils-mbp, executor driver, partition 2, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 20.0 (TID 111) (soumils-mbp, executor driver, partition 3, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 20.0 (TID 112) (soumils-mbp, executor driver, partition 4, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 20.0 (TID 113) (soumils-mbp, executor driver, partition 5, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 20.0 (TID 114) (soumils-mbp, executor driver, partition 6, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 20.0 (TID 115) (soumils-mbp, executor driver, partition 7, PROCESS_LOCAL, 7233 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 20.0 (TID 111)
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 20.0 (TID 110)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 20.0 (TID 108)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 20.0 (TID 112)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 20.0 (TID 113)
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 20.0 (TID 109)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 20.0 (TID 115)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 20.0 (TID 114)
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (13.5 KiB) non-empty blocks including 1 (13.5 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (12.2 KiB) non-empty blocks including 1 (12.2 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 1 (14.8 KiB) non-empty blocks including 1 (14.8 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 0 (0.0 B) non-empty blocks including 0 (0.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_0 stored as values in memory (estimated size 17.7 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_0 in memory on soumils-mbp:49385 (size: 17.7 KiB, free: 434.2 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_3 stored as values in memory (estimated size 17.2 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_2 stored as values in memory (estimated size 18.1 KiB, free 433.5 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_4 stored as values in memory (estimated size 17.9 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_6 stored as values in memory (estimated size 17.2 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_1 stored as values in memory (estimated size 17.7 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_5 stored as values in memory (estimated size 17.4 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block rdd_53_7 stored as values in memory (estimated size 20.0 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_3 in memory on soumils-mbp:49385 (size: 17.2 KiB, free: 434.2 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_5 in memory on soumils-mbp:49385 (size: 17.4 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_1 in memory on soumils-mbp:49385 (size: 17.7 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_7 in memory on soumils-mbp:49385 (size: 20.0 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_6 in memory on soumils-mbp:49385 (size: 17.2 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_4 in memory on soumils-mbp:49385 (size: 17.9 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added rdd_53_2 in memory on soumils-mbp:49385 (size: 18.1 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 20.0 (TID 108). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 20.0 (TID 108) in 27 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 20.0 (TID 111). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 20.0 (TID 111) in 26 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 20.0 (TID 114). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 20.0 (TID 114) in 27 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 20.0 (TID 112). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 20.0 (TID 112) in 28 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 20.0 (TID 110). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 20.0 (TID 110) in 28 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 20.0 (TID 113). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 20.0 (TID 113) in 29 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 20.0 (TID 109). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 20.0 (TID 109) in 29 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 20.0 (TID 115). 1842 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 20.0 (TID 115) in 30 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ShuffleMapStage 20 (countByKey at HoodieJavaPairRDD.java:105) finished in 0.033 s
24/03/01 11:21:07 INFO DAGScheduler: looking for newly runnable stages
24/03/01 11:21:07 INFO DAGScheduler: running: Set()
24/03/01 11:21:07 INFO DAGScheduler: waiting: Set(ResultStage 21)
24/03/01 11:21:07 INFO DAGScheduler: failed: Set()
24/03/01 11:21:07 INFO DAGScheduler: Submitting ResultStage 21 (ShuffledRDD[56] at countByKey at HoodieJavaPairRDD.java:105), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 5.5 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 3.2 KiB, free 433.4 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on soumils-mbp:49385 (size: 3.2 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 8 missing tasks from ResultStage 21 (ShuffledRDD[56] at countByKey at HoodieJavaPairRDD.java:105) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 21.0 with 8 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 21.0 (TID 116) (soumils-mbp, executor driver, partition 0, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 21.0 (TID 117) (soumils-mbp, executor driver, partition 1, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 21.0 (TID 118) (soumils-mbp, executor driver, partition 2, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 21.0 (TID 119) (soumils-mbp, executor driver, partition 3, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 21.0 (TID 120) (soumils-mbp, executor driver, partition 4, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 21.0 (TID 121) (soumils-mbp, executor driver, partition 5, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 21.0 (TID 122) (soumils-mbp, executor driver, partition 6, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 21.0 (TID 123) (soumils-mbp, executor driver, partition 7, NODE_LOCAL, 7181 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 21.0 (TID 118)
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 21.0 (TID 117)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 21.0 (TID 122)
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 21.0 (TID 119)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 21.0 (TID 123)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 21.0 (TID 121)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 21.0 (TID 116)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 21.0 (TID 120)
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1072.0 B) non-empty blocks including 8 (1072.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1012.0 B) non-empty blocks including 8 (1012.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1425.0 B) non-empty blocks including 8 (1425.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1566.0 B) non-empty blocks including 8 (1566.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1410.0 B) non-empty blocks including 8 (1410.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1338.0 B) non-empty blocks including 8 (1338.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1032.0 B) non-empty blocks including 8 (1032.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Getting 8 (1745.0 B) non-empty blocks including 8 (1745.0 B) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 21.0 (TID 116). 1747 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 21.0 (TID 118). 1924 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 21.0 (TID 117). 1776 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 21.0 (TID 122). 2040 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 21.0 (TID 123). 1718 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 21.0 (TID 116) in 7 ms on soumils-mbp (executor driver) (1/8)
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 21.0 (TID 120). 1863 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 21.0 (TID 117) in 8 ms on soumils-mbp (executor driver) (2/8)
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 21.0 (TID 119). 1953 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 21.0 (TID 121). 1895 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 21.0 (TID 123) in 7 ms on soumils-mbp (executor driver) (3/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 21.0 (TID 122) in 8 ms on soumils-mbp (executor driver) (4/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 21.0 (TID 121) in 8 ms on soumils-mbp (executor driver) (5/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 21.0 (TID 118) in 9 ms on soumils-mbp (executor driver) (6/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 21.0 (TID 119) in 9 ms on soumils-mbp (executor driver) (7/8)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 21.0 (TID 120) in 8 ms on soumils-mbp (executor driver) (8/8)
24/03/01 11:21:07 INFO TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 
24/03/01 11:21:07 INFO DAGScheduler: ResultStage 21 (countByKey at HoodieJavaPairRDD.java:105) finished in 0.012 s
24/03/01 11:21:07 INFO DAGScheduler: Job 13 is finished. Cancelling potential speculative or zombie tasks for this job
24/03/01 11:21:07 INFO TaskSchedulerImpl: Killing all running tasks in stage 21: Stage finished
24/03/01 11:21:07 INFO DAGScheduler: Job 13 finished: countByKey at HoodieJavaPairRDD.java:105, took 0.089548 s
24/03/01 11:21:07 INFO UpsertPartitioner: AvgRecordSize => 1024
24/03/01 11:21:07 INFO SparkContext: Starting job: collectAsMap at UpsertPartitioner.java:282
24/03/01 11:21:07 INFO DAGScheduler: Got job 14 (collectAsMap at UpsertPartitioner.java:282) with 56 output partitions
24/03/01 11:21:07 INFO DAGScheduler: Final stage: ResultStage 22 (collectAsMap at UpsertPartitioner.java:282)
24/03/01 11:21:07 INFO DAGScheduler: Parents of final stage: List()
24/03/01 11:21:07 INFO DAGScheduler: Missing parents: List()
24/03/01 11:21:07 INFO DAGScheduler: Submitting ResultStage 22 (MapPartitionsRDD[58] at mapToPair at UpsertPartitioner.java:281), which has no missing parents
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_23 stored as values in memory (estimated size 455.1 KiB, free 432.9 MiB)
24/03/01 11:21:07 INFO MemoryStore: Block broadcast_23_piece0 stored as bytes in memory (estimated size 158.2 KiB, free 432.8 MiB)
24/03/01 11:21:07 INFO BlockManagerInfo: Added broadcast_23_piece0 in memory on soumils-mbp:49385 (size: 158.2 KiB, free: 433.9 MiB)
24/03/01 11:21:07 INFO SparkContext: Created broadcast 23 from broadcast at DAGScheduler.scala:1535
24/03/01 11:21:07 INFO DAGScheduler: Submitting 56 missing tasks from ResultStage 22 (MapPartitionsRDD[58] at mapToPair at UpsertPartitioner.java:281) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
24/03/01 11:21:07 INFO TaskSchedulerImpl: Adding task set 22.0 with 56 tasks resource profile 0
24/03/01 11:21:07 INFO TaskSetManager: Starting task 0.0 in stage 22.0 (TID 124) (soumils-mbp, executor driver, partition 0, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 1.0 in stage 22.0 (TID 125) (soumils-mbp, executor driver, partition 1, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 2.0 in stage 22.0 (TID 126) (soumils-mbp, executor driver, partition 2, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 3.0 in stage 22.0 (TID 127) (soumils-mbp, executor driver, partition 3, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 4.0 in stage 22.0 (TID 128) (soumils-mbp, executor driver, partition 4, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 5.0 in stage 22.0 (TID 129) (soumils-mbp, executor driver, partition 5, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 6.0 in stage 22.0 (TID 130) (soumils-mbp, executor driver, partition 6, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 7.0 in stage 22.0 (TID 131) (soumils-mbp, executor driver, partition 7, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 8.0 in stage 22.0 (TID 132) (soumils-mbp, executor driver, partition 8, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 9.0 in stage 22.0 (TID 133) (soumils-mbp, executor driver, partition 9, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 10.0 in stage 22.0 (TID 134) (soumils-mbp, executor driver, partition 10, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 11.0 in stage 22.0 (TID 135) (soumils-mbp, executor driver, partition 11, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 1.0 in stage 22.0 (TID 125)
24/03/01 11:21:07 INFO Executor: Running task 0.0 in stage 22.0 (TID 124)
24/03/01 11:21:07 INFO Executor: Running task 2.0 in stage 22.0 (TID 126)
24/03/01 11:21:07 INFO Executor: Running task 3.0 in stage 22.0 (TID 127)
24/03/01 11:21:07 INFO Executor: Running task 4.0 in stage 22.0 (TID 128)
24/03/01 11:21:07 INFO Executor: Running task 10.0 in stage 22.0 (TID 134)
24/03/01 11:21:07 INFO Executor: Running task 11.0 in stage 22.0 (TID 135)
24/03/01 11:21:07 INFO Executor: Running task 5.0 in stage 22.0 (TID 129)
24/03/01 11:21:07 INFO Executor: Running task 8.0 in stage 22.0 (TID 132)
24/03/01 11:21:07 INFO Executor: Running task 6.0 in stage 22.0 (TID 130)
24/03/01 11:21:07 INFO Executor: Running task 7.0 in stage 22.0 (TID 131)
24/03/01 11:21:07 INFO Executor: Running task 9.0 in stage 22.0 (TID 133)
24/03/01 11:21:07 INFO Executor: Finished task 4.0 in stage 22.0 (TID 128). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 12.0 in stage 22.0 (TID 136) (soumils-mbp, executor driver, partition 12, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 12.0 in stage 22.0 (TID 136)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 4.0 in stage 22.0 (TID 128) in 52 ms on soumils-mbp (executor driver) (1/56)
24/03/01 11:21:07 INFO Executor: Finished task 3.0 in stage 22.0 (TID 127). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 9.0 in stage 22.0 (TID 133). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 13.0 in stage 22.0 (TID 137) (soumils-mbp, executor driver, partition 13, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 14.0 in stage 22.0 (TID 138) (soumils-mbp, executor driver, partition 14, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 13.0 in stage 22.0 (TID 137)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 9.0 in stage 22.0 (TID 133) in 55 ms on soumils-mbp (executor driver) (2/56)
24/03/01 11:21:07 INFO Executor: Running task 14.0 in stage 22.0 (TID 138)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 3.0 in stage 22.0 (TID 127) in 55 ms on soumils-mbp (executor driver) (3/56)
24/03/01 11:21:07 INFO Executor: Finished task 7.0 in stage 22.0 (TID 131). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 15.0 in stage 22.0 (TID 139) (soumils-mbp, executor driver, partition 15, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 8.0 in stage 22.0 (TID 132). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 1.0 in stage 22.0 (TID 125). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Running task 15.0 in stage 22.0 (TID 139)
24/03/01 11:21:07 INFO Executor: Finished task 11.0 in stage 22.0 (TID 135). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 0.0 in stage 22.0 (TID 124). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 2.0 in stage 22.0 (TID 126). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 16.0 in stage 22.0 (TID 140) (soumils-mbp, executor driver, partition 16, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Finished task 5.0 in stage 22.0 (TID 129). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Running task 16.0 in stage 22.0 (TID 140)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 8.0 in stage 22.0 (TID 132) in 58 ms on soumils-mbp (executor driver) (4/56)
24/03/01 11:21:07 INFO Executor: Finished task 10.0 in stage 22.0 (TID 134). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 6.0 in stage 22.0 (TID 130). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Finished task 7.0 in stage 22.0 (TID 131) in 58 ms on soumils-mbp (executor driver) (5/56)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 17.0 in stage 22.0 (TID 141) (soumils-mbp, executor driver, partition 17, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 18.0 in stage 22.0 (TID 142) (soumils-mbp, executor driver, partition 18, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 17.0 in stage 22.0 (TID 141)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 11.0 in stage 22.0 (TID 135) in 59 ms on soumils-mbp (executor driver) (6/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 1.0 in stage 22.0 (TID 125) in 60 ms on soumils-mbp (executor driver) (7/56)
24/03/01 11:21:07 INFO Executor: Running task 18.0 in stage 22.0 (TID 142)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 19.0 in stage 22.0 (TID 143) (soumils-mbp, executor driver, partition 19, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 19.0 in stage 22.0 (TID 143)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 20.0 in stage 22.0 (TID 144) (soumils-mbp, executor driver, partition 20, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 21.0 in stage 22.0 (TID 145) (soumils-mbp, executor driver, partition 21, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 20.0 in stage 22.0 (TID 144)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 22.0 in stage 22.0 (TID 146) (soumils-mbp, executor driver, partition 22, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 21.0 in stage 22.0 (TID 145)
24/03/01 11:21:07 INFO Executor: Running task 22.0 in stage 22.0 (TID 146)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 23.0 in stage 22.0 (TID 147) (soumils-mbp, executor driver, partition 23, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 0.0 in stage 22.0 (TID 124) in 63 ms on soumils-mbp (executor driver) (8/56)
24/03/01 11:21:07 INFO Executor: Running task 23.0 in stage 22.0 (TID 147)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 6.0 in stage 22.0 (TID 130) in 62 ms on soumils-mbp (executor driver) (9/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 5.0 in stage 22.0 (TID 129) in 63 ms on soumils-mbp (executor driver) (10/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 10.0 in stage 22.0 (TID 134) in 63 ms on soumils-mbp (executor driver) (11/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 2.0 in stage 22.0 (TID 126) in 64 ms on soumils-mbp (executor driver) (12/56)
24/03/01 11:21:07 INFO Executor: Finished task 14.0 in stage 22.0 (TID 138). 889 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 12.0 in stage 22.0 (TID 136). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 24.0 in stage 22.0 (TID 148) (soumils-mbp, executor driver, partition 24, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_20_piece0 on soumils-mbp:49385 in memory (size: 158.4 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO Executor: Running task 24.0 in stage 22.0 (TID 148)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 25.0 in stage 22.0 (TID 149) (soumils-mbp, executor driver, partition 25, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 14.0 in stage 22.0 (TID 138) in 44 ms on soumils-mbp (executor driver) (13/56)
24/03/01 11:21:07 INFO Executor: Running task 25.0 in stage 22.0 (TID 149)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 12.0 in stage 22.0 (TID 136) in 48 ms on soumils-mbp (executor driver) (14/56)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_19_piece0 on soumils-mbp:49385 in memory (size: 12.1 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO Executor: Finished task 16.0 in stage 22.0 (TID 140). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 26.0 in stage 22.0 (TID 150) (soumils-mbp, executor driver, partition 26, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 16.0 in stage 22.0 (TID 140) in 47 ms on soumils-mbp (executor driver) (15/56)
24/03/01 11:21:07 INFO Executor: Running task 26.0 in stage 22.0 (TID 150)
24/03/01 11:21:07 INFO Executor: Finished task 18.0 in stage 22.0 (TID 142). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 27.0 in stage 22.0 (TID 151) (soumils-mbp, executor driver, partition 27, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 18.0 in stage 22.0 (TID 142) in 47 ms on soumils-mbp (executor driver) (16/56)
24/03/01 11:21:07 INFO Executor: Running task 27.0 in stage 22.0 (TID 151)
24/03/01 11:21:07 INFO Executor: Finished task 20.0 in stage 22.0 (TID 144). 846 bytes result sent to driver
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_21_piece0 on soumils-mbp:49385 in memory (size: 5.1 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 28.0 in stage 22.0 (TID 152) (soumils-mbp, executor driver, partition 28, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 28.0 in stage 22.0 (TID 152)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 20.0 in stage 22.0 (TID 144) in 47 ms on soumils-mbp (executor driver) (17/56)
24/03/01 11:21:07 INFO Executor: Finished task 15.0 in stage 22.0 (TID 139). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 29.0 in stage 22.0 (TID 153) (soumils-mbp, executor driver, partition 29, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 15.0 in stage 22.0 (TID 139) in 57 ms on soumils-mbp (executor driver) (18/56)
24/03/01 11:21:07 INFO Executor: Finished task 13.0 in stage 22.0 (TID 137). 889 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 22.0 in stage 22.0 (TID 146). 846 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Running task 29.0 in stage 22.0 (TID 153)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 30.0 in stage 22.0 (TID 154) (soumils-mbp, executor driver, partition 30, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Starting task 31.0 in stage 22.0 (TID 155) (soumils-mbp, executor driver, partition 31, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 31.0 in stage 22.0 (TID 155)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 13.0 in stage 22.0 (TID 137) in 59 ms on soumils-mbp (executor driver) (19/56)
24/03/01 11:21:07 INFO Executor: Running task 30.0 in stage 22.0 (TID 154)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 22.0 in stage 22.0 (TID 146) in 53 ms on soumils-mbp (executor driver) (20/56)
24/03/01 11:21:07 INFO BlockManagerInfo: Removed broadcast_22_piece0 on soumils-mbp:49385 in memory (size: 3.2 KiB, free: 434.1 MiB)
24/03/01 11:21:07 INFO Executor: Finished task 19.0 in stage 22.0 (TID 143). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 32.0 in stage 22.0 (TID 156) (soumils-mbp, executor driver, partition 32, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 19.0 in stage 22.0 (TID 143) in 61 ms on soumils-mbp (executor driver) (21/56)
24/03/01 11:21:07 INFO Executor: Running task 32.0 in stage 22.0 (TID 156)
24/03/01 11:21:07 INFO Executor: Finished task 23.0 in stage 22.0 (TID 147). 846 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 17.0 in stage 22.0 (TID 141). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 33.0 in stage 22.0 (TID 157) (soumils-mbp, executor driver, partition 33, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 23.0 in stage 22.0 (TID 147) in 67 ms on soumils-mbp (executor driver) (22/56)
24/03/01 11:21:07 INFO Executor: Running task 33.0 in stage 22.0 (TID 157)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 34.0 in stage 22.0 (TID 158) (soumils-mbp, executor driver, partition 34, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 17.0 in stage 22.0 (TID 141) in 70 ms on soumils-mbp (executor driver) (23/56)
24/03/01 11:21:07 INFO Executor: Running task 34.0 in stage 22.0 (TID 158)
24/03/01 11:21:07 INFO Executor: Finished task 24.0 in stage 22.0 (TID 148). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 21.0 in stage 22.0 (TID 145). 846 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 35.0 in stage 22.0 (TID 159) (soumils-mbp, executor driver, partition 35, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 35.0 in stage 22.0 (TID 159)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 36.0 in stage 22.0 (TID 160) (soumils-mbp, executor driver, partition 36, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 36.0 in stage 22.0 (TID 160)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 24.0 in stage 22.0 (TID 148) in 38 ms on soumils-mbp (executor driver) (24/56)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 21.0 in stage 22.0 (TID 145) in 74 ms on soumils-mbp (executor driver) (25/56)
24/03/01 11:21:07 INFO Executor: Finished task 25.0 in stage 22.0 (TID 149). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 37.0 in stage 22.0 (TID 161) (soumils-mbp, executor driver, partition 37, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 25.0 in stage 22.0 (TID 149) in 46 ms on soumils-mbp (executor driver) (26/56)
24/03/01 11:21:07 INFO Executor: Running task 37.0 in stage 22.0 (TID 161)
24/03/01 11:21:07 INFO Executor: Finished task 26.0 in stage 22.0 (TID 150). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 38.0 in stage 22.0 (TID 162) (soumils-mbp, executor driver, partition 38, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 38.0 in stage 22.0 (TID 162)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 26.0 in stage 22.0 (TID 150) in 47 ms on soumils-mbp (executor driver) (27/56)
24/03/01 11:21:07 INFO Executor: Finished task 28.0 in stage 22.0 (TID 152). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 39.0 in stage 22.0 (TID 163) (soumils-mbp, executor driver, partition 39, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 28.0 in stage 22.0 (TID 152) in 47 ms on soumils-mbp (executor driver) (28/56)
24/03/01 11:21:07 INFO Executor: Running task 39.0 in stage 22.0 (TID 163)
24/03/01 11:21:07 INFO Executor: Finished task 27.0 in stage 22.0 (TID 151). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 40.0 in stage 22.0 (TID 164) (soumils-mbp, executor driver, partition 40, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 40.0 in stage 22.0 (TID 164)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 27.0 in stage 22.0 (TID 151) in 50 ms on soumils-mbp (executor driver) (29/56)
24/03/01 11:21:07 INFO Executor: Finished task 29.0 in stage 22.0 (TID 153). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 41.0 in stage 22.0 (TID 165) (soumils-mbp, executor driver, partition 41, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 29.0 in stage 22.0 (TID 153) in 45 ms on soumils-mbp (executor driver) (30/56)
24/03/01 11:21:07 INFO Executor: Running task 41.0 in stage 22.0 (TID 165)
24/03/01 11:21:07 INFO Executor: Finished task 30.0 in stage 22.0 (TID 154). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 42.0 in stage 22.0 (TID 166) (soumils-mbp, executor driver, partition 42, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 42.0 in stage 22.0 (TID 166)
24/03/01 11:21:07 INFO TaskSetManager: Finished task 30.0 in stage 22.0 (TID 154) in 45 ms on soumils-mbp (executor driver) (31/56)
24/03/01 11:21:07 INFO Executor: Finished task 31.0 in stage 22.0 (TID 155). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 43.0 in stage 22.0 (TID 167) (soumils-mbp, executor driver, partition 43, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 31.0 in stage 22.0 (TID 155) in 49 ms on soumils-mbp (executor driver) (32/56)
24/03/01 11:21:07 INFO Executor: Running task 43.0 in stage 22.0 (TID 167)
24/03/01 11:21:07 INFO Executor: Finished task 32.0 in stage 22.0 (TID 156). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 44.0 in stage 22.0 (TID 168) (soumils-mbp, executor driver, partition 44, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 32.0 in stage 22.0 (TID 156) in 45 ms on soumils-mbp (executor driver) (33/56)
24/03/01 11:21:07 INFO Executor: Running task 44.0 in stage 22.0 (TID 168)
24/03/01 11:21:07 INFO Executor: Finished task 34.0 in stage 22.0 (TID 158). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 45.0 in stage 22.0 (TID 169) (soumils-mbp, executor driver, partition 45, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 34.0 in stage 22.0 (TID 158) in 45 ms on soumils-mbp (executor driver) (34/56)
24/03/01 11:21:07 INFO Executor: Running task 45.0 in stage 22.0 (TID 169)
24/03/01 11:21:07 INFO Executor: Finished task 33.0 in stage 22.0 (TID 157). 803 bytes result sent to driver
24/03/01 11:21:07 INFO Executor: Finished task 36.0 in stage 22.0 (TID 160). 803 bytes result sent to driver
24/03/01 11:21:07 INFO TaskSetManager: Starting task 46.0 in stage 22.0 (TID 170) (soumils-mbp, executor driver, partition 46, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO Executor: Running task 46.0 in stage 22.0 (TID 170)
24/03/01 11:21:07 INFO TaskSetManager: Starting task 47.0 in stage 22.0 (TID 171) (soumils-mbp, executor driver, partition 47, PROCESS_LOCAL, 7254 bytes) 
24/03/01 11:21:07 INFO TaskSetManager: Finished task 33.0 in stage 22.0 (TID 157) in 51 ms on soumils-mbp (executor driver) (35/56)
24/03/01 11:21:07 INFO Executor: Running task 47.0 in stage 22.0 (TID 171)

I do not see any Delta log files.

I can try with --enable-sync, but isn't that for Hive sync, @the-other-tim-brown?

soumilshah1995 commented 6 months ago

Here is the error after adding --enable-sync:

Full command:

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --exclude-packages 'org.slf4j:slf4j-api,org.apache.logging.log4j:log4j-core,org.apache.logging.log4j:log4j-slf4j-impl' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-beta1.jar' \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

24/03/01 11:28:00 INFO HoodieActiveTimeline: Loaded instants upto : Option{val=[20240301112751750__deltacommit__COMPLETED__20240301112759850]}
24/03/01 11:28:00 INFO AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
24/03/01 11:28:00 INFO ClusteringUtils: Found 0 files in pending clustering operations
24/03/01 11:28:00 INFO StreamSync: Commit 20240301112751750 successful!
24/03/01 11:28:00 WARN StreamSync: SyncTool class io.onetable.hudi.sync.OneTableSyncTool failed with exception
org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: io/onetable/client/SourceClientProvider
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:55)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:111)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:93)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:106)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:78)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: io.onetable.client.SourceClientProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 27 more
24/03/01 11:28:00 INFO StreamSync: Shutting down embedded timeline server
24/03/01 11:28:00 INFO EmbeddedTimelineService: Closing Timeline server
24/03/01 11:28:00 INFO TimelineService: Closing Timeline Service
24/03/01 11:28:00 INFO Javalin: Stopping Javalin ...
24/03/01 11:28:00 INFO Javalin: Javalin has stopped
24/03/01 11:28:00 INFO TimelineService: Closed Timeline Service
24/03/01 11:28:00 INFO EmbeddedTimelineService: Closed Timeline server
24/03/01 11:28:00 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/03/01 11:28:00 INFO SparkUI: Stopped Spark web UI at http://soumils-mbp:8090
24/03/01 11:28:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/03/01 11:28:00 INFO MemoryStore: MemoryStore cleared
24/03/01 11:28:00 INFO BlockManager: BlockManager stopped
24/03/01 11:28:00 INFO BlockManagerMaster: BlockManagerMaster stopped
24/03/01 11:28:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/01 11:28:00 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: io/onetable/client/SourceClientProvider
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:55)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:111)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:93)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:106)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:78)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: io.onetable.client.SourceClientProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 27 more
24/03/01 11:28:00 INFO ShutdownHookManager: Shutdown hook called
24/03/01 11:28:00 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-ec817afa-a4ff-45ed-986f-d1bda7d2e071
24/03/01 11:28:00 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-a3c51739-dd7b-43d4-9699-cb8eb7786835
soumilshah@Soumils-MBP DeltaStreamer % 
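The NoClassDefFoundError / ClassNotFoundException above means the class io.onetable.client.SourceClientProvider could not be found on the classpath when Hudi tried to instantiate OneTableSyncTool. One way to check a candidate JAR before passing it to --jars is to look for the class entries named in the stack trace. The following is a minimal sketch, assuming Java 9+ and only the JDK's java.util.jar API; JarClassCheck is a hypothetical helper, not part of Hudi or OneTable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hudi/OneTable): reports which classes are missing from a JAR.
public class JarClassCheck {

    // Convert a binary class name such as "io.onetable.client.SourceClientProvider"
    // into the entry path used inside a JAR: "io/onetable/client/SourceClientProvider.class".
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the subset of classNames that have no matching entry in the JAR at jarPath.
    static List<String> missingClasses(String jarPath, List<String> classNames) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : classNames) {
                if (jar.getJarEntry(toEntryName(name)) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarClassCheck <jar> <class> [<class> ...]");
            System.exit(2);
        }
        // Everything after the JAR path is treated as a fully qualified class name.
        List<String> classes = List.of(args).subList(1, args.length);
        List<String> missing = missingClasses(args[0], classes);
        if (missing.isEmpty()) {
            System.out.println("all classes present");
        } else {
            missing.forEach(c -> System.out.println("missing: " + c));
        }
    }
}
```

For example, running it with the extensions JAR path and io.onetable.client.SourceClientProvider as arguments shows whether that JAR ships the class. Note that a class being present in the JAR does not guarantee it loads, since its own dependencies must also be on the classpath.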
soumilshah1995 commented 6 months ago

Well, after a lot of headaches I got Java working and got the Maven build running.

[WARNING] Usually this is not harmful and you can skip these warnings,
[WARNING] otherwise try to manually exclude artifacts based on
[WARNING] mvn dependency:tree -Ddetail=true and the above output.
[WARNING] See https://maven.apache.org/plugins/maven-shade-plugin/
[INFO] Attaching shaded artifact.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for onetable 0.1.0-SNAPSHOT:
[INFO] 
[INFO] onetable ........................................... SUCCESS [  0.208 s]
[INFO] api ................................................ SUCCESS [  2.081 s]
[INFO] hudi-support ....................................... SUCCESS [  0.003 s]
[INFO] hudi-utils ......................................... SUCCESS [  0.712 s]
[INFO] core ............................................... SUCCESS [  2.701 s]
[INFO] utilities .......................................... SUCCESS [01:13 min]
[INFO] hudi-extensions .................................... SUCCESS [ 12.657 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:32 min
[INFO] Finished at: 2024-03-01T19:25:34-05:00
[INFO] ------------------------------------------------------------------------
soumilshah@Soumils-MBP inc

Let me try with these JARs, fingers crossed.

soumilshah1995 commented 6 months ago

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/Desktop/oneTable/tem/incubator-xtable/hudi-support/extensions/target/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar' \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes io.onetable.hudi.sync.OneTableSyncTool \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

Error

24/03/01 19:35:35 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
    at io.onetable.hudi.sync.OneTableSyncTool.syncHoodieTable(OneTableSyncTool.java:61)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
    ... 20 more
24/03/01 19:35:35 INFO ShutdownHookManager: Shutdown hook called
24/03/01 19:35:35 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-948a372b-7ebc-4a5b-b833-459bca428f7e
24/03/01 19:35:35 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-3cd9b10e-5f0b-4a34-a78f-1d734ca8c6e5
soumilshah@Soumils-MBP DeltaStreamer % 
the-other-tim-brown commented 6 months ago

@soumilshah1995 change hoodie.onetable.formats=DELTA to hoodie.onetable.formats.to.sync=DELTA. The docs were not updated to match a change to the parameters in the code.

Filed https://github.com/apache/incubator-xtable/issues/365 to track this.
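If existing properties files or scripts still use the old key, a small pre-flight step can rename it before the job is launched. The following is a minimal sketch, assuming the options are held in a java.util.Properties; OneTableConfigMigration and migrateFormatsKey are hypothetical names, and only the two key names come from this thread.

```java
import java.util.Properties;

public class OneTableConfigMigration {

    // Key name from the older docs, which caused the failure above.
    static final String OLD_KEY = "hoodie.onetable.formats";
    // Key name suggested in the comment above.
    static final String NEW_KEY = "hoodie.onetable.formats.to.sync";

    // Hypothetical helper: removes OLD_KEY and copies its value to NEW_KEY
    // unless NEW_KEY is already set. Returns true if OLD_KEY was present.
    static boolean migrateFormatsKey(Properties props) {
        String oldValue = props.getProperty(OLD_KEY);
        if (oldValue == null) {
            return false;
        }
        props.remove(OLD_KEY);
        if (props.getProperty(NEW_KEY) == null) {
            props.setProperty(NEW_KEY, oldValue);
        }
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OLD_KEY, "DELTA");
        migrateFormatsKey(props);
        // Prints the migrated setting: hoodie.onetable.formats.to.sync=DELTA
        System.out.println(NEW_KEY + "=" + props.getProperty(NEW_KEY));
    }
}
```

On the spark-submit command line, the equivalent change is simply passing --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' instead of the old key.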

soumilshah1995 commented 6 months ago

@the-other-tim-brown I'm still getting an error:

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar' \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA'

Error

24/03/02 15:47:34 INFO FileSystemViewManager: Creating in-memory based Table View
24/03/02 15:47:34 WARN StreamSync: SyncTool class io.onetable.hudi.sync.OneTableSyncTool failed with exception
org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.ServiceConfigurationError: io.onetable.spi.sync.TargetClient: io.onetable.hudi.HudiTargetClient Unable to get public no-arg constructor
    at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
    at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:673)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1233)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
    at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
    at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
    at io.onetable.client.TableFormatClientFactory.createTargetClientForName(TableFormatClientFactory.java:67)
    at io.onetable.client.TableFormatClientFactory.createForFormat(TableFormatClientFactory.java:51)
    at io.onetable.client.OneTableClient.lambda$sync$0(OneTableClient.java:99)
    at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:178)
    at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at io.onetable.client.OneTableClient.sync(OneTableClient.java:95)
    at io.onetable.hudi.sync.OneTableSyncTool.syncHoodieTable(OneTableSyncTool.java:80)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
    ... 20 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/client/common/HoodieJavaEngineContext
    at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
    at java.base/java.lang.Class.getConstructor0(Class.java:3342)
    at java.base/java.lang.Class.getConstructor(Class.java:2151)
    at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:660)
    at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:657)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:668)
    ... 38 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.client.common.HoodieJavaEngineContext
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 46 more
24/03/02 15:47:34 INFO StreamSync: Shutting down embedded timeline server
24/03/02 15:47:34 INFO EmbeddedTimelineService: Closing Timeline server
24/03/02 15:47:34 INFO TimelineService: Closing Timeline Service
24/03/02 15:47:34 INFO Javalin: Stopping Javalin ...
24/03/02 15:47:34 INFO Javalin: Javalin has stopped
24/03/02 15:47:34 INFO TimelineService: Closed Timeline Service
24/03/02 15:47:34 INFO EmbeddedTimelineService: Closed Timeline server
24/03/02 15:47:34 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/03/02 15:47:34 INFO SparkUI: Stopped Spark web UI at http://soumils-mbp:8090
24/03/02 15:47:34 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/03/02 15:47:34 INFO MemoryStore: MemoryStore cleared
24/03/02 15:47:34 INFO BlockManager: BlockManager stopped
24/03/02 15:47:34 INFO BlockManagerMaster: BlockManagerMaster stopped
24/03/02 15:47:34 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/02 15:47:34 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.util.ServiceConfigurationError: io.onetable.spi.sync.TargetClient: io.onetable.hudi.HudiTargetClient Unable to get public no-arg constructor
    at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
    at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:673)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1233)
    at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
    at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
    at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
    at io.onetable.client.TableFormatClientFactory.createTargetClientForName(TableFormatClientFactory.java:67)
    at io.onetable.client.TableFormatClientFactory.createForFormat(TableFormatClientFactory.java:51)
    at io.onetable.client.OneTableClient.lambda$sync$0(OneTableClient.java:99)
    at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:178)
    at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at io.onetable.client.OneTableClient.sync(OneTableClient.java:95)
    at io.onetable.hudi.sync.OneTableSyncTool.syncHoodieTable(OneTableSyncTool.java:80)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
    ... 20 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/client/common/HoodieJavaEngineContext
    at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
    at java.base/java.lang.Class.getConstructor0(Class.java:3342)
    at java.base/java.lang.Class.getConstructor(Class.java:2151)
    at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:660)
    at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:657)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:668)
    ... 38 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.client.common.HoodieJavaEngineContext
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 46 more
24/03/02 15:47:34 INFO ShutdownHookManager: Shutdown hook called
24/03/02 15:47:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-4590b9e3-daae-4317-9782-c7126533de25
24/03/02 15:47:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-ea3fcde5-0a6c-46e7-aad7-264fd4c4c8f6

Would a quick call be the best way to resolve this?

soumilshah1995 commented 6 months ago

Still facing the same error with Delta Streamer.

the-other-tim-brown commented 6 months ago

The issue is that hudi-java-client is declared with provided scope in the pom, so it is not included in the extensions bundle we build here in OneTable. To work around this, you can try adding the hudi-java-client jar to the classpath, or rebuild from this branch: https://github.com/apache/incubator-xtable/pull/367
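One way to confirm which classes are actually unavailable, before launching the streamer again, is to load them roughly the way ServiceLoader does. The sketch below is a hypothetical diagnostic that uses only the JDK; ClasspathCheck and its methods are illustrative names, not part of Hudi or OneTable. It loads each class without initializing it and then resolves its constructors, which is the step that failed for HudiTargetClient in the trace above. Run it with the same jars you pass to --jars on the java classpath. It does not reproduce Spark's class loader setup exactly, so treat a clean result as a hint rather than proof.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic (not part of Hudi or OneTable): reports which of the
 * given fully qualified class names cannot be loaded from the current classpath.
 */
public class ClasspathCheck {

    /**
     * Returns true if the class can be loaded and its constructor signatures
     * resolved. Static initializers are not run.
     */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running static initializers
            Class<?> clazz = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            // Resolving constructors also loads their parameter types; ServiceLoader
            // goes through the same path when it looks up the no-arg constructor
            clazz.getDeclaredConstructors();
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class file itself is not on the classpath.
            // LinkageError (e.g. NoClassDefFoundError): the class was found, but a
            // class it depends on was not.
            return false;
        }
    }

    /** Returns the subset of classNames that fail isLoadable, in input order. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults to the classes that failed to load in this thread;
        // pass other class names as arguments to check those instead.
        String[] toCheck = args.length > 0 ? args : new String[] {
            "io.onetable.hudi.HudiTargetClient",
            "org.apache.hudi.client.common.HoodieJavaEngineContext",
            "io.onetable.client.SourceClientProvider",
            "io.onetable.hudi.idtracking.IdTracker"
        };
        List<String> absent = missing(toCheck);
        if (absent.isEmpty()) {
            System.out.println("All classes could be loaded");
        } else {
            absent.forEach(name -> System.out.println("MISSING: " + name));
        }
    }
}
```

Calling getDeclaredConstructors() rather than stopping at Class.forName matters here: in the trace, HudiTargetClient itself was found, and the failure only surfaced once its constructors were resolved and HoodieJavaEngineContext could not be loaded.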

soumilshah1995 commented 6 months ago

@the-other-tim-brown
Following your first suggestion (adding the jar to the classpath):

Approach 1 (Failed)

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-beta1.jar,/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-java-client-0.14.0.jar' \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

I added /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-java-client-0.14.0.jar, but it still fails with the following error:


24/03/03 08:53:36 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/03 08:53:36 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: io/onetable/client/SourceClientProvider
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:55)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:111)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:93)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:106)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:78)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: io.onetable.client.SourceClientProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 27 more
24/03/03 08:53:36 INFO ShutdownHookManager: Shutdown hook called
24/03/03 08:53:36 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-2a254b1a-6aa4-462e-a576-e18dbb22bd8f
24/03/03 08:53:36 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-b60a5a20-0a55-4b29-a919-d1774bf40f71

Here is the second approach:

Approach 2

(venv) soumilshah@Soumils-MBP incubator-xtable % git branch                                
* 351-add-client-jars-to-bundle
  main

Build Jar


[WARNING]   - META-INF/maven/org.eclipse.jetty/jetty-http/pom.properties
[WARNING]   - META-INF/maven/org.eclipse.jetty/jetty-http/pom.xml
[WARNING] maven-shade-plugin has detected that some class files are
[WARNING] present in two or more JARs. When this happens, only one
[WARNING] single version of the class is copied to the uber jar.
[WARNING] Usually this is not harmful and you can skip these warnings,
[WARNING] otherwise try to manually exclude artifacts based on
[WARNING] mvn dependency:tree -Ddetail=true and the above output.
[WARNING] See https://maven.apache.org/plugins/maven-shade-plugin/
[INFO] Attaching shaded artifact.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for onetable 0.1.0-SNAPSHOT:
[INFO] 
[INFO] onetable ........................................... SUCCESS [  0.192 s]
[INFO] api ................................................ SUCCESS [  1.627 s]
[INFO] hudi-support ....................................... SUCCESS [  0.002 s]
[INFO] hudi-utils ......................................... SUCCESS [  0.659 s]
[INFO] core ............................................... SUCCESS [  2.514 s]
[INFO] utilities .......................................... SUCCESS [ 45.646 s]
[INFO] hudi-extensions .................................... SUCCESS [ 12.791 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:03 min
[INFO] Finished at: 2024-03-03T09:01:04-05:00
[INFO] ------------------------------------------------------------------------

Spark submit with the new jar


spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/new_jars/hudi-extensions-0.1.0-SNAPSHOT.jar' \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'

Error

24/03/03 09:03:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/03 09:03:12 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: io/onetable/client/SourceClientProvider
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:55)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:111)
    at org.apache.hudi.common.util.ReflectionUtils.hasConstructor(ReflectionUtils.java:93)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:106)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:78)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: io.onetable.client.SourceClientProvider
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 27 more
24/03/03 09:03:12 INFO ShutdownHookManager: Shutdown hook called
24/03/03 09:03:12 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-3215b36d-219b-4a88-9223-0be0c3eafe0f
24/03/03 09:03:12 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-21e211d4-437a-4f83-abf6-8472ad7984da
soumilshah1995 commented 6 months ago

I added these three configs, and now it throws a different error:

    --hoodie-conf 'hoodie.datasource.write.row.writer.enable=false' \
    --hoodie-conf 'hoodie.avro.write.support.class=io.onetable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds' \
    --hoodie-conf 'hoodie.client.init.callback.classes=io.onetable.hudi.extensions.AddFieldIdsClientInitCallback'

Spark Submit

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars '/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/new_jars/hudi-extensions-0.1.0-SNAPSHOT.jar' \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168' \
    --hoodie-conf 'hoodie.datasource.write.row.writer.enable=false' \
    --hoodie-conf 'hoodie.avro.write.support.class=io.onetable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds' \
    --hoodie-conf 'hoodie.client.init.callback.classes=io.onetable.hudi.extensions.AddFieldIdsClientInitCallback'

Error

24/03/03 09:22:29 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/03/03 09:22:29 INFO MemoryStore: MemoryStore cleared
24/03/03 09:22:29 INFO BlockManager: BlockManager stopped
24/03/03 09:22:29 INFO BlockManagerMaster: BlockManagerMaster stopped
24/03/03 09:22:29 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/03/03 09:22:29 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.lang.NoClassDefFoundError: io/onetable/hudi/idtracking/IdTracker
    at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
    at java.base/java.lang.Class.getConstructor0(Class.java:3342)
    at java.base/java.lang.Class.newInstance(Class.java:556)
    at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:68)
    at org.apache.hudi.client.BaseHoodieClient.lambda$runClientInitCallbacks$0(BaseHoodieClient.java:152)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at org.apache.hudi.client.BaseHoodieClient.runClientInitCallbacks(BaseHoodieClient.java:151)
    at org.apache.hudi.client.BaseHoodieClient.<init>(BaseHoodieClient.java:100)
    at org.apache.hudi.client.BaseHoodieWriteClient.<init>(BaseHoodieWriteClient.java:163)
    at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:86)
    at org.apache.hudi.utilities.streamer.StreamSync.reInitWriteClient(StreamSync.java:988)
    at org.apache.hudi.utilities.streamer.StreamSync.setupWriteClient(StreamSync.java:961)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:414)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: io.onetable.hudi.idtracking.IdTracker
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 32 more
24/03/03 09:22:34 INFO ShutdownHookManager: Shutdown hook called
24/03/03 09:22:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-01d5445b-7bc1-49ad-aa5e-35f565e64adb
24/03/03 09:22:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-a5ebc47a-ea50-4c0a-a1ed-066e2761e6b5
soumilshah1995 commented 6 months ago

Let me know if it would help; I can walk you through the steps on a call as well.

the-other-tim-brown commented 6 months ago

In all of the recent cases, it looks like you are not using the bundled jar for the hudi-extensions module. The bundled jar contains the dependencies the code needs at runtime. Those dependencies are not included in plain jars such as hudi-extensions-0.1.0-SNAPSHOT.jar, which is why you are seeing all of these ClassNotFoundExceptions.

Using hudi-extensions-0.1.0-SNAPSHOT-bundled.jar should get past these issues with dependencies not being found on the classpath.
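To tell a bundled jar from a plain one, one option is to look for the class files inside the jar itself. The following sketch is an assumption-based helper (JarClassCheck and its methods are illustrative names, not part of OneTable) that uses the JDK's java.util.jar.JarFile to report which fully qualified class names have no matching .class entry in a given jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper: reports which classes are NOT packaged inside a jar. */
public class JarClassCheck {

    /** Maps a class name like "a.b.C" to its jar entry name "a/b/C.class". */
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the class names (in input order) that have no entry in the jar. */
    static List<String> missingFrom(String jarPath, List<String> classNames) throws IOException {
        List<String> missing = new ArrayList<>();
        // try-with-resources closes the jar file handle when done
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : classNames) {
                if (jar.getJarEntry(toEntryName(name)) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarClassCheck <jar> <class> [<class> ...]");
            System.exit(2);
        }
        List<String> classes = List.of(args).subList(1, args.length);
        List<String> missing = missingFrom(args[0], classes);
        if (missing.isEmpty()) {
            System.out.println("all classes present in " + args[0]);
        } else {
            missing.forEach(c -> System.out.println("not in jar: " + c));
        }
    }
}
```

For example, checking io.onetable.hudi.idtracking.IdTracker against the plain hudi-extensions jar would be expected to list it as missing, consistent with the ClassNotFoundException above, while a correctly built bundled jar should contain it. This only inspects one jar's contents; it says nothing about classes supplied by other jars on the Spark classpath.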

soumilshah1995 commented 6 months ago

I did use the bundled jar in this example, but it still fails, now with a different error:

Spark Submit

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/new_jars/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar,/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-java-client-0.14.0.jar \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168' \
    --hoodie-conf 'hoodie.datasource.write.row.writer.enable=false' \
    --hoodie-conf 'hoodie.avro.write.support.class=io.onetable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds' \
    --hoodie-conf 'hoodie.client.init.callback.classes=io.onetable.hudi.extensions.AddFieldIdsClientInitCallback'

Error

24/03/04 08:32:04 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class io.onetable.hudi.sync.OneTableSyncTool
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:81)
    at org.apache.hudi.utilities.streamer.StreamSync.runMetaSync(StreamSync.java:938)
    at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:851)
    at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
    at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:840)
    at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:205)
    at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:584)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NumberFormatException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:603)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:737)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:919)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at io.onetable.model.storage.OneFileGroup.fromFiles(OneFileGroup.java:44)
    at io.onetable.hudi.HudiDataFileExtractor.getOneDataFilesForPartitions(HudiDataFileExtractor.java:362)
    at io.onetable.hudi.HudiDataFileExtractor.getFilesCurrentState(HudiDataFileExtractor.java:119)
    at io.onetable.hudi.HudiClient.getCurrentSnapshot(HudiClient.java:104)
    at io.onetable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:35)
    at io.onetable.client.OneTableClient.syncSnapshot(OneTableClient.java:177)
    at io.onetable.client.OneTableClient.sync(OneTableClient.java:116)
    at io.onetable.hudi.sync.OneTableSyncTool.syncHoodieTable(OneTableSyncTool.java:80)
    at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
    ... 20 more
Caused by: java.lang.NumberFormatException: For input string: "2024-02-28"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:652)
    at java.base/java.lang.Integer.parseInt(Integer.java:770)
    at io.onetable.hudi.HudiPartitionValuesExtractor.parseValue(HudiPartitionValuesExtractor.java:144)
    at io.onetable.hudi.HudiPartitionValuesExtractor.parsePartitionPath(HudiPartitionValuesExtractor.java:98)
    at io.onetable.hudi.HudiPartitionValuesExtractor.extractPartitionValues(HudiPartitionValuesExtractor.java:71)
    at io.onetable.hudi.HudiDataFileExtractor.lambda$getOneDataFilesForPartitions$12(HudiDataFileExtractor.java:354)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
24/03/04 08:32:04 INFO ShutdownHookManager: Shutdown hook called
24/03/04 08:32:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-b4d1a856-1a05-4ba3-862d-6adc08ffb9d1
24/03/04 08:32:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/qq/s_1bjv516pn_mck29cwdwxnm0000gp/T/spark-15a6b978-8eff-4a1e-bd15-ec3104212297
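The innermost Caused by frames show HudiPartitionValuesExtractor.parseValue passing the partition value "2024-02-28" to Integer.parseInt, which only accepts an optionally signed base-10 integer. The minimal sketch below reproduces just that step in isolation. It does not call OneTable, and the class and method names are made up for illustration.

```java
public class ParseIntRepro {
    // Mirrors the innermost application frame of the trace:
    // HudiPartitionValuesExtractor.parseValue -> Integer.parseInt(value).
    static String tryParseInt(String value) {
        try {
            return "parsed " + Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // The message has the same shape as the log line:
            // For input string: "2024-02-28"
            return "NumberFormatException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A purely numeric value parses fine.
        System.out.println(tryParseInt("20240228"));
        // An ISO date string, like the order_date partition value, does not.
        System.out.println(tryParseInt("2024-02-28"));
    }
}
```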
soumilshah1995 commented 6 months ago

I wanted to reach out because I'm still running into this issue despite trying all the approaches you previously suggested.

Would it be possible for us to schedule a brief meeting, perhaps around 10 minutes of your time, to discuss and hopefully resolve this matter? I believe a direct conversation could help clarify things and lead to a solution.

Furthermore, resolving this issue would benefit not only me but also the broader data engineering community. I intend to share my experience with others, particularly on how to use OneTable correctly. Since OneTable is relatively new and its documentation is still sparse, spreading awareness of how to use it could greatly help others.

Hence, I kindly ask if you could set aside a few minutes to work through this together. Your help would be immensely appreciated.

Looking forward to your response and hopefully a quick resolution.

soumilshah1995 commented 6 months ago

Update: I just spoke with Sagar on a call. Adding the exercise files for reference: https://github.com/soumilshah1995/apache-hudi-delta-streamer-labs/tree/main/E2

soumilshah1995 commented 6 months ago

@sagarlakshmipathy It looks like the code finally works. Sagar told me to remove the partition path, and after removing it the job succeeds.

I'm curious why it doesn't work when the partition path is set:

    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=order_date'

Worked

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/new_jars/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar,/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-java-client-0.14.0.jar \
    /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 'file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/hudi/bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file:///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168'
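One plausible reading of why dropping the partition field helps: with hoodie.datasource.write.partitionpath.field=order_date, each file group lives under a partition path such as 2024-02-28 (or order_date=2024-02-28 with Hive-style partitioning), and that raw string is what reaches the value parser in the stack trace. Without the field the table is non-partitioned, so there is likely no partition value to parse at all. The helper below is a hypothetical illustration of that extraction step under those assumptions; it is not OneTable's parsePartitionPath, and its names are made up.

```java
import java.util.Optional;

public class PartitionPathValueDemo {
    // Hypothetical helper: returns the raw value for a single-field partition path,
    // accepting both "2024-02-28" and Hive-style "order_date=2024-02-28".
    static Optional<String> rawPartitionValue(String relativePartitionPath, String fieldName) {
        if (relativePartitionPath.isEmpty()) {
            // Non-partitioned table: nothing to hand to a value parser.
            return Optional.empty();
        }
        String hivePrefix = fieldName + "=";
        if (relativePartitionPath.startsWith(hivePrefix)) {
            return Optional.of(relativePartitionPath.substring(hivePrefix.length()));
        }
        return Optional.of(relativePartitionPath);
    }

    public static void main(String[] args) {
        System.out.println(rawPartitionValue("", "order_date"));
        System.out.println(rawPartitionValue("2024-02-28", "order_date"));
        System.out.println(rawPartitionValue("order_date=2024-02-28", "order_date"));
    }
}
```

In both partitioned layouts the extracted value is the same date string, which a parser expecting an integer will reject.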


soumilshah1995 commented 6 months ago

[Screenshot, 2024-03-04 at 7:47:52 PM]

Videos coming soon

the-other-tim-brown commented 6 months ago

Caused by: java.lang.NumberFormatException: For input string: "2024-02-28" -> This is a bug in the sync tool. I'll take a look at how we can make the tool more robust for timestamp partitions.
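One way a sync tool could tolerate date-valued partitions is to try an integer parse first and fall back to an ISO-8601 date. The sketch below is hypothetical and is not the fix that landed in OneTable/XTable. It assumes the partition column is a date whose canonical form is days since 1970-01-01 (the encoding of Avro's date logical type), and the class and method names are made up.

```java
import java.time.LocalDate;

public class DatePartitionValueParser {
    // Hypothetical tolerant parser for a date-typed partition value.
    // Returns days since 1970-01-01, the encoding used by Avro's date logical type.
    static int toEpochDays(String rawValue) {
        try {
            // Value already stored as an integer day count, e.g. "19781".
            return Integer.parseInt(rawValue);
        } catch (NumberFormatException notAnInteger) {
            // Fall back to ISO-8601 yyyy-MM-dd, e.g. "2024-02-28".
            // LocalDate.parse throws DateTimeParseException if this also fails.
            return Math.toIntExact(LocalDate.parse(rawValue).toEpochDay());
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochDays("2024-02-28"));
        System.out.println(toEpochDays("19781"));
    }
}
```

Trying the integer form first keeps already-encoded values working, the date fallback covers human-readable paths such as 2024-02-28, and a value matching neither still fails loudly instead of being silently misread.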

soumilshah1995 commented 6 months ago

Roger that. The video and guide for OneTable and DeltaStreamer have been uploaded: https://www.youtube.com/watch?v=9yRVoq-swH8&t=22s

soumilshah1995 commented 6 months ago

Do you want me to close this ticket? I think the task is complete; let me know.