dataunitylab / jsonoid-discovery

Distributed JSON schema discovery
https://dataunitylab.github.io/jsonoid-discovery/
MIT License

Submission to spark-submit with a sample json failing #40

Closed calvin-dani closed 2 weeks ago

calvin-dani commented 3 weeks ago

Version of Spark and Scala:

```
WARNING: Using incubator modules: jdk.incubator.vector
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 17.0.12)
Branch master
```


The jsonoid jar was downloaded from the v0.30.1 release: https://github.com/dataunitylab/jsonoid-discovery/releases/tag/v0.30.1


Sample input (`sample.json`):

```json
{"message":"docs: Document repeat(exp)","name":"Nico Williams"}
```


Terminal log after running:

```shell
spark-submit --master local \
  --class io.github.dataunitylab.jsonoid.discovery.spark.JsonoidSpark \
  ./jsonoid-discovery-0.30.1.jar "./sample.json" -p "Min"
```

"ts":"2024-08-21T16:36:01.954Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"} {"ts":"2024-08-21T16:36:02.279Z","level":"INFO","msg":"Running Spark version 4.0.0-SNAPSHOT","context":{"spark_version":"4.0.0-SNAPSHOT"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:02.280Z","level":"INFO","msg":"OS info Mac OS X, 13.5.2, aarch64","context":{"os_arch":"aarch64","os_name":"Mac OS X","os_version":"13.5.2"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:02.280Z","level":"INFO","msg":"Java version 17.0.12","context":{"java_version":"17.0.12"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:02.333Z","level":"WARN","msg":"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable","logger":"NativeCodeLoader"} {"ts":"2024-08-21T16:36:02.385Z","level":"INFO","msg":"==============================================================","logger":"ResourceUtils"} {"ts":"2024-08-21T16:36:02.385Z","level":"INFO","msg":"No custom resources configured for spark.driver.","logger":"ResourceUtils"} {"ts":"2024-08-21T16:36:02.386Z","level":"INFO","msg":"==============================================================","logger":"ResourceUtils"} {"ts":"2024-08-21T16:36:02.386Z","level":"INFO","msg":"Submitted application: JSONoid","context":{"app_name":"JSONoid"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:02.398Z","level":"INFO","msg":"Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)","context":{"executor_resources":"Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: )","task_resources":"Map(cpus -> name: cpus, amount: 
1.0)"},"logger":"ResourceProfile"} {"ts":"2024-08-21T16:36:02.399Z","level":"INFO","msg":"Limiting resource is cpu","context":{"resource":"cpu"},"logger":"ResourceProfile"} {"ts":"2024-08-21T16:36:02.400Z","level":"INFO","msg":"Added ResourceProfile id: 0","context":{"resource_profile_id":"0"},"logger":"ResourceProfileManager"} {"ts":"2024-08-21T16:36:02.426Z","level":"INFO","msg":"Changing view acls to: calvindani","context":{"view_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.427Z","level":"INFO","msg":"Changing modify acls to: calvindani","context":{"modify_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.427Z","level":"INFO","msg":"Changing view acls groups to: calvindani","context":{"view_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.427Z","level":"INFO","msg":"Changing modify acls groups to: calvindani","context":{"modify_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.428Z","level":"INFO","msg":"SecurityManager: authentication disabled; ui acls disabled; users with view permissions: calvindani groups with view permissions: EMPTY; users with modify permissions: calvindani; groups with modify permissions: EMPTY; RPC SSL disabled","context":{"auth_enabled":"disabled","modify_acls":"calvindani","modify_acls_groups":"EMPTY","rpc_ssl_enabled":"disabled","ui_acls":"disabled","view_acls":"calvindani","view_acls_groups":"EMPTY"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.595Z","level":"INFO","msg":"Successfully started service 'sparkDriver' on port 50991.","context":{"port":"50991","service_name":" 'sparkDriver'"},"logger":"Utils"} {"ts":"2024-08-21T16:36:02.608Z","level":"INFO","msg":"Registering MapOutputTracker","context":{"endpoint_name":"MapOutputTracker"},"logger":"SparkEnv"} {"ts":"2024-08-21T16:36:02.615Z","level":"INFO","msg":"Registering BlockManagerMaster","context":{"endpoint_name":"BlockManagerMaster"},"logger":"SparkEnv"} 
{"ts":"2024-08-21T16:36:02.623Z","level":"INFO","msg":"Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information","context":{"class_name":"org.apache.spark.storage.DefaultTopologyMapper"},"logger":"BlockManagerMasterEndpoint"} {"ts":"2024-08-21T16:36:02.624Z","level":"INFO","msg":"BlockManagerMasterEndpoint up","logger":"BlockManagerMasterEndpoint"} {"ts":"2024-08-21T16:36:02.625Z","level":"INFO","msg":"Registering BlockManagerMasterHeartbeat","context":{"endpoint_name":"BlockManagerMasterHeartbeat"},"logger":"SparkEnv"} {"ts":"2024-08-21T16:36:02.638Z","level":"INFO","msg":"Created local directory at /private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/blockmgr-4420c053-bb95-41f6-a9f2-ae3587890520","context":{"path":"/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/blockmgr-4420c053-bb95-41f6-a9f2-ae3587890520"},"logger":"DiskBlockManager"} {"ts":"2024-08-21T16:36:02.648Z","level":"INFO","msg":"Registering OutputCommitCoordinator","context":{"endpoint_name":"OutputCommitCoordinator"},"logger":"SparkEnv"} {"ts":"2024-08-21T16:36:02.734Z","level":"INFO","msg":"Start Jetty 0.0.0.0:4040 for SparkUI","context":{"host":"0.0.0.0","port":"4040","server_name":"SparkUI"},"logger":"JettyUtils"} {"ts":"2024-08-21T16:36:02.761Z","level":"INFO","msg":"Successfully started service 'SparkUI' on port 4040.","context":{"port":"4040","service_name":" 'SparkUI'"},"logger":"Utils"} {"ts":"2024-08-21T16:36:02.780Z","level":"INFO","msg":"Added JAR file:/Users/calvindani/Documents/Spark-jobs/jsonoid-discovery-0.30.1.jar at spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar with timestamp 1724258162276","context":{"added_jars":"spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar","path":"file:/Users/calvindani/Documents/Spark-jobs/jsonoid-discovery-0.30.1.jar","timestamp":"1724258162276"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:02.791Z","level":"INFO","msg":"Changing view acls to: 
calvindani","context":{"view_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.792Z","level":"INFO","msg":"Changing modify acls to: calvindani","context":{"modify_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.792Z","level":"INFO","msg":"Changing view acls groups to: calvindani","context":{"view_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.792Z","level":"INFO","msg":"Changing modify acls groups to: calvindani","context":{"modify_acls":"calvindani"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.792Z","level":"INFO","msg":"SecurityManager: authentication disabled; ui acls disabled; users with view permissions: calvindani groups with view permissions: EMPTY; users with modify permissions: calvindani; groups with modify permissions: EMPTY; RPC SSL disabled","context":{"auth_enabled":"disabled","modify_acls":"calvindani","modify_acls_groups":"EMPTY","rpc_ssl_enabled":"disabled","ui_acls":"disabled","view_acls":"calvindani","view_acls_groups":"EMPTY"},"logger":"SecurityManager"} {"ts":"2024-08-21T16:36:02.826Z","level":"INFO","msg":"Starting executor ID driver on host 172.16.0.32","context":{"executor_id":"driver","host":"172.16.0.32"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.826Z","level":"INFO","msg":"OS info Mac OS X, 13.5.2, aarch64","context":{"os_arch":"aarch64","os_name":"Mac OS X","os_version":"13.5.2"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.826Z","level":"INFO","msg":"Java version 17.0.12","context":{"java_version":"17.0.12"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.829Z","level":"INFO","msg":"Starting executor with user classpath (userClassPathFirst = false): ''","context":{"executor_user_class_path_first":"false","urls":"''"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.830Z","level":"INFO","msg":"Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@106b014e for 
default.","context":{"class_loader":"org.apache.spark.util.MutableURLClassLoader@106b014e","session_id":"default"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.834Z","level":"INFO","msg":"Fetching spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar with timestamp 1724258162276","context":{"jar_url":"spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar","timestamp":"1724258162276"},"logger":"Executor"} {"ts":"2024-08-21T16:36:02.860Z","level":"INFO","msg":"Successfully created connection to /172.16.0.32:50991 after 12 ms (0 ms spent in bootstraps)","context":{"bootstrap_time":"0","elapsed_time":"12","host_port":"/172.16.0.32:50991"},"logger":"TransportClientFactory"} {"ts":"2024-08-21T16:36:02.864Z","level":"INFO","msg":"Fetching spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar to /private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644/userFiles-b7b0c6fd-1538-41aa-841d-c80ec2146c9e/fetchFileTemp7490985089544058883.tmp","context":{"file_absolute_path":"/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644/userFiles-b7b0c6fd-1538-41aa-841d-c80ec2146c9e/fetchFileTemp7490985089544058883.tmp","url":"spark://172.16.0.32:50991/jars/jsonoid-discovery-0.30.1.jar"},"logger":"Utils"} {"ts":"2024-08-21T16:36:03.024Z","level":"INFO","msg":"Adding file:/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644/userFiles-b7b0c6fd-1538-41aa-841d-c80ec2146c9e/jsonoid-discovery-0.30.1.jar to class loader default","context":{"url":"file:/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644/userFiles-b7b0c6fd-1538-41aa-841d-c80ec2146c9e/jsonoid-discovery-0.30.1.jar","uuid":"default"},"logger":"Executor"} {"ts":"2024-08-21T16:36:03.028Z","level":"INFO","msg":"Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 
50993.","context":{"port":"50993","service_name":" 'org.apache.spark.network.netty.NettyBlockTransferService'"},"logger":"Utils"} {"ts":"2024-08-21T16:36:03.028Z","level":"INFO","msg":"Server created on 172.16.0.32:50993","context":{"host":"172.16.0.32","port":"50993"},"logger":"NettyBlockTransferService"} {"ts":"2024-08-21T16:36:03.029Z","level":"INFO","msg":"Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy","context":{"class_name":"org.apache.spark.storage.RandomBlockReplicationPolicy"},"logger":"BlockManager"} {"ts":"2024-08-21T16:36:03.034Z","level":"INFO","msg":"Registering BlockManager BlockManagerId(driver, 172.16.0.32, 50993, None)","context":{"block_manager_id":"BlockManagerId(driver, 172.16.0.32, 50993, None)"},"logger":"BlockManagerMaster"} {"ts":"2024-08-21T16:36:03.037Z","level":"INFO","msg":"Registering block manager 172.16.0.32:50993 with 434.4 MiB RAM, BlockManagerId(driver, 172.16.0.32, 50993, None)","context":{"block_manager_id":"BlockManagerId(driver, 172.16.0.32, 50993, None)","host_port":"172.16.0.32:50993","memory_size":"434.4 MiB"},"logger":"BlockManagerMasterEndpoint"} {"ts":"2024-08-21T16:36:03.038Z","level":"INFO","msg":"Registered BlockManager BlockManagerId(driver, 172.16.0.32, 50993, None)","context":{"block_manager_id":"BlockManagerId(driver, 172.16.0.32, 50993, None)"},"logger":"BlockManagerMaster"} {"ts":"2024-08-21T16:36:03.039Z","level":"INFO","msg":"Initialized BlockManager: BlockManagerId(driver, 172.16.0.32, 50993, None)","context":{"block_manager_id":"BlockManagerId(driver, 172.16.0.32, 50993, None)"},"logger":"BlockManager"} {"ts":"2024-08-21T16:36:03.238Z","level":"INFO","msg":"MemoryStore started with capacity 434.4 MiB","context":{"memory_size":"434.4 MiB"},"logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.266Z","level":"INFO","msg":"Block broadcast_0 stored as values in memory (estimated size 237.6 KiB, free 434.2 MiB)","context":{"block_id":"broadcast_0","free_memory_size":"434.2 
MiB","memory_size":"237.6 KiB"},"logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.653Z","level":"INFO","msg":"Block broadcast_0_piece0 stored as bytes in memory (estimated size 36.7 KiB, free 434.1 MiB)","context":{"block_id":"broadcast_0_piece0","memory_size":"434.1 MiB","size":"36.7 KiB"},"logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.655Z","level":"INFO","msg":"Added broadcast_0_piece0 in memory on 172.16.0.32:50993 (size: 36.7 KiB, free: 434.4 MiB)","context":{"block_id":"broadcast_0_piece0","current_memory_size":"36.7 KiB","free_memory_size":"434.4 MiB","host_port":"172.16.0.32:50993"},"logger":"BlockManagerInfo"} {"ts":"2024-08-21T16:36:03.658Z","level":"INFO","msg":"Created broadcast 0 from textFile at JsonoidSpark.scala:72","context":{"broadcast_id":"0","call_site_short_form":"textFile at JsonoidSpark.scala:72"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.756Z","level":"INFO","msg":"Total input files to process : 1","logger":"FileInputFormat"} {"ts":"2024-08-21T16:36:03.767Z","level":"INFO","msg":"Starting job: fold at JsonoidRDD.scala:42","context":{"call_site_short_form":"fold at JsonoidRDD.scala:42"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.772Z","level":"INFO","msg":"Got job 0 (fold at JsonoidRDD.scala:42) with 1 output partitions","context":{"call_site_short_form":"fold at JsonoidRDD.scala:42","job_id":"0","num_partitions":"1"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.772Z","level":"INFO","msg":"Final stage: ResultStage 0 (fold at JsonoidRDD.scala:42)","context":{"stage_id":"ResultStage 0","stage_name":"fold at JsonoidRDD.scala:42"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.772Z","level":"INFO","msg":"Parents of final stage: List()","context":{"stage_id":"List()"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.773Z","level":"INFO","msg":"Missing parents: List()","context":{"missing_parent_stages":"List()"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.774Z","level":"INFO","msg":"Submitting 
ResultStage 0 (MapPartitionsRDD[2] at flatMap at JsonoidRDD.scala:26), which has no missing parents","context":{"rdd_id":"MapPartitionsRDD[2] at flatMap at JsonoidRDD.scala:26","stage_id":"ResultStage 0"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.782Z","level":"INFO","msg":"Block broadcast_1 stored as values in memory (estimated size 8.6 KiB, free 434.1 MiB)","context":{"block_id":"broadcast_1","free_memory_size":"434.1 MiB","memory_size":"8.6 KiB"},"logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.788Z","level":"INFO","msg":"Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.2 KiB, free 434.1 MiB)","context":{"block_id":"broadcast_1_piece0","memory_size":"434.1 MiB","size":"4.2 KiB"},"logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.788Z","level":"INFO","msg":"Added broadcast_1_piece0 in memory on 172.16.0.32:50993 (size: 4.2 KiB, free: 434.4 MiB)","context":{"block_id":"broadcast_1_piece0","current_memory_size":"4.2 KiB","free_memory_size":"434.4 MiB","host_port":"172.16.0.32:50993"},"logger":"BlockManagerInfo"} {"ts":"2024-08-21T16:36:03.789Z","level":"INFO","msg":"Created broadcast 1 from broadcast at DAGScheduler.scala:1644","context":{"broadcast_id":"1","call_site_short_form":"broadcast at DAGScheduler.scala:1644"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.797Z","level":"INFO","msg":"Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at flatMap at JsonoidRDD.scala:26) (first 15 tasks are for partitions Vector(0))","context":{"num_tasks":"1","partition_ids":"Vector(0)","rdd_id":"MapPartitionsRDD[2] at flatMap at JsonoidRDD.scala:26","stage_id":"ResultStage 0"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.798Z","level":"INFO","msg":"Adding task set 0.0 with 1 tasks resource profile 0","context":{"num_tasks":"1","resource_profile_id":"0","stage_attempt":"0","stage_id":"0"},"logger":"TaskSchedulerImpl"} {"ts":"2024-08-21T16:36:03.811Z","level":"INFO","msg":"Starting task 0.0 in stage 0.0 (TID 0) 
(172.16.0.32,executor driver, partition 0, PROCESS_LOCAL, 9888 bytes) ","context":{"executor_id":"driver","host":"172.16.0.32","partition_id":"0","size":"9888","task_locality":"PROCESS_LOCAL","task_name":"task 0.0 in stage 0.0 (TID 0)"},"logger":"TaskSetManager"} {"ts":"2024-08-21T16:36:03.816Z","level":"INFO","msg":"Running task 0.0 in stage 0.0 (TID 0)","context":{"task_name":"task 0.0 in stage 0.0 (TID 0)"},"logger":"Executor"} {"ts":"2024-08-21T16:36:03.847Z","level":"INFO","msg":"Task (TID 0) input split: file:/Users/calvindani/Documents/Spark-jobs/sample.json:0+63","context":{"input_split":"file:/Users/calvindani/Documents/Spark-jobs/sample.json:0+63","task_id":"0","task_name":"task 0.0 in stage 0.0 (TID 0)"},"logger":"HadoopRDD"} {"ts":"2024-08-21T16:36:03.880Z","level":"ERROR","msg":"Exception in task 0.0 in stage 0.0 (TID 0)","context":{"task_name":"task 0.0 in stage 0.0 (TID 0)"},"exception":{"class":"java.lang.NoSuchMethodError","msg":"'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'","stacktrace":[{"class":"io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$","method":"$anonfun$fromStringRDD$1","file":"JsonoidRDD.scala","line":25},{"class":"scala.collection.Iterator$$anon$10","method":"nextCur","file":"Iterator.scala","line":594},{"class":"scala.collection.Iterator$$anon$10","method":"hasNext","file":"Iterator.scala","line":608},{"class":"scala.collection.IterableOnceOps","method":"foldLeft","file":"IterableOnce.scala","line":726},{"class":"scala.collection.IterableOnceOps","method":"foldLeft$","file":"IterableOnce.scala","line":721},{"class":"scala.collection.AbstractIterator","method":"foldLeft","file":"Iterator.scala","line":1303},{"class":"scala.collection.IterableOnceOps","method":"fold","file":"IterableOnce.scala","line":792},{"class":"scala.collection.IterableOnceOps","method":"fold$","file":"IterableOnce.scala","line":792},{"class":"scala.collection.AbstractIterator","method":"fold","file":"Iterator.scala","lin
e":1303},{"class":"org.apache.spark.rdd.RDD","method":"$anonfun$fold$2","file":"RDD.scala","line":1209},{"class":"org.apache.spark.SparkContext","method":"$anonfun$runJob$6","file":"SparkContext.scala","line":2552},{"class":"org.apache.spark.scheduler.ResultTask","method":"runTask","file":"ResultTask.scala","line":93},{"class":"org.apache.spark.TaskContext","method":"runTaskWithListeners","file":"TaskContext.scala","line":171},{"class":"org.apache.spark.scheduler.Task","method":"run","file":"Task.scala","line":146},{"class":"org.apache.spark.executor.Executor$TaskRunner","method":"$anonfun$run$5","file":"Executor.scala","line":644},{"class":"org.apache.spark.util.SparkErrorUtils","method":"tryWithSafeFinally","file":"SparkErrorUtils.scala","line":64},{"class":"org.apache.spark.util.SparkErrorUtils","method":"tryWithSafeFinally$","file":"SparkErrorUtils.scala","line":61},{"class":"org.apache.spark.util.Utils$","method":"tryWithSafeFinally","file":"Utils.scala","line":99},{"class":"org.apache.spark.executor.Executor$TaskRunner","method":"run","file":"Executor.scala","line":647},{"class":"java.util.concurrent.ThreadPoolExecutor","method":"runWorker","file":"ThreadPoolExecutor.java","line":1136},{"class":"java.util.concurrent.ThreadPoolExecutor$Worker","method":"run","file":"ThreadPoolExecutor.java","line":635},{"class":"java.lang.Thread","method":"run","file":"Thread.java","line":840}]},"logger":"Executor"} {"ts":"2024-08-21T16:36:03.891Z","level":"WARN","msg":"Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat 
scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n","context":{"error":"java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat 
scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n","executor_id":"driver","host_port":"172.16.0.32","task_name":"task 0.0 in stage 0.0 (TID 0)"},"logger":"TaskSetManager"} {"ts":"2024-08-21T16:36:03.893Z","level":"ERROR","msg":"Task 0 in stage 0.0 failed 1 times; aborting job","context":{"max_attempts":"1","stage_attempt":"0","stage_id":"0","task_index":"0"},"logger":"TaskSetManager"} {"ts":"2024-08-21T16:36:03.893Z","level":"INFO","msg":"Removed TaskSet 0.0 whose tasks have all completed, from pool ","context":{"pool_name":"","stage_attempt":"0","stage_id":"0"},"logger":"TaskSchedulerImpl"} {"ts":"2024-08-21T16:36:03.896Z","level":"INFO","msg":"Canceling stage 0","context":{"stage_id":"0"},"logger":"TaskSchedulerImpl"} {"ts":"2024-08-21T16:36:03.897Z","level":"INFO","msg":"Killing all running tasks in stage 0: Job aborted due 
to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n\nDriver stacktrace:","context":{"reason":"Job aborted due to stage 
failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n\nDriver stacktrace:","stage_id":"0"},"logger":"TaskSchedulerImpl"} 
{"ts":"2024-08-21T16:36:03.898Z","level":"INFO","msg":"ResultStage 0 (fold at JsonoidRDD.scala:42) failed in 117 ms due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n\nDriver stacktrace:","context":{"error":"Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)'\n\tat io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25)\n\tat scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608)\n\tat scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726)\n\tat scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721)\n\tat scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303)\n\tat scala.collection.IterableOnceOps.fold(IterableOnce.scala:792)\n\tat scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792)\n\tat scala.collection.AbstractIterator.fold(Iterator.scala:1303)\n\tat org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209)\n\tat org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552)\n\tat org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)\n\tat org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)\n\tat org.apache.spark.scheduler.Task.run(Task.scala:146)\n\tat org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)\n\tat org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)\n\tat org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)\n\tat org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n\nDriver stacktrace:","stage":"ResultStage 0","stage_name":"fold at JsonoidRDD.scala:42","time_units":"117"},"logger":"DAGScheduler"} {"ts":"2024-08-21T16:36:03.899Z","level":"INFO","msg":"Job 0 failed: fold at JsonoidRDD.scala:42, took 132.4725 ms","context":{"call_site_short_form":"fold at JsonoidRDD.scala:42","job_id":"0","time":"132.4725"},"logger":"DAGScheduler"} Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (172.16.0.32 executor driver): java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)' at io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25) at scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608) at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726) at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303) at scala.collection.IterableOnceOps.fold(IterableOnce.scala:792) at scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792) at scala.collection.AbstractIterator.fold(Iterator.scala:1303) at org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209) at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) at org.apache.spark.scheduler.Task.run(Task.scala:146) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840)

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$3(DAGScheduler.scala:2887) at scala.Option.getOrElse(Option.scala:201) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2887) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2879) at scala.collection.immutable.List.foreach(List.scala:334) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2879) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1283) at scala.Option.foreach(Option.scala:437) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3158) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3092) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3081) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:50) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1009) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2553) at org.apache.spark.rdd.RDD.$anonfun$fold$1(RDD.scala:1211) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:417) at org.apache.spark.rdd.RDD.fold(RDD.scala:1205) at io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD.reduceSchemas(JsonoidRDD.scala:42) at io.github.dataunitylab.jsonoid.discovery.spark.JsonoidSpark$.main(JsonoidSpark.scala:75) at io.github.dataunitylab.jsonoid.discovery.spark.JsonoidSpark.main(JsonoidSpark.scala) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.NoSuchMethodError: 'org.json4s.JsonInput org.json4s.package$.string2JsonInput(java.lang.String)' at io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRDD$.$anonfun$fromStringRDD$1(JsonoidRDD.scala:25) at scala.collection.Iterator$$anon$10.nextCur(Iterator.scala:594) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:608) at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:726) at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:721) at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1303) at scala.collection.IterableOnceOps.fold(IterableOnce.scala:792) at scala.collection.IterableOnceOps.fold$(IterableOnce.scala:792) at scala.collection.AbstractIterator.fold(Iterator.scala:1303) at org.apache.spark.rdd.RDD.$anonfun$fold$2(RDD.scala:1209) at org.apache.spark.SparkContext.$anonfun$runJob$6(SparkContext.scala:2552) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171) at org.apache.spark.scheduler.Task.run(Task.scala:146) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:644) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:647) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:840) {"ts":"2024-08-21T16:36:03.902Z","level":"INFO","msg":"Invoking stop() from shutdown hook","logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.902Z","level":"INFO","msg":"SparkContext is stopping with exitCode 0 from run at Executors.java:539.","context":{"exit_code":"0","stop_site_short_form":"run at Executors.java:539"},"logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.907Z","level":"INFO","msg":"Stopped Spark web UI at http://172.16.0.32:4040","context":{"web_url":"http://172.16.0.32:4040"},"logger":"SparkUI"} {"ts":"2024-08-21T16:36:03.911Z","level":"INFO","msg":"MapOutputTrackerMasterEndpoint stopped!","logger":"MapOutputTrackerMasterEndpoint"} {"ts":"2024-08-21T16:36:03.918Z","level":"INFO","msg":"MemoryStore cleared","logger":"MemoryStore"} {"ts":"2024-08-21T16:36:03.918Z","level":"INFO","msg":"BlockManager stopped","logger":"BlockManager"} {"ts":"2024-08-21T16:36:03.921Z","level":"INFO","msg":"BlockManagerMaster stopped","logger":"BlockManagerMaster"} {"ts":"2024-08-21T16:36:03.921Z","level":"INFO","msg":"OutputCommitCoordinator stopped!","logger":"OutputCommitCoordinator$OutputCommitCoordinatorEndpoint"} {"ts":"2024-08-21T16:36:03.926Z","level":"INFO","msg":"Successfully stopped 
SparkContext","logger":"SparkContext"} {"ts":"2024-08-21T16:36:03.926Z","level":"INFO","msg":"Shutdown hook called","logger":"ShutdownHookManager"} {"ts":"2024-08-21T16:36:03.926Z","level":"INFO","msg":"Deleting directory /private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644","context":{"path":"/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-57f3b896-947a-4402-89f7-b051873e8644"},"logger":"ShutdownHookManager"} {"ts":"2024-08-21T16:36:03.930Z","level":"INFO","msg":"Deleting directory /private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-132b0cbd-773c-44c2-b2af-8b493c82c950","context":{"path":"/private/var/folders/vv/61kpjw4x49q7s1q85cn5r6c00000gn/T/spark-132b0cbd-773c-44c2-b2af-8b493c82c950"},"logger":"ShutdownHookManager"}


michaelmior commented 3 weeks ago

@calvin-dani Unfortunately, as Spark 4.0 is not released yet, it also isn't supported. I would suggest trying with Spark 3.5.

michaelmior commented 3 weeks ago

I just pushed a spark-4 branch that seems to be mostly working. I'm sure there are some things that are broken, but it may be sufficient for your use case. You should be able to download a JAR file below once the build process completes.

CI

michaelmior commented 3 weeks ago

@calvin-dani Actually, check out version 0.40.0 as I believe it should now work with Spark 3.x (with Scala 2.13) and Spark 4.0.
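
For anyone hitting the same `NoSuchMethodError`, a minimal sketch of the workaround commands follows. The download URL and JAR name are assumptions inferred from the v0.30.1 release pattern linked above, not verified links — check the releases page for the actual artifact name.

```shell
# Assumed URL, following the pattern of the v0.30.1 release linked
# earlier in this issue; verify against the actual releases page.
curl -LO https://github.com/dataunitylab/jsonoid-discovery/releases/download/v0.40.0/jsonoid-discovery-0.40.0.jar

# Re-run the original submission against a released Spark 3.5.x
# (Scala 2.13 build) instead of the 4.0.0-SNAPSHOT from the report.
spark-submit --master local \
  --class io.github.dataunitylab.jsonoid.discovery.spark.JsonoidSpark \
  ./jsonoid-discovery-0.40.0.jar ./sample.json -p Min
```

The key point is matching the json4s binary version on the classpath: the snapshot Spark 4.0 build bundled a json4s release that no longer provides `org.json4s.package$.string2JsonInput`, which is why the task failed at runtime rather than at submit time.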

calvin-dani commented 2 weeks ago

Thank you! I set up Spark 3.5 and was able to submit the JAR file successfully. Appreciate the support!