dfdx / Spark.jl

Julia binding for Apache Spark

Serialization error #78

Closed mkitti closed 4 years ago

mkitti commented 4 years ago

I get an error when running the basic.jl tests:

basic: Test Failed at ~/.julia/packages/Spark/kFCaM/test/basic.jl:15
  Expression: count(rdd) == 2
   Evaluated: 0 == 2
Stacktrace:
 [1] top-level scope at ~/.julia/packages/Spark/kFCaM/test/basic.jl:15
 [2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Test/src/Test.jl:1107
 [3] top-level scope at ~/.julia/packages/Spark/kFCaM/test/basic.jl:6
basic: Error During Test at ~/.julia/packages/Spark/kFCaM/test/basic.jl:16
  Test threw exception
  Expression: reduce(rdd, +) == 10
  JavaCall.JavaCallError("Error calling Java: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.Exception: MethodError(Serialization.__deserialized_types__.var\"#3#5\"(), (Channel{Any}(sz_max:1024,sz_curr:2),), 0x0000000000006c80)\nStacktrace:\n [1] (::Spark.var\"#func#14\"{Serialization.__deserialized_types__.var\"#3#5\"})(::Int32, ::Channel{Any}) at ~/.julia/packages/Spark/kFCaM/src/rdd.jl:130\n [2]

debug.txt

dfdx commented 4 years ago

I can confirm that the tests fail locally on several Julia versions, including Julia 1.0. This is pretty weird because very little has changed since the last time I tested Spark on my machine. The error message is also pretty cryptic, so I couldn't identify the cause quickly. I'll continue to work on this tomorrow.

mkitti commented 4 years ago

We may need to bisect JavaCall as well, although this looks like a Julia serialization issue. Rather than testing only an anonymous function, also try a named function. In that case, the error is an undef error (the function is reported as undefined).
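
For concreteness, here's the kind of comparison I mean (untested sketch using the same API as the failing tests; double is just an illustrative name):

using Spark
Spark.init()
sc = SparkContext(master="local")
nums = parallelize(sc, [1, 2, 3, 4])
double(x) = 2x                   # named, top-level function
count(map(nums, double))         # named version: fails with an undef error
count(map(nums, x -> 2x))        # anonymous version: fails with the MethodError above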

dfdx commented 4 years ago

Just to keep you updated: the error seems to appear when we serialize a lambda function from the Julia main script, through the Java process, to the Julia worker. I can see that Julia understands which function is passed (the function's name is reconstructed correctly) but can't resolve it. My initial guess was that the way lambda functions are serialized changed in some Julia version, breaking our approach, but I see the same problem with built-in functions (e.g. identity) as well. I must be missing some subtle detail that will explain everything once I grasp it, but right now I don't have any estimate of how long it may take.
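
For context, the driver-side step is just stdlib serialization of the function, roughly:

using Serialization

io = IOBuffer()
serialize(io, x -> x + 1)   # anonymous functions are serialized along with their type definition
seekstart(io)
f = deserialize(io)         # lands in Serialization.__deserialized_types__, as seen in the error above
f(1)                        # fine within one process; the same call on a separate worker is what fails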

mkitti commented 4 years ago

I also see a problem with named functions defined in the REPL.

Maybe these are related: https://github.com/JuliaLang/julia/issues/19456 https://github.com/JuliaLang/julia/pull/19594

mkitti commented 4 years ago

@aviks was asking me to check if a PR to JavaCall.jl would affect Spark.jl.

If I understand correctly, you concur that the current issue is due to deeper changes in Julia itself and not to JavaCall.jl.

aviks commented 4 years ago

This is pretty weird. I remember testing this package extensively on Julia 1.0. 😞

mkitti commented 4 years ago

We really need to grab a copy of Julia 1.0.0 and see what changed between it and Julia 1.0.5. We also still need to confirm that this is not somehow due to a change in JavaCall.

dfdx commented 4 years ago

I couldn't find Julia 1.0.0, but I've tested Spark on:

All tests failed on the same line.

aviks commented 4 years ago

Which version of the JVM are you gents using? 1.8? Tests have been failing on Travis for the last six months, presumably after they upgraded their default JVM to 11. The error there is "object java.lang.Object in compiler mirror not found", which is apparently solved either by downgrading the JVM to 1.8 or by upgrading sbt to 0.13.17 or higher. But even moving to Spark 3.0 doesn't seem to get a correct sbt version :(

dfdx commented 4 years ago

I'm on version 1.8.0_201 (which has been the default one for Scala developers in my environment for the last 3-4 years).

dfdx commented 4 years ago

I think this is somehow related to JuliaLang/julia#31966, or at least I've made some progress in resolving it thanks to that issue. To give you some context, here's how Julia functions are applied to data in RDDs (using the map function as an example):

  1. When the driver (Julia) calls map(rdd, f) and runs an action on the resulting RDD, it serializes the function f and sends it to the JVM process.
  2. The JVM launches a Julia worker (the worker_runner.jl file) and sends the serialized function to it.
  3. The independent worker process deserializes the function and applies it to the data.

Now we get the error that the deserialized function is not defined. JuliaLang/julia#31966 suggests that the function may not be defined in the worker's current world age. So in worker_runner.jl I changed:

func(split, itc)

to

Base.invokelatest(func, split, itc)

After this change, the following started to work:

init()    
sc = SparkContext(master="local")
txt = parallelize(sc, ["hello", "world"])
rdd = map(txt, identity)
count(rdd)

So now we can apply built-in functions to RDD elements. However, changing identity to x -> x results in the same error as before.
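
For reference, here is the world-age effect in isolation (no Spark involved); this mirrors the worker, where the function's method only comes into existence, via deserialization, after the worker code has started running:

function demo()
    f = eval(:(x -> 2x))        # method is created in a newer world than the running demo()
    try
        f(3)                    # MethodError: demo() executes in an older world age
    catch err
        println("direct call failed: ", err)
    end
    Base.invokelatest(f, 3)     # returns 6: dispatches in the latest world
end
demo()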

mkitti commented 4 years ago

I encountered this before when implementing https://github.com/mkitti/TaskWorkers.jl/blob/master/src/TaskWorkers.jl . You might need something like https://github.com/c42f/FastClosures.jl . The problem is that the functions are declared in a global context.
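
For reference, what FastClosures' @closure macro does is roughly this rewrite (a sketch, not Spark-specific): captured variables get bound in a local let instead of being referenced as boxed globals.

using FastClosures

a = 2
f = @closure x -> a * x   # expands to roughly: let a = a; x -> a * x; end
f(3)                      # == 6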

aviks commented 4 years ago

So that makes logical sense, but it is still exceedingly weird, in that the world-age changes were made in Julia v0.6, and we had successful tests on Travis on Julia 1.0.3 until at least about six months ago: https://travis-ci.org/github/dfdx/Spark.jl/jobs/511214368

aviks commented 4 years ago

I got Travis to run the tests with JDK 8 in addition to JDK 11, and the tests seem to pass in that environment with Julia 1.0.5. This lines up with all the earlier test passes on Travis with JDK 8.

With Julia 1.4 it fails with the serialization error, and with JDK 11 it fails with the "object java.lang.Object in compiler mirror not found" error.

https://travis-ci.org/github/dfdx/Spark.jl/builds/702637275

mkitti commented 4 years ago

The pom file only includes dependency version information for JDK8 https://github.com/dfdx/Spark.jl/blob/master/jvm/sparkjl/pom.xml#L183-L197

Based on https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html, I recommend moving to Scala 2.12.11 or 2.13.3.

mkitti commented 4 years ago

I realized that one issue on my own machine was that I needed to set JULIA_HOME to my Julia bin directory; otherwise Spark would try to run the latest Julia. That created a situation where it would serialize with Julia 1.0.5 and then deserialize with Julia 1.5-beta1.
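
For anyone else hitting this, the workaround on my machine looks something like this (the path is illustrative, not my exact one):

$ export JULIA_HOME=~/src/julia-1.0.5/bin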

mkitti commented 4 years ago

With #80 I now have all tests on Travis passing on JDK 8 and JDK 11: https://travis-ci.org/github/dfdx/Spark.jl/builds/703055860

dfdx commented 4 years ago

Thanks @mkitti for your awesome work! All tests now pass on Spark.jl master. Does this unblock your changes to JavaCall.jl?

mkitti commented 4 years ago

Working tests here mean we can start testing the new upstream JavaCall.jl changes again. Given that the issues here were unrelated to JavaCall, I suspect the proposed changes there do not cause any problems.

mkitti commented 4 years ago

I concur that this issue is fixed on master, so it can be closed.

I have some stashed changes that expand the Travis testing further, which I might send in later. I also wrote up a script to test worker_runner.jl independent of the JVM:

using Sockets
import Spark: readint, writeint, readobj, writeobj, load_stream, dump_stream
import Spark: END_OF_DATA_SECTION, END_OF_STREAM, JULIA_EXCEPTION_THROWN
import Spark: create_map_function, add_index_param

function main()
    # Named map function: offsets each element by the split index.
    function testmapfunc(split, ch)
        [split + i for i in ch]
    end
    # Anonymous function, wrapped the same way map() wraps user functions.
    blah = x -> x^2
    blah_mapped = create_map_function(blah)
    # Play the JVM's role: read a port number from stdin and wait for
    # worker_runner.jl to connect to it.
    port = parse(Int, readline(stdin))
    sock = listen(port)
    io = accept(sock)
    println("Sending 5")
    writeint(io, 5)
    println("Sending func")
    # writeobj(io, testmapfunc)   # swap in to test the named function instead
    writeobj(io, blah_mapped)
    println("Dumping stream")
    dump_stream(io, [1, 2, 3, 4, 5])
    writeint(io, END_OF_DATA_SECTION)
    println("Loading stream")
    ch = load_stream(io)
    println("After loading stream")
    # Print whatever the worker sends back.
    for i in ch
        println(i)
    end
end

main()

dfdx commented 4 years ago

Hell, I'm getting this again on Julia 1.4.1 and Julia 1.5, and I can't quickly figure out what's wrong. We should really improve the robustness of custom function calls, and the more I think about it, the more I want to move to the Dataset API and UDFs (see #86 for some details).

Any ideas why this error may still be happening with all invokelatest() calls in place?

mkitti commented 4 years ago

Was it working before on Julia 1.4.1 and Julia 1.5? @aviks did just bump the JavaCall version, so I wonder if this is related.

Do we have a MWE of how to create the errors?

mkitti commented 4 years ago

@dfdx Another place where invokelatest may be incomplete is generators. We currently do not have good tools in base Julia to descend through generators and iterators to propagate invokelatest.
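
A Spark-free sketch of what I mean: invokelatest only covers its own dynamic extent, so a lazy generator returned from it is consumed back in the caller's (older) world age:

function demo()
    f = eval(:(x -> 2x))                           # method born in a newer world
    gen = Base.invokelatest(g -> (g(x) for x in 1:3), f)
    collect(gen)                                   # MethodError: f is only called here, outside invokelatest
end
demo()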

dfdx commented 4 years ago

@mkitti After you created the fix, I definitely tried it out before merging to master (I believe it was on Julia 1.4.1). Then I updated my environment and could have bumped versions of Julia packages or OS packages, but I can't link any of those things to Spark.jl or JavaCall.jl.

Now I've tried it on Julia 1.4.1 and 1.5.0, as well as with old and latest versions of JavaCall, using the same code as before:

init()    
sc = SparkContext(master="local")
txt = parallelize(sc, ["hello", "world"])
rdd = map(txt, identity)
count(rdd)

And it fails with the serialization error.

Can you reproduce the same error with this code? I'll be happy if this is just an effect of my dirty environment, but if it happens due to subtle changes in unrelated code, we should really rethink how we pass functions in distributed settings.

mkitti commented 4 years ago

I just executed the code using Julia 1.5.0-beta1.0 and it seems to work, with my dev copies of JavaCall v0.7.4 and Spark v0.5.0. I'm going to install 1.5 and try it again using the released package versions.

$ export JULIA_COPY_STACKS=1

$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-beta1.0 (2020-05-28)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Spark

julia> Spark.init()

julia> sc = SparkContext(master="local")
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/22 14:22:27 WARN Utils: Your hostname, woodward resolves to a loopback address: 127.0.1.1; using 192.168.1.165 instead (on interface eth1)
20/08/22 14:22:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/08/22 14:22:28 INFO SparkContext: Running Spark version 2.4.6
20/08/22 14:22:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/22 14:22:29 INFO SparkContext: Submitted application: Julia App on Spark
20/08/22 14:22:29 INFO SecurityManager: Changing view acls to: mkitti
20/08/22 14:22:29 INFO SecurityManager: Changing modify acls to: mkitti
20/08/22 14:22:29 INFO SecurityManager: Changing view acls groups to: 
20/08/22 14:22:29 INFO SecurityManager: Changing modify acls groups to: 
20/08/22 14:22:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mkitti); groups with view permissions: Set(); users  with modify permissions: Set(mkitti); groups with modify permissions: Set()
20/08/22 14:22:30 INFO Utils: Successfully started service 'sparkDriver' on port 43961.
20/08/22 14:22:30 INFO SparkEnv: Registering MapOutputTracker
20/08/22 14:22:30 INFO SparkEnv: Registering BlockManagerMaster
20/08/22 14:22:30 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/08/22 14:22:30 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/08/22 14:22:30 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-65b2ee41-0fba-4240-a34b-3cb75019afc3
20/08/22 14:22:30 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/08/22 14:22:30 INFO SparkEnv: Registering OutputCommitCoordinator
20/08/22 14:22:31 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/08/22 14:22:31 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.165:4040
20/08/22 14:22:31 INFO Executor: Starting executor ID driver on host localhost
20/08/22 14:22:31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44497.
20/08/22 14:22:31 INFO NettyBlockTransferService: Server created on 192.168.1.165:44497
20/08/22 14:22:31 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/08/22 14:22:31 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.165, 44497, None)
20/08/22 14:22:31 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.165:44497 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.165, 44497, None)
20/08/22 14:22:31 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.165, 44497, None)
20/08/22 14:22:31 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.165, 44497, None)
20/08/22 14:22:31 INFO SparkContext: Added JAR /home/mkitti/.julia/dev/Spark/src/../jvm/sparkjl/target/sparkjl-0.1.jar at spark://192.168.1.165:43961/jars/sparkjl-0.1.jar with timestamp 1598124151962
SparkContext(local,Julia App on Spark)

julia> txt = parallelize(sc, ["hello", "world"])
JavaRDD()

julia> rdd = map(txt, identity)
PipelinedRDD(JavaRDD())

julia> count(rdd)
20/08/22 14:22:44 INFO SparkContext: Starting job: count at <unknown>:0
20/08/22 14:22:44 INFO DAGScheduler: Got job 0 (count at <unknown>:0) with 1 output partitions
20/08/22 14:22:44 INFO DAGScheduler: Final stage: ResultStage 0 (count at <unknown>:0)
20/08/22 14:22:44 INFO DAGScheduler: Parents of final stage: List()
20/08/22 14:22:44 INFO DAGScheduler: Missing parents: List()
20/08/22 14:22:44 INFO DAGScheduler: Submitting ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21), which has no missing parents
20/08/22 14:22:44 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 366.3 MB)
20/08/22 14:22:44 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1773.0 B, free 366.3 MB)
20/08/22 14:22:44 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.165:44497 (size: 1773.0 B, free: 366.3 MB)
20/08/22 14:22:44 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/08/22 14:22:44 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21) (first 15 tasks are for partitions Vector(0))
20/08/22 14:22:44 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/08/22 14:22:44 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7363 bytes)
20/08/22 14:22:44 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/08/22 14:22:44 INFO Executor: Fetching spark://192.168.1.165:43961/jars/sparkjl-0.1.jar with timestamp 1598124151962
20/08/22 14:22:44 INFO TransportClientFactory: Successfully created connection to /192.168.1.165:43961 after 87 ms (0 ms spent in bootstraps)
20/08/22 14:22:44 INFO Utils: Fetching spark://192.168.1.165:43961/jars/sparkjl-0.1.jar to /tmp/spark-8bb787d9-cec4-4044-8ad1-86a3ba1eb3d5/userFiles-b62604ac-aba2-45ae-b30a-c77a03287194/fetchFileTemp5076155142204190806.tmp
20/08/22 14:22:45 INFO Executor: Adding file:/tmp/spark-8bb787d9-cec4-4044-8ad1-86a3ba1eb3d5/userFiles-b62604ac-aba2-45ae-b30a-c77a03287194/sparkjl-0.1.jar to class loader
20/08/22 14:22:47 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 795 bytes result sent to driver
20/08/22 14:22:47 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2745 ms on localhost (executor driver) (1/1)
20/08/22 14:22:47 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/08/22 14:22:47 INFO DAGScheduler: ResultStage 0 (count at <unknown>:0) finished in 3.051 s
20/08/22 14:22:47 INFO DAGScheduler: Job 0 finished: count at <unknown>:0, took 3.202747 s
2

(@v1.5) pkg> st JavaCall
Status `~/.julia/environments/v1.5/Project.toml`
  [494afd89] JavaCall v0.7.4 `~/.julia/dev/JavaCall`

(@v1.5) pkg> st Spark
Status `~/.julia/environments/v1.5/Project.toml`
  [e3819d11] Spark v0.5.0+ `~/.julia/dev/Spark`

mkitti commented 4 years ago

$ export JULIA_COPY_STACKS=1 

$ ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Spark

julia> Spark.init()

julia> sc = SparkContext(master="local")
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/22 14:47:56 WARN Utils: Your hostname, woodward resolves to a loopback address: 127.0.1.1; using 192.168.1.165 instead (on interface eth1)
20/08/22 14:47:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/08/22 14:47:57 INFO SparkContext: Running Spark version 2.4.6
20/08/22 14:47:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/22 14:47:57 INFO SparkContext: Submitted application: Julia App on Spark
20/08/22 14:47:57 INFO SecurityManager: Changing view acls to: mkitti
20/08/22 14:47:57 INFO SecurityManager: Changing modify acls to: mkitti
20/08/22 14:47:57 INFO SecurityManager: Changing view acls groups to: 
20/08/22 14:47:57 INFO SecurityManager: Changing modify acls groups to: 
20/08/22 14:47:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mkitti); groups with view permissions: Set(); users  with modify permissions: Set(mkitti); groups with modify permissions: Set()
20/08/22 14:47:57 INFO Utils: Successfully started service 'sparkDriver' on port 33673.
20/08/22 14:47:57 INFO SparkEnv: Registering MapOutputTracker
20/08/22 14:47:57 INFO SparkEnv: Registering BlockManagerMaster
20/08/22 14:47:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/08/22 14:47:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/08/22 14:47:57 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-41b2ace0-e14f-4fdc-bd12-21da0ca93dfc
20/08/22 14:47:57 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/08/22 14:47:57 INFO SparkEnv: Registering OutputCommitCoordinator
20/08/22 14:47:57 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/08/22 14:47:57 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/08/22 14:47:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.165:4041
20/08/22 14:47:58 INFO Executor: Starting executor ID driver on host localhost
20/08/22 14:47:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33843.
20/08/22 14:47:58 INFO NettyBlockTransferService: Server created on 192.168.1.165:33843
20/08/22 14:47:58 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/08/22 14:47:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.165, 33843, None)
20/08/22 14:47:58 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.165:33843 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.165, 33843, None)
20/08/22 14:47:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.165, 33843, None)
20/08/22 14:47:58 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.165, 33843, None)
20/08/22 14:47:58 INFO SparkContext: Added JAR /home/mkitti/.julia/dev/Spark/src/../jvm/sparkjl/target/sparkjl-0.1.jar at spark://192.168.1.165:33673/jars/sparkjl-0.1.jar with timestamp 1598125678416
SparkContext(local,Julia App on Spark)

julia> txt = parallelize(sc, ["hello", "world"])
JavaRDD()

julia> rdd = map(txt, identity)
PipelinedRDD(JavaRDD())

julia> count(rdd)
20/08/22 14:48:07 INFO SparkContext: Starting job: count at <unknown>:0
20/08/22 14:48:07 INFO DAGScheduler: Got job 0 (count at <unknown>:0) with 1 output partitions
20/08/22 14:48:07 INFO DAGScheduler: Final stage: ResultStage 0 (count at <unknown>:0)
20/08/22 14:48:07 INFO DAGScheduler: Parents of final stage: List()
20/08/22 14:48:07 INFO DAGScheduler: Missing parents: List()
20/08/22 14:48:07 INFO DAGScheduler: Submitting ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21), which has no missing parents
20/08/22 14:48:07 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 366.3 MB)
20/08/22 14:48:08 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1773.0 B, free 366.3 MB)
20/08/22 14:48:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.165:33843 (size: 1773.0 B, free: 366.3 MB)
20/08/22 14:48:08 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/08/22 14:48:08 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21) (first 15 tasks are for partitions Vector(0))
20/08/22 14:48:08 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/08/22 14:48:08 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7363 bytes)
20/08/22 14:48:08 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/08/22 14:48:08 INFO Executor: Fetching spark://192.168.1.165:33673/jars/sparkjl-0.1.jar with timestamp 1598125678416
20/08/22 14:48:08 INFO TransportClientFactory: Successfully created connection to /192.168.1.165:33673 after 25 ms (0 ms spent in bootstraps)
20/08/22 14:48:08 INFO Utils: Fetching spark://192.168.1.165:33673/jars/sparkjl-0.1.jar to /tmp/spark-b30060bc-2f0a-4917-91cd-d2455f69bda6/userFiles-ad96ac72-55c7-4cc8-b756-95bbaeda3940/fetchFileTemp8251031783095239449.tmp
20/08/22 14:48:08 INFO Executor: Adding file:/tmp/spark-b30060bc-2f0a-4917-91cd-d2455f69bda6/userFiles-ad96ac72-55c7-4cc8-b756-95bbaeda3940/sparkjl-0.1.jar to class loader
20/08/22 14:48:10 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 795 bytes result sent to driver
20/08/22 14:48:10 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2330 ms on localhost (executor driver) (1/1)
20/08/22 14:48:10 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/08/22 14:48:10 INFO DAGScheduler: ResultStage 0 (count at <unknown>:0) finished in 2.525 s
20/08/22 14:48:10 INFO DAGScheduler: Job 0 finished: count at <unknown>:0, took 2.587525 s
2

(@v1.5) pkg> st Spark
Status `~/.julia/environments/v1.5/Project.toml`
  [e3819d11] Spark v0.5.0+ `~/.julia/dev/Spark`

(@v1.5) pkg> st JavaCall
Status `~/.julia/environments/v1.5/Project.toml`
  [494afd89] JavaCall v0.7.4 `~/.julia/dev/JavaCall`

mkitti commented 4 years ago

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> cd("/home/mkitti/testenv/")

(@v1.5) pkg> activate .
 Activating environment at `~/testenv/Project.toml`

(testenv) pkg> st Spark
Status `~/testenv/Project.toml`
  [e3819d11] Spark v0.5.0

(testenv) pkg> st JavaCall
Status `~/testenv/Project.toml`
  [494afd89] JavaCall v0.7.5

julia> using Spark

julia> Spark.init()

julia> sc = SparkContext(master="local")
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/22 14:51:56 WARN Utils: Your hostname, woodward resolves to a loopback address: 127.0.1.1; using 192.168.1.165 instead (on interface eth1)
20/08/22 14:51:56 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/08/22 14:51:57 INFO SparkContext: Running Spark version 2.4.6
20/08/22 14:51:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/22 14:51:57 INFO SparkContext: Submitted application: Julia App on Spark
20/08/22 14:51:57 INFO SecurityManager: Changing view acls to: mkitti
20/08/22 14:51:57 INFO SecurityManager: Changing modify acls to: mkitti
20/08/22 14:51:57 INFO SecurityManager: Changing view acls groups to: 
20/08/22 14:51:57 INFO SecurityManager: Changing modify acls groups to: 
20/08/22 14:51:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mkitti); groups with view permissions: Set(); users  with modify permissions: Set(mkitti); groups with modify permissions: Set()
20/08/22 14:51:57 INFO Utils: Successfully started service 'sparkDriver' on port 38537.
20/08/22 14:51:57 INFO SparkEnv: Registering MapOutputTracker
20/08/22 14:51:57 INFO SparkEnv: Registering BlockManagerMaster
20/08/22 14:51:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/08/22 14:51:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/08/22 14:51:57 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-8aee533d-2b5a-4f82-a272-785cc0c3f6d8
20/08/22 14:51:57 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/08/22 14:51:57 INFO SparkEnv: Registering OutputCommitCoordinator
20/08/22 14:51:58 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/08/22 14:51:58 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/08/22 14:51:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.165:4041
20/08/22 14:51:58 INFO Executor: Starting executor ID driver on host localhost
20/08/22 14:51:58 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45539.
20/08/22 14:51:58 INFO NettyBlockTransferService: Server created on 192.168.1.165:45539
20/08/22 14:51:58 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/08/22 14:51:58 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.165, 45539, None)
20/08/22 14:51:58 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.165:45539 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.165, 45539, None)
20/08/22 14:51:58 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.165, 45539, None)
20/08/22 14:51:58 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.165, 45539, None)
20/08/22 14:51:58 INFO SparkContext: Added JAR /home/mkitti/.julia/packages/Spark/3MVGw/src/../jvm/sparkjl/target/sparkjl-0.1.jar at spark://192.168.1.165:38537/jars/sparkjl-0.1.jar with timestamp 1598125918560
SparkContext(local,Julia App on Spark)

julia> txt = parallelize(sc, ["hello", "world"])
JavaRDD()

julia> rdd = map(txt, identity)
PipelinedRDD(JavaRDD())

julia> count(rdd)
20/08/22 14:52:12 INFO SparkContext: Starting job: count at <unknown>:0
20/08/22 14:52:12 INFO DAGScheduler: Got job 0 (count at <unknown>:0) with 1 output partitions
20/08/22 14:52:12 INFO DAGScheduler: Final stage: ResultStage 0 (count at <unknown>:0)
20/08/22 14:52:12 INFO DAGScheduler: Parents of final stage: List()
20/08/22 14:52:12 INFO DAGScheduler: Missing parents: List()
20/08/22 14:52:12 INFO DAGScheduler: Submitting ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21), which has no missing parents
20/08/22 14:52:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 366.3 MB)
20/08/22 14:52:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1773.0 B, free 366.3 MB)
20/08/22 14:52:12 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.165:45539 (size: 1773.0 B, free: 366.3 MB)
20/08/22 14:52:12 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/08/22 14:52:12 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21) (first 15 tasks are for partitions Vector(0))
20/08/22 14:52:12 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/08/22 14:52:12 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7363 bytes)
20/08/22 14:52:12 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/08/22 14:52:12 INFO Executor: Fetching spark://192.168.1.165:38537/jars/sparkjl-0.1.jar with timestamp 1598125918560
20/08/22 14:52:12 INFO TransportClientFactory: Successfully created connection to /192.168.1.165:38537 after 23 ms (0 ms spent in bootstraps)
20/08/22 14:52:12 INFO Utils: Fetching spark://192.168.1.165:38537/jars/sparkjl-0.1.jar to /tmp/spark-6f091ede-e3e1-4eb9-850f-ccf03f391622/userFiles-c805b258-5bbe-4976-88ef-07e46cfe4677/fetchFileTemp6150608946789333187.tmp
20/08/22 14:52:12 INFO Executor: Adding file:/tmp/spark-6f091ede-e3e1-4eb9-850f-ccf03f391622/userFiles-c805b258-5bbe-4976-88ef-07e46cfe4677/sparkjl-0.1.jar to class loader
20/08/22 14:52:14 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 752 bytes result sent to driver
20/08/22 14:52:14 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2304 ms on localhost (executor driver) (1/1)
20/08/22 14:52:15 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/08/22 14:52:15 INFO DAGScheduler: ResultStage 0 (count at <unknown>:0) finished in 2.496 s
20/08/22 14:52:15 INFO DAGScheduler: Job 0 finished: count at <unknown>:0, took 2.562712 s
2

mkitti commented 4 years ago

Everything seems to be working well for me with Julia 1.5 using the script you gave.

$ cat sparkdemo.jl 
using Spark
using Pkg

Pkg.activate(".")

Spark.init()
sc = SparkContext(master="local")
txt = parallelize(sc, ["hello", "world"])
rdd = map(txt, identity)
println(count(rdd))

println(Pkg.status("Spark"))
println(Pkg.status("JavaCall"))
$ ~/src/julia-1.5.0/bin/julia sparkdemo.jl 
 Activating environment at `~/testenv/Project.toml`
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/22 14:56:55 WARN Utils: Your hostname, woodward resolves to a loopback address: 127.0.1.1; using 192.168.1.165 instead (on interface eth1)
20/08/22 14:56:55 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/08/22 14:56:55 INFO SparkContext: Running Spark version 2.4.6
20/08/22 14:56:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/22 14:56:56 INFO SparkContext: Submitted application: Julia App on Spark
20/08/22 14:56:56 INFO SecurityManager: Changing view acls to: mkitti
20/08/22 14:56:56 INFO SecurityManager: Changing modify acls to: mkitti
20/08/22 14:56:56 INFO SecurityManager: Changing view acls groups to: 
20/08/22 14:56:56 INFO SecurityManager: Changing modify acls groups to: 
20/08/22 14:56:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mkitti); groups with view permissions: Set(); users  with modify permissions: Set(mkitti); groups with modify permissions: Set()
20/08/22 14:56:56 INFO Utils: Successfully started service 'sparkDriver' on port 35373.
20/08/22 14:56:56 INFO SparkEnv: Registering MapOutputTracker
20/08/22 14:56:56 INFO SparkEnv: Registering BlockManagerMaster
20/08/22 14:56:56 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/08/22 14:56:56 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/08/22 14:56:56 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f46c1381-b872-4c0e-b7ba-176c33217811
20/08/22 14:56:56 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/08/22 14:56:56 INFO SparkEnv: Registering OutputCommitCoordinator
20/08/22 14:56:56 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/08/22 14:56:56 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.165:4040
20/08/22 14:56:56 INFO Executor: Starting executor ID driver on host localhost
20/08/22 14:56:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46509.
20/08/22 14:56:56 INFO NettyBlockTransferService: Server created on 192.168.1.165:46509
20/08/22 14:56:56 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/08/22 14:56:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.165, 46509, None)
20/08/22 14:56:56 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.165:46509 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.165, 46509, None)
20/08/22 14:56:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.165, 46509, None)
20/08/22 14:56:56 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.165, 46509, None)
20/08/22 14:56:57 INFO SparkContext: Added JAR /home/mkitti/.julia/dev/Spark/src/../jvm/sparkjl/target/sparkjl-0.1.jar at spark://192.168.1.165:35373/jars/sparkjl-0.1.jar with timestamp 1598126217167
20/08/22 14:56:59 INFO SparkContext: Starting job: count at <unknown>:0
20/08/22 14:56:59 INFO DAGScheduler: Got job 0 (count at <unknown>:0) with 1 output partitions
20/08/22 14:56:59 INFO DAGScheduler: Final stage: ResultStage 0 (count at <unknown>:0)
20/08/22 14:56:59 INFO DAGScheduler: Parents of final stage: List()
20/08/22 14:56:59 INFO DAGScheduler: Missing parents: List()
20/08/22 14:56:59 INFO DAGScheduler: Submitting ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21), which has no missing parents
20/08/22 14:56:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 366.3 MB)
20/08/22 14:56:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1773.0 B, free 366.3 MB)
20/08/22 14:56:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.165:46509 (size: 1773.0 B, free: 366.3 MB)
20/08/22 14:56:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1163
20/08/22 14:56:59 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (JuliaRDD[1] at RDD at JuliaRDD.scala:21) (first 15 tasks are for partitions Vector(0))
20/08/22 14:56:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/08/22 14:56:59 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7363 bytes)
20/08/22 14:56:59 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/08/22 14:56:59 INFO Executor: Fetching spark://192.168.1.165:35373/jars/sparkjl-0.1.jar with timestamp 1598126217167
20/08/22 14:56:59 INFO TransportClientFactory: Successfully created connection to /192.168.1.165:35373 after 28 ms (0 ms spent in bootstraps)
20/08/22 14:56:59 INFO Utils: Fetching spark://192.168.1.165:35373/jars/sparkjl-0.1.jar to /tmp/spark-7a41d409-d21d-49bf-b3d7-b7467db594d2/userFiles-cf3293e0-0262-46b1-90e1-63efff02c198/fetchFileTemp5965976232760035414.tmp
20/08/22 14:56:59 INFO Executor: Adding file:/tmp/spark-7a41d409-d21d-49bf-b3d7-b7467db594d2/userFiles-cf3293e0-0262-46b1-90e1-63efff02c198/sparkjl-0.1.jar to class loader
20/08/22 14:57:01 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 752 bytes result sent to driver
20/08/22 14:57:01 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2306 ms on localhost (executor driver) (1/1)
20/08/22 14:57:01 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/08/22 14:57:01 INFO DAGScheduler: ResultStage 0 (count at <unknown>:0) finished in 2.510 s
20/08/22 14:57:01 INFO DAGScheduler: Job 0 finished: count at <unknown>:0, took 2.571964 s
2
Status `~/testenv/Project.toml`
  [e3819d11] Spark v0.5.0
nothing
Status `~/testenv/Project.toml`
  [494afd89] JavaCall v0.7.5
nothing

dfdx commented 4 years ago

Thanks for testing it! This must be a problem with my setup, then. I'll try to reproduce the error on a fresh Linux box and close this issue if everything works fine.

dfdx commented 4 years ago

Indeed, after removing the .julia directory and reinstalling Spark.jl, everything works perfectly fine. I did it on the same Linux box as previously and checked the versions of all dependent packages, but couldn't find a difference. Anyway, a fresh install resolves the issue, so I'm re-closing this ticket. Thanks for being with me on this problem!