aviks closed this issue 5 years ago
Hmm, this code works for me on Julia 0.5.0 and latest master of Spark.jl:
julia> using Spark
Loaded /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
julia> sc = SparkContext(master="local");
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/02 22:29:10 INFO SparkContext: Running Spark version 1.5.2
...
julia> rdd = parallelize(sc, 1:10)
PipelinedRDD(JavaRDD())
julia> map(rdd, x->x+1)
PipelinedRDD(JavaRDD())
julia> rdd2 = map(rdd, x->x+1)
PipelinedRDD(JavaRDD())
julia> rdd3 = Spark.cache(rdd2)
PipelinedRDD(JavaRDD())
julia> rdd4 = map(rdd3, x->x^2)
PipelinedRDD(JavaRDD())
julia> collect(rdd4)
17/06/02 22:30:00 INFO SparkContext: Starting job: first at <unknown>:0
17/06/02 22:30:00 INFO DAGScheduler: Got job 0 (first at <unknown>:0) with 1 output partitions
17/06/02 22:30:00 INFO DAGScheduler: Final stage: ResultStage 0(first at <unknown>:0)
17/06/02 22:30:00 INFO DAGScheduler: Parents of final stage: List()
17/06/02 22:30:00 INFO DAGScheduler: Missing parents: List()
17/06/02 22:30:00 INFO DAGScheduler: Submitting ResultStage 0 (JuliaRDD[5] at RDD at JuliaRDD.scala:15), which has no missing parents
17/06/02 22:30:00 INFO MemoryStore: ensureFreeSpace(2992) called with curMem=0, maxMem=515553361
17/06/02 22:30:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.9 KB, free 491.7 MB)
17/06/02 22:30:00 INFO MemoryStore: ensureFreeSpace(1685) called with curMem=2992, maxMem=515553361
17/06/02 22:30:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1685.0 B, free 491.7 MB)
17/06/02 22:30:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:35197 (size: 1685.0 B, free: 491.7 MB)
17/06/02 22:30:00 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
17/06/02 22:30:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (JuliaRDD[5] at RDD at JuliaRDD.scala:15)
17/06/02 22:30:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/06/02 22:30:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2308 bytes)
17/06/02 22:30:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/06/02 22:30:00 INFO Executor: Fetching http://192.168.100.5:44519/jars/sparkjl-0.1.jar with timestamp 1496431751420
17/06/02 22:30:00 INFO Utils: Fetching http://192.168.100.5:44519/jars/sparkjl-0.1.jar to /tmp/spark-af4d4b7f-b4c8-41b4-a911-9bdcc269fa30/userFiles-83eba5c4-23dd-4a3c-8257-43376249b2b9/fetchFileTemp9088193860230804638.tmp
17/06/02 22:30:00 INFO Executor: Adding file:/tmp/spark-af4d4b7f-b4c8-41b4-a911-9bdcc269fa30/userFiles-83eba5c4-23dd-4a3c-8257-43376249b2b9/sparkjl-0.1.jar to class loader
Loaded /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======= file 'appveyor.yml' ========
environment:
HADOOP_HOME: C:\projects\hadoop
matrix:
- JULIA_URL: "https://julialang-s3.julialang.org/bin/winnt/x64/0.5/julia-0.5-latest-win64.exe"
JAVA_HOME: C:\Program Files\Java\jdk1.7.0\
notifications:
- provider: Email
on_build_success: false
on_build_failure: false
on_build_status_changed: false
os: Windows Server 2012
install:
- ps: "[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12"
# Download most recent Julia Windows binary
- ps: (new-object net.webclient).DownloadFile(
$env:JULIA_URL,
"C:\projects\julia-binary.exe")
#install hadoop winutils
- cmd: md C:\projects\hadoop
- cmd: md C:\projects\hadoop\bin
- ps: (new-object System.Net.WebClient).DownloadFile(
"https://github.com/steveloughran/winutils/blob/master/hadoop-2.6.0/bin/winutils.exe?raw=true",
"C:\projects\hadoop\bin\winutils.exe")
# Run installer silently, output to C:\projects\julia
- C:\projects\julia-binary.exe /S /D=C:\projects\julia
build_script:
# Need to convert from shallow to complete for Pkg.clone to work
- IF EXIST .git\shallow (git fetch --unshallow)
- C:\projects\julia\bin\julia -e "versioninfo();
Pkg.clone(pwd(), \"Spark\"); Pkg.build(\"Spark\")"
test_script:
- C:\projects\julia\bin\julia --check-bounds=yes -e "Pkg.test(\"Spark\")"
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17/06/02 22:30:01 INFO CacheManager: Partition rdd_0_0 not found, computing it
17/06/02 22:30:01 INFO MemoryStore: ensureFreeSpace(296) called with curMem=4677, maxMem=515553361
17/06/02 22:30:01 INFO MemoryStore: Block rdd_0_0 stored as values in memory (estimated size 296.0 B, free 491.7 MB)
17/06/02 22:30:01 INFO BlockManagerInfo: Added rdd_0_0 in memory on localhost:35197 (size: 296.0 B, free: 491.7 MB)
17/06/02 22:30:03 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1538 bytes result sent to driver
17/06/02 22:30:03 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3229 ms on localhost (1/1)
17/06/02 22:30:03 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/06/02 22:30:03 INFO DAGScheduler: ResultStage 0 (first at <unknown>:0) finished in 3.239 s
17/06/02 22:30:03 INFO DAGScheduler: Job 0 finished: first at <unknown>:0, took 3.367172 s
17/06/02 22:30:03 INFO SparkContext: Starting job: collect at JuliaRDD.scala:44
17/06/02 22:30:03 INFO DAGScheduler: Got job 1 (collect at JuliaRDD.scala:44) with 1 output partitions
17/06/02 22:30:03 INFO DAGScheduler: Final stage: ResultStage 1(collect at JuliaRDD.scala:44)
17/06/02 22:30:03 INFO DAGScheduler: Parents of final stage: List()
17/06/02 22:30:03 INFO DAGScheduler: Missing parents: List()
17/06/02 22:30:03 INFO DAGScheduler: Submitting ResultStage 1 (JuliaRDD[4] at RDD at JuliaRDD.scala:15), which has no missing parents
17/06/02 22:30:03 INFO MemoryStore: ensureFreeSpace(2840) called with curMem=4973, maxMem=515553361
17/06/02 22:30:03 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 491.7 MB)
17/06/02 22:30:03 INFO MemoryStore: ensureFreeSpace(1639) called with curMem=7813, maxMem=515553361
17/06/02 22:30:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1639.0 B, free 491.7 MB)
17/06/02 22:30:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:35197 (size: 1639.0 B, free: 491.7 MB)
17/06/02 22:30:03 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
17/06/02 22:30:03 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (JuliaRDD[4] at RDD at JuliaRDD.scala:15)
17/06/02 22:30:03 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/06/02 22:30:03 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 2308 bytes)
17/06/02 22:30:03 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
Loaded /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======= file 'appveyor.yml' ========
environment:
HADOOP_HOME: C:\projects\hadoop
matrix:
- JULIA_URL: "https://julialang-s3.julialang.org/bin/winnt/x64/0.5/julia-0.5-latest-win64.exe"
JAVA_HOME: C:\Program Files\Java\jdk1.7.0\
notifications:
- provider: Email
on_build_success: false
on_build_failure: false
on_build_status_changed: false
os: Windows Server 2012
install:
- ps: "[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12"
# Download most recent Julia Windows binary
- ps: (new-object net.webclient).DownloadFile(
$env:JULIA_URL,
"C:\projects\julia-binary.exe")
#install hadoop winutils
- cmd: md C:\projects\hadoop
- cmd: md C:\projects\hadoop\bin
- ps: (new-object System.Net.WebClient).DownloadFile(
"https://github.com/steveloughran/winutils/blob/master/hadoop-2.6.0/bin/winutils.exe?raw=true",
"C:\projects\hadoop\bin\winutils.exe")
# Run installer silently, output to C:\projects\julia
- C:\projects\julia-binary.exe /S /D=C:\projects\julia
build_script:
# Need to convert from shallow to complete for Pkg.clone to work
- IF EXIST .git\shallow (git fetch --unshallow)
- C:\projects\julia\bin\julia -e "versioninfo();
Pkg.clone(pwd(), \"Spark\"); Pkg.build(\"Spark\")"
test_script:
- C:\projects\julia\bin\julia --check-bounds=yes -e "Pkg.test(\"Spark\")"
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17/06/02 22:30:05 INFO BlockManager: Found block rdd_0_0 locally
17/06/02 22:30:06 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2221 bytes result sent to driver
17/06/02 22:30:06 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 2722 ms on localhost (1/1)
17/06/02 22:30:06 INFO DAGScheduler: ResultStage 1 (collect at JuliaRDD.scala:44) finished in 2.723 s
17/06/02 22:30:06 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/06/02 22:30:06 INFO DAGScheduler: Job 1 finished: collect at JuliaRDD.scala:44, took 2.731046 s
10-element Array{Int64,1}:
4
9
16
25
36
49
64
81
100
121
Not sure where the lines about appveyor.yml come from. I used Spark.cache() because cache turns out not to be exported, but the rest seems to work perfectly well for me. Are you using a custom version of Spark.jl or of the Spark binaries?
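For reference, a minimal sketch of the pattern used above (assuming the Spark.jl API shown in the transcript); since cache is not exported, it has to be qualified with the module name or imported explicitly:

```julia
using Spark

sc = SparkContext(master="local")
rdd = parallelize(sc, 1:10)
rdd2 = map(rdd, x -> x + 1)

# `cache` is not exported from Spark.jl, so either qualify it:
rdd3 = Spark.cache(rdd2)
# ...or bring it into scope first:
# import Spark: cache
# rdd3 = cache(rdd2)

rdd4 = map(rdd3, x -> x^2)
collect(rdd4)
```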
Yeah, I was using Spark 2.1.0. Probably due to this: https://issues.apache.org/jira/browse/SPARK-18406
Using cache seems to result in an assertion failure.
cc: @dfdx, is this expected?