niczky12 opened this issue (closed 6 years ago):
I'm trying to use the package on a mac with Julia 0.6.2, but when I try Spark.init() I get an error. I have Spark installed and it runs okay from R via sparklyr. Are there some additional setup steps that I missed? Thanks!
I'd start by setting SPARK_HOME="" (empty string) and maybe rebuilding the package with Pkg.build("Spark"). In this case I expect Spark.jl to use the built-in version of Spark (shipped as part of the uberjar in jvm/sparkjl) and run smoothly.
If this works, you can update this line (the Spark dependency version in pom.xml) to 2.2.1, which matches your version of Spark, rebuild again, and set SPARK_HOME back to its previous value.
Please let me know if this works.
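Roughly, that sequence from the Julia REPL would look something like this (just a sketch, using the Julia 0.6-era Pkg API from this thread; on macOS you may need to export SPARK_HOME="" in the shell before starting Julia instead of setting it inside the session):

ENV["SPARK_HOME"] = ""   # blank it out so Spark.jl falls back to its bundled Spark
Pkg.build("Spark")       # rebuilds the Maven project / uberjar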
I tried setting SPARK_HOME="" in Julia, but it didn't do anything. Then I set it as an environment variable on my mac. That worked, sort of...
It did rebuild Spark successfully, but when I try to run Spark.init() it throws the following:
julia> Spark.init()
signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 0
unknown function (ip: 0x11e6842b3)
Allocations: 4098280 (Pool: 4096766; Big: 1514); GC: 6
I'm not sure where or how I would update the line you referenced to fix this. Can you give a bit more detail on this? Sorry if these are silly questions. Thanks!
> I tried setting SPARK_HOME="" in Julia, but didn't do anything. Then I set this as environment variable on my mac.
Ah, sorry, I indeed meant env var. Glad that it resolved the issue.
> signal (11): Segmentation fault: 11
It looks like a known issue with JavaCall.jl / the JVM on macOS. Fortunately, it's just a spurious message that doesn't actually prevent you from running the code: even though the REPL looks like it has hung, you can press Enter and it should continue normally.
This works! Thanks a lot. So what was that bit you mentioned about using my already installed Spark version?
When you run Pkg.build("Spark"), it builds a Maven project whose pom.xml defines all of its dependencies, including Spark and its version, which is 2.1.0 by default. On the other hand, your SPARK_HOME (before editing) points to a separate installation, /usr/local/Cellar/apache-spark/2.2.1/ according to the error message. So when you initialize Spark.jl, it looks at your SPARK_HOME and tries to read a config file from it, but version 2.2.1 doesn't have that config and it fails.
I wrote a detailed description of how to bypass this by manually editing pom.xml, but it made me think about what the proper fix would be, which resulted in the configurable-spark-version branch. Please check it out and run from the Julia REPL:
ENV["BUILD_SPARK_VERSION"] = "2.2.1"
Pkg.build("Spark")
Then set the SPARK_HOME env var back to its default value on your system (e.g. by opening a fresh terminal window) and try running the examples. If this works for you, I'll merge this branch and make the feature available for everybody.
I tried the above.
First I checked out configurable-spark-version by running Pkg.checkout("Spark", "configurable-spark-version"). I made sure SPARK_HOME was not set as an environment variable and set ENV["BUILD_SPARK_VERSION"] as you said above.
The build ran OK, but Spark.init() gave me the same error as before:
julia> Spark.init()
ERROR: SystemError: opening file /usr/local/Cellar/apache-spark/2.2.1/libexec/conf/spark-defaults.conf: No such file or directory
Stacktrace:
[1] #systemerror#44 at ./error.jl:64 [inlined]
[2] systemerror(::String, ::Bool) at ./error.jl:64
[3] open(::String, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./iostream.jl:104
[4] open(::Base.#readstring, ::String) at ./iostream.jl:150
[5] load_spark_defaults(::Dict{Any,Any}) at /Users/bkomar/.julia/v0.6/Spark/src/init.jl:51
[6] init() at /Users/bkomar/.julia/v0.6/Spark/src/init.jl:5
Am I doing something wrong? I'd be happy to do some testing if needed :)
In the logs of Pkg.build(), what version of spark-core is used (this should be the text just next to "spark-core")?
If it's still "2.1.0", something went wrong. In that case, maybe check from the command line that the branch is indeed "configurable-spark-version" (I've never used the Pkg.checkout(pkg, branch) form, so I'm not sure it's stable).
If the version is "2.2.1", then I believe you have some unusual build of Spark with a different directory layout. How did you install it?
I had a look at the build logs, and it seems like it's still using 2.1.1:
[INFO] Including org.apache.spark:spark-core_2.11:jar:2.1.1 in the shaded jar.
I think I'm on the right branch according to Pkg:
julia> Pkg.status("Spark")
- Spark 0.2.0+ configurable-spark-version
Also confirmed by git on the command line:
lon-mac-bkomar:Spark bkomar$ git branch
* configurable-spark-version
master
So I'm definitely on the correct branch. I'll reinstall Spark if I have time over the weekend and see if that works. Thanks for your help.
Any gotchas I should look out for while installing Spark?
I don't think reinstalling the same version of Spark will help: previously I've run into issues with Spark installations from different builds, e.g. one from Cloudera's CDH and another downloaded from the official website (CDH puts its configs in a separate directory, together with the configs of other Hadoop tools). If you use a build that differs significantly from the one on the official site, you may run into the same issue.
If this is the case and you can find where spark-defaults.conf lives in your installation, we can update the way we discover this file, and that may be enough to get things working.
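One quick way to check which config files your build actually ships (just a sketch; the path is taken from your error message, so adjust it for your installation):

conf_dir = "/usr/local/Cellar/apache-spark/2.2.1/libexec/conf"   # path assumed from the error above
isdir(conf_dir) && foreach(println, readdir(conf_dir))           # prints every file in the conf directory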
But first of all I'd make sure that Spark.jl and your installed Spark have the same version. Can you please change this line (the Spark version in pom.xml) to 2.2.1 and rebuild Spark.jl once again?
I tried changing the pom.xml file, but I got the same error.
I also realised that this Spark install comes from sparklyr, so it might be a different build than the official one.
But I finally found the issue. Thank you so much for all your help.
Basically, this build of Spark has a spark-defaults.conf.template file instead of spark-defaults.conf in the folder mentioned above. I changed init.jl where you pointed me, and after that I was able to build and run Spark.init() in Julia.
I'm not sure if this is worth fixing, as it's not really a bug but more of a quirk of different Spark builds... Maybe there could be an option to change the location where Spark.jl expects to find this config file via an ENV variable? I don't know, I'm really out of my depth here.
If you want, I can run further tests on my machine regarding the different spark versions. Let me know and I'd be happy to help. Otherwise, we can just close this issue.
It's worth discovering such things automatically, so I created a spark-conf-location branch that looks for spark-defaults.conf.template as well. Could you please check out this (totally untested) branch and tell me whether it now finds the config correctly?
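Roughly, the idea is something like this (just a sketch with a made-up helper name, not the exact code in the branch):

function find_spark_defaults(conf_dir)
    # look for spark-defaults.conf first, then fall back to the .template shipped by some builds
    spark_defaults_locs = [joinpath(conf_dir, "spark-defaults.conf"),
                           joinpath(conf_dir, "spark-defaults.conf.template")]
    conf_idx = findfirst(isfile, spark_defaults_locs)
    return conf_idx == 0 ? "" : spark_defaults_locs[conf_idx]   # Julia 0.6 findfirst returns 0 when nothing matches
end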
Also, did you manage to build Spark.jl for version 2.2.1 through the environment variable using the configurable-spark-version branch?
Hi,
So I could build Spark.jl for version 2.2.1 with configurable-spark-version, but Spark.init() failed due to the different file name.
I checked out your spark-conf-location branch. There was one typo in it, on line 57 of src/init.jl:
- spark_defaults_conf = spark_default_locs[conf_idx]
+ spark_defaults_conf = spark_defaults_locs[conf_idx]
I fixed this manually, and both Pkg.build and Spark.init now run flawlessly on my machine. 😄
Perfect, so I'll fix the typo and merge both branches. Thanks for testing and debugging!
Done, merged both branches (configurable Spark version and improved config discovery) to master.
Is there anything else I can help with in this issue?
Nope. I think this can be closed. Thanks for your help.