elbamos / Zeppelin-With-R

Mirror of Apache Zeppelin (Incubating)
Apache License 2.0

## Error in loadNamespace(name): there is no package called 'SparkR' #6

Closed QuantScientist3 closed 8 years ago

QuantScientist3 commented 8 years ago

Hi, I'm working on OSX. I tried every possible combination of setting/unsetting SPARK_HOME (in the *.sh files) and/or setting spark.home in interpreter.json.

It fails with:

Error in loadNamespace(name): there is no package called 'SparkR'

My standalone R installation works fine with the same Spark 1.5 distribution.

What am I doing wrong? BTW, I don't have an issue with case sensitivity on OSX.

[screenshot of the Zeppelin error attached]

Thanks,

elbamos commented 8 years ago

Can you provide the result of running ls -laF on the directory you're using as SPARK_HOME?

(FYI: often, with Mac installs of Spark, the proper SPARK_HOME is actually a subdirectory of wherever you installed Spark. The directory contents will help isolate whether that's the issue.)
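For instance (a hypothetical layout, not taken from this thread): with a Homebrew install the actual distribution sits one level down under libexec/, and SPARK_HOME should point at the directory that contains bin/ and R/lib/:

```sh
# Hypothetical Homebrew paths; your install location and version may differ.
ls -laF /usr/local/Cellar/apache-spark/1.5.2/
# SPARK_HOME must be the directory holding bin/, R/lib/, conf/, etc.
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.5.2/libexec
```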


QuantScientist3 commented 8 years ago

I had the Spark source tree instead of a Spark binary distribution; that was the issue. Thanks. Now I have another issue:

```r
# Print system information
R.version
Sys.info()

mypkgs <- c("dplyr", "ggplot2", "magrittr", "parallel")
install.packages(mypkgs)

Sys.setenv(JAVA_HOME = "/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/")
install.packages("rJava")
library("rJava")

# Start SparkR
Sys.setenv(SPARK_HOME = "/Users/freebsd/repo/dev/java/rt/spark/")
print(Sys.getenv("SPARK_HOME"))
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
print(.libPaths())
Sys.setenv("PATH" = paste(Sys.getenv("PATH"),
                          "/Library/Frameworks/R.framework/Versions/3.2/Resources/bin",
                          file.path(Sys.getenv("SPARK_HOME"), "bin"), sep = ":"))
Sys.setenv("SPARKR_SUBMIT_ARGS" = '"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"')
print(Sys.getenv("PATH"))
library(SparkR)

sc <- sparkR.init(master = "local", appName = "SparkR_demo_RTA")
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)

# Print its schema
printSchema(df)

df2 <- read.df(sqlContext, "/Users/freebsd/repo/dev/data-sets/mlr.csv",
               source = "com.databricks.spark.csv", inferSchema = "true")
```

The com.databricks:spark-csv_2.10:1.3.0 package does not load, even though it loads correctly from the Spark shell:

[screenshot of the Zeppelin error attached]

elbamos commented 8 years ago

Someone else reported the same thing. It appears to be an issue with the way Zeppelin handles dependencies. The workaround is to load the data frame using the %spark interpreter, give it a temporary table name, and then access the temporary table from R by name.
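A minimal sketch of that workaround (the CSV path and the table name `mlr` are illustrative, borrowed from the post above; it assumes spark-csv has already been added to the Spark interpreter's dependencies, e.g. via SPARK_SUBMIT_OPTIONS in zeppelin-env.sh, and that the R paragraph shares the same SQLContext as `sqlContext`):

```scala
%spark
// In the %spark paragraph the spark-csv dependency resolves normally,
// so load the CSV here and expose it as a temporary table.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("inferSchema", "true")
  .load("/Users/freebsd/repo/dev/data-sets/mlr.csv")
df.registerTempTable("mlr")
```

```r
%r
# Back in R, pull the same data out of the shared SQLContext by name.
df2 <- sql(sqlContext, "SELECT * FROM mlr")
head(df2)
```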


elbamos commented 8 years ago

I'm going to mark this closed; please re-open if I'm mistaken. Thanks!

QuantScientist3 commented 8 years ago

Hi, is this related to the second issue I reported, i.e. that the com.databricks:spark-csv_2.10:1.3.0 library does not load?

Thanks,