chezou / cloudera-parcel

customized cloudera-parcel

Error with `broom::tidy` #2

Closed chezou closed 6 years ago

chezou commented 6 years ago

I tried to run `broom::tidy` on CDSW, but it failed because `libicui18n.so.55` is missing. The same code works in the Docker container and conda environment I use to create the parcel.

library(sparklyr)

# `config` is assumed to be defined earlier, e.g. with spark_config()
sc <- spark_connect(master = "yarn-client", config = config)

sdf_len(sc, 5, repartition = 1) %>%
  spark_apply(function(e) I(e))

iris_tbl <- sdf_copy_to(sc, iris)

spark_apply(
  iris_tbl,
  function(e) broom::tidy(lm(Petal_Length ~ Petal_Width, e)),
  names = c("term", "estimate", "std.error", "statistic", "p.value"),
  group_by = "Species")
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is waiting using lock for RScript to complete
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is starting rscript
17/08/18 01:32:47 INFO sparklyr: Gateway (9751) is waiting for sparklyr client to connect to port 8880
17/08/18 01:32:47 INFO sparklyr: Worker (9751) using source file /data1/yarn/nm/usercache/clouderanA/appcache/application_1500605980576_8118/container_1500605980576_8118_01_000002/tmp/sparkworker/7f429444-529f-4110-836f-931d0966a220/sparkworker.R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) launching command /opt/cloudera/parcels/CONDAR/lib/conda-R/bin/Rscript --vanilla <source-file> 9751 FALSE;8880;localhost
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var RHOME and value /opt/cloudera/parcels/CONDAR/lib/conda-R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_INCLUDE_DIR and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R/include
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_HOME and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is adding env var R_SHARE_DIR and value /opt/cloudera/parcels/CONDAR/lib/conda-R/lib/R/share
17/08/18 01:32:47 INFO sparklyr: Worker (9751) is starting R process
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is starting 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connecting to backend using port 8880 
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) accepted connection
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is waiting for sparklyr client to connect to port 8880
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is querying ports from backend using port 8880 
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) received command 0
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) found requested session matches current session
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is creating backend and allocating system resources
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) created the backend
17/08/18 01:32:48 INFO sparklyr: Gateway (9751) is waiting for r process to end
17/08/18 01:32:48 INFO sparklyr: RScript (9751) found redirect gateway port 8880 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected to backend 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connecting to backend session 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected to backend session 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) created connection 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) is connected 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved worker context id 4 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved worker context 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) using bundle /tmp/RtmpM4zrrj/packages.tar 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) updated .libPaths with bundle packages 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) working over grouped data 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) found 3 rows 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) retrieved 3 rows 
17/08/18 01:32:48 INFO sparklyr: RScript (9751) computing closure 
17/08/18 01:32:49 ERROR sparklyr: RScript (9751) list(message = "unable to load shared object '/data1/yarn/nm/usercache/clouderanA/appcache/application_1500605980576_8118/container_1500605980576_8118_01_000002/sparklyr-bundle/stringi/libs/stringi.so':\n  libicui18n.so.55: cannot open shared object file: No such file or directory", call = dyn.load(file, DLLpath = DLLpath, ...)) 
17/08/18 01:32:49 ERROR sparklyr: RScript (9751) collected callstack: 
18: stop(e)
17: value[[3L]](cond)
16: tryCatchOne(expr, names, parentenv, handlers[[1L]])
15: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatch(loadNamespace(name), error = function(e) stop(e))
13: getNamespace(ns) 
17/08/18 01:32:49 INFO sparklyr: Gateway (9751) is terminating backend
17/08/18 01:32:49 INFO sparklyr: Worker (9751) completed wait using lock for RScript
17/08/18 01:32:49 INFO sparklyr: Gateway (9751) is shutting down with expected SocketException
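The ERROR line in the log above is the one that matters: `stringi.so` from the sparklyr bundle cannot be loaded because a shared library is absent on the worker. As a small sketch for pulling the missing library name out of such a message, the helper below uses base R only. `missing_shared_lib` is a hypothetical name, and the path in the sample message is shortened to a placeholder rather than copied from the log.

```r
# Hypothetical helper: extract the name of the shared library that the
# dynamic loader reports as missing, or NA if the message has another form.
missing_shared_lib <- function(msg) {
  pattern <- "[A-Za-z0-9_.+-]+\\.so[0-9.]*(?=: cannot open shared object file)"
  hit <- regmatches(msg, regexpr(pattern, msg, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

# Sample message in the shape of the log line above (path is a placeholder).
msg <- paste0(
  "unable to load shared object '/path/to/sparklyr-bundle/stringi/libs/stringi.so':\n",
  "  libicui18n.so.55: cannot open shared object file: No such file or directory"
)

print(missing_shared_lib(msg))
```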
chezou commented 6 years ago

It works fine with the `packages = FALSE` option.

spark_apply(
  iris_tbl,
  function(e) broom::tidy(lm(Petal_Length ~ Petal_Width, e)),
  names = c("term", "estimate", "std.error", "statistic", "p.value"),
  group_by = "Species",
  packages = FALSE)
# Source:   table<sparklyr_tmp_1a4ec6a42> [?? x 6]
# Database: spark_connection
     Species        term  estimate std.error statistic      p.value
       <chr>       <chr>     <dbl>     <dbl>     <dbl>        <dbl>
1 versicolor (Intercept) 1.7812754 0.2838234  6.276000 9.484134e-08
2 versicolor Petal_Width 1.8693247 0.2117495  8.827999 1.271916e-11
3  virginica (Intercept) 4.2406526 0.5612870  7.555230 1.041600e-09
4  virginica Petal_Width 0.6472593 0.2745804  2.357267 2.253577e-02
5     setosa (Intercept) 1.3275634 0.0599594 22.141037 7.676120e-27
6     setosa Petal_Width 0.5464903 0.2243924  2.435422 1.863892e-02
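
One reading of the log, offered as an assumption rather than a confirmed diagnosis: with the default `packages = TRUE`, sparklyr ships the client's library to the workers as a bundle (the `packages.tar` and `sparklyr-bundle` paths in the log), so the workers load a `stringi.so` linked against an ICU version they do not have. With `packages = FALSE` the workers use whatever is installed in the parcel. A minimal sketch of a check for that follows; `try_load` is a hypothetical helper, not part of sparklyr, and the cluster call is left commented out because it is untested here.

```r
# Hypothetical helper: try to load a package namespace and report the
# outcome as a string instead of raising an error.
try_load <- function(pkg) {
  tryCatch({
    loadNamespace(pkg)
    "ok"
  }, error = function(e) conditionMessage(e))
}

# Local check with a base package that is always installed.
print(try_load("stats"))

# On the cluster (not run here; assumes the `sc` connection from above).
# The tryCatch is inlined so the closure does not depend on globals.
# spark_apply(sdf_len(sc, 1), function(e) {
#   data.frame(stringi = tryCatch({ loadNamespace("stringi"); "ok" },
#                                 error = function(err) conditionMessage(err)))
# }, packages = FALSE)
```

If the worker reports "ok" with `packages = FALSE`, the parcel's own `stringi` is usable and the failure is specific to the bundled copy.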