ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
http://ddf.io
Apache License 2.0

R example failed with #24

Closed ljzzju closed 10 years ago

ljzzju commented 10 years ago

When I run Rscript examples/basics.R, all the commands work well until fivenum(ddf), which fails with this error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.lang.OutOfMemoryError: PermGen space
Calls: fivenum ... -> .jrcall -> .jcall -> .jcheck -> .Call
Execution halted

So in the R shell I ran the commands in basics.R one statement at a time.

If I don't run daggr(mpg ~ vs + carb, ddf, FUN=mean) or daggr(ddf, agg.cols="sum(mpg), min(hp)", by="vs, am"), the other statements work well.

However, once I run either of those two statements, fivenum(ddf) fails with:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  io.ddf.exception.DDFException: Unable to get fivenum summary of the given columns from table SparkDDF_spark_5be72817_9687_40eb_bab2_8c1fb190bb9c

Sometimes the error is instead:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.lang.OutOfMemoryError: PermGen space
Calls: fivenum ... -> .jrcall -> .jcall -> .jcheck -> .Call
Execution halted

I'm confused about why this happens.

Is there anything I'm missing?

nhanitvn commented 10 years ago

Which branch are the errors on? The master branch's R example runs OK on my machine. Could you run sessionInfo() and paste the result here?

ljzzju commented 10 years ago

Thanks @nhanitvn. The branch is ddf-sparksql-1.1.0, the latest version.

sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

khangich commented 10 years ago

Hi @ljzzju,

fivenum() doesn't work on the ddf-sparksql-1.1.0 branch.

The five-number summary uses the percentile UDF in Spark SQL, and Spark SQL 1.2.0 does not support the percentile UDF yet (https://issues.apache.org/jira/browse/SPARK-4263).

-- Khang
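For reference, here is a plain-R sketch of what a percentile-based five-number summary computes. It runs on an in-memory vector rather than through DDF or Spark SQL. The function name fivenum_via_percentiles is hypothetical and not part of DDF. The assumption that the percentile UDF interpolates linearly between order statistics (R's quantile type 7) is also unverified here. The sketch additionally calls R's own stats::fivenum, which uses Tukey's hinges instead of percentiles, so the two can be compared.

```r
# Hypothetical helper (not part of DDF): a five-number summary computed
# from the 0th, 25th, 50th, 75th and 100th percentiles.
# ASSUMPTION: the percentile UDF interpolates linearly between order
# statistics, which corresponds to R's quantile(type = 7).
fivenum_via_percentiles <- function(x) {
  x <- x[!is.na(x)]  # drop missing values, as SQL aggregates skip NULLs
  probs <- c(0, 0.25, 0.5, 0.75, 1)
  stats::quantile(x, probs = probs, type = 7, names = FALSE)
}

x <- 1:10
print(fivenum_via_percentiles(x))  # 1.00 3.25 5.50 7.75 10.00
print(stats::fivenum(x))           # 1.0 3.0 5.5 8.0 10.0 (Tukey's hinges)
```

The two results agree at the minimum, median and maximum but differ at the quartiles. A percentile-based fivenum may therefore not match base R's fivenum exactly, especially on small data.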

khangich commented 10 years ago

Once Spark officially supports Percentile we will update DDF accordingly.

There is a PR in Spark that adds Percentile support (https://github.com/apache/spark/pull/2802), but I'd recommend not using it yet.

khangich commented 10 years ago

I'll close this for now.

piccolbo commented 10 years ago

Doesn't "close for now" mean "close forever"? Who's going to sift through the closed issues to reopen it? There's nothing wrong with leaving an issue open until it is fixed. Just don't add it to any milestone with a date. Anyway, I think this duplicates #44.