MichaelChirico commented 4 years ago

Created attachment 1676: tool to reproduce the crash issue

I evaluate R script code from the Java side through JRI. Most functions work well, but loading a model with many fields crashes the JVM directly.

Steps to reproduce:

1) Install R 3.1.1 on Windows 7, and install the "rJava" package from the R command line;

2) (Following the JRI instructions) Create the environment variables "R_HOME", "R_INCLUDE_DIR", "R_SHARE_DIR" and "R_LIBS", and add "$R_HOME/bin" and "$R_LIBS/rJava/jri" to the "Path" variable (on a 64-bit OS, use "$R_HOME/bin/x64" and "$R_LIBS/rJava/jri/x64" instead). You can use "library()" in the R console to print the exact location of "R_LIBS";

3) Download the attached zip file "JRI.zip", unzip it, and run "run.bat" from the command line;

4) Check R version

sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

5) Run the load() command; it crashes.

load("modelRF.rda")

An unrecoverable stack overflow has occurred.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_STACK_OVERFLOW (0xc00000fd) at pc=0x6f8c794e, pid=421092, tid=371088
#
# JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
# Java VM: Java HotSpot(TM) Client VM (24.51-b03 mixed mode windows-x86 )
# Problematic frame:
# C [Rzlib.dll+0x794e]

6) Running the same code on a Mac works fine. Running the "load()" command in a standalone R terminal on Windows also works.

Some tips: modelRF.rda is a saved random forest model. What is special about it is that the model input has 1776 fields. The crash happens inside R itself, not in JRI or Java.

Investigation steps:

1) Build a debug version of R on Windows;

2) Run the same JRI console, launch GDB, and attach it to the JVM process;

3) Load the symbol files with "set solib-search-path";

4) Add a breakpoint in saveload.c in the R source code:

(gdb) b "saveload.c":2333

5) Then run "load()" in the JRI console; execution pauses at line 2333 of saveload.c;

6) Go back to GDB and type "c" to continue; you will see the segmentation fault in inflate.c. Use "info stack" to print the stack trace:

(gdb) info stack

#0  0x6f8c8cd7 in inflate (strm=0x19098640, flush=0) at inflate.c:1234
#1  0x6c7b7c10 in R_gzread (file=0x19098640, buf=0x19293158, len=4)
    at gzio.h:333
#2  0x6c7b87d9 in R_gzread (len=4, buf=0x19293158, file=0x19098640)
    at gzio.h:289
#3  gzfile_read (ptr=0x19293158, size=1, nitems=4, con=0x1900f028)
    at connections.c:1531
#4  0x6c791659 in InBytesConn (stream=0x192ddfac, buf=0x19293158, length=4)
    at serialize.c:2034
#5  0x6c791ec6 in InInteger (stream=0x192ddfac) at serialize.c:361
#6  0x6c7932ee in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1644
#7  0x6c792680 in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1526
#8  0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1599
#9  0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1601
#10 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1601
#11 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1601
......
#1195 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1601
#1196 0x6c79260f in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1601
#1197 0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1599
#1198 0x6c792cdf in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1589
#1199 0x6c7924b7 in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1699
#1200 0x6c7925ed in ReadItem (ref_table=0x19e036b8, stream=0x192ddfac)
    at serialize.c:1599
#1201 0x6c796193 in R_Unserialize (stream=0x192ddfac) at serialize.c:1894
#1202 0x6c7cc6d1 in do_loadFromConn2 (call=0x19e02980, op=0x4115b4,
    args=0x19e03798, env=0x19e025e4) at saveload.c:2378
#1203 0x6c775d1d in bcEval (body=, rho=,
    useCache=<optimized out>) at eval.c:4753
#1204 0x6c77e512 in Rf_eval (e=0x19e01034, rho=0x19e025e4) at eval.c:560
#1205 0x6c7829c6 in Rf_eval (rho=0x19e025e4, e=0x19e01034) at eval.c:519
#1206 Rf_applyClosure (call=0x19e011d8, op=0x19e010c0, arglist=0x19e02654,
    rho=0x41920c, suppliedenv=0x419228) at eval.c:1044
#1207 0x6c77e627 in Rf_eval (e=0x19e011d8, rho=0x41920c) at eval.c:676
#1208 0x6c7514c3 in Rf_ReplIteration (rho=0x41920c, savestack=0,
    browselevel=0, state=0x192de7ec) at main.c:260
#1209 0x6c75179d in R_ReplConsole (rho=, savestack=0,
    browselevel=0) at main.c:310
#1210 0x192df838 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I omitted some duplicated lines for serialize.c:1601. ReadItem() in serialize.c calls itself recursively many times and eventually overflows the stack. The modelRF.rda model has 1776 fields, and the stack trace reaches around 1190 levels of recursion before the process dies.

Analysis: When R is used through JRI, it runs in single-thread mode, embedded in the JVM process. The R runtime is therefore subject to JVM thread-level limits, such as the stack size (128K in the JVM most of the time). Any deeply recursive function call inside R risks a stack overflow. Is it possible to rewrite ReadItem() in serialize.c with a for/while loop instead of recursive function calls?
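To illustrate the recursive versus loop-based reading asked about above, here is a minimal Java sketch. It is not R's serialize.c and not the pqR change: the toy stream layout, the `CONS`/`NIL` markers, and every class and method name are assumptions made up for the example. It only shows why a reader that recurses once per list cell needs stack proportional to the list length, while a reader that appends to a tail pointer does not.

```java
public class PairlistReader {
    /** Minimal cons cell, loosely modelled on a pairlist node (hypothetical type). */
    static final class Node {
        final int car;
        Node cdr;
        Node(int car) { this.car = car; }
    }

    static final int NIL = 0;   // hypothetical end-of-list marker
    static final int CONS = 1;  // hypothetical "one more cell follows" marker

    /** Builds a toy stream of the form CONS v0 CONS v1 ... NIL. */
    static int[] encode(int length) {
        int[] stream = new int[2 * length + 1];
        for (int i = 0; i < length; i++) {
            stream[2 * i] = CONS;
            stream[2 * i + 1] = i;
        }
        stream[2 * length] = NIL;
        return stream;
    }

    /** Recursive reader: uses one stack frame per list cell. */
    static Node readRecursive(int[] stream, int pos) {
        if (stream[pos] == NIL) {
            return null;
        }
        Node node = new Node(stream[pos + 1]);
        node.cdr = readRecursive(stream, pos + 2);
        return node;
    }

    /** Iterative reader: appends each cell to the tail, so stack depth stays constant. */
    static Node readIterative(int[] stream) {
        Node head = null;
        Node tail = null;
        int pos = 0;
        while (stream[pos] != NIL) {
            Node node = new Node(stream[pos + 1]);
            if (head == null) {
                head = node;
            } else {
                tail.cdr = node;
            }
            tail = node;
            pos += 2;
        }
        return head;
    }

    /** Counts the cells of a list without recursion. */
    static int length(Node node) {
        int count = 0;
        for (; node != null; node = node.cdr) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] stream = encode(1_000_000);
        System.out.println("iterative length: " + length(readIterative(stream)));
        try {
            System.out.println("recursive length: " + length(readRecursive(stream, 0)));
        } catch (StackOverflowError e) {
            // Whether this branch is reached depends on the thread's stack size.
            System.out.println("recursive read overflowed the stack");
        }
    }
}
```

Both readers produce the same list for short inputs; they differ only in how much stack they need as the list grows.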

We found a workaround: increase the JVM thread stack size with "-Xss2M". But this workaround is fragile and not suitable for a production environment, because the thread stack size should not be made larger when there are many threads, as in a J2EE environment.
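A narrower variant of the stack-size workaround, sketched here only as an idea, is to give a single thread a larger stack rather than raising the limit for every thread. The sketch relies on the standard `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor; its documentation describes the stack size as a hint that some platforms may ignore. The class name, the `depth` function (a stand-in for a deep call chain) and the sizes are assumptions for illustration, and whether the embedded R evaluation can actually be made to run on such a thread depends on how JRI is started, which is not shown.

```java
import java.util.concurrent.atomic.AtomicReference;

public class BigStackRunner {
    /** Deliberately recursive stand-in for a deep call chain (hypothetical workload). */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /**
     * Runs depth(n) on a dedicated thread created with the requested stack size
     * and returns the result. Any error raised on that thread, including a
     * StackOverflowError, is rethrown wrapped in an IllegalStateException.
     */
    static int runWithStack(long stackBytes, int n) {
        AtomicReference<Object> outcome = new AtomicReference<>();
        Runnable task = () -> {
            try {
                outcome.set(depth(n));
            } catch (Throwable t) {
                outcome.set(t);
            }
        };
        // The last argument is the requested stack size in bytes (a hint to the JVM).
        Thread worker = new Thread(null, task, "big-stack-worker", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        Object result = outcome.get();
        if (result instanceof Throwable) {
            throw new IllegalStateException("task failed", (Throwable) result);
        }
        return (Integer) result;
    }

    public static void main(String[] args) {
        // 64 MB for this one thread; other threads keep the default stack size.
        System.out.println(runWithStack(64L * 1024 * 1024, 50_000));
    }
}
```

This keeps the extra memory cost to one thread, but it does not remove the underlying problem: a long enough list still overflows any fixed stack.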

METADATA

Bug author - oppokui
Creation time - 2014-10-21 04:58:55 UTC
Bugzilla link
Status - NEW
Alias - None
Component - Wishlist
Version - R 3.1.1
Hardware - All Windows 32-bit
Importance - P3 major
Assignee - R-core
URL -
Modification time - 2015-04-21 06:11 UTC

MichaelChirico commented 4 years ago

This is just a very complicated way to write a wish item to use iterative unserialization of pairlists - it has actually nothing to do directly with JRI.

(Note that stack checking is disabled in JRI due to threads, hence the crash on stack overflow, which is expected.)

METADATA

Comment author - Simon Urbanek
Timestamp - 2014-10-21 16:21:16 UTC

MichaelChirico commented 4 years ago

Yes, it has nothing to do with JRI. Can you give it high priority? Since it crashes the process, it should not be a simple wish. In our customer's case, the data has many columns (around 10,000 at most). Right now a sample model with 1776 columns crashes immediately. I am not sure whether we should keep trying R or consider other technologies.

METADATA

Comment author - oppokui
Timestamp - 2014-10-22 10:47:46 UTC

MichaelChirico commented 4 years ago

(In reply to Simon Urbanek from comment #1)

This is just a very complicated way to write a wish item to use iterative
unserialization of pairlists - it has actually nothing to do directly with
JRI.

Note that this is implemented in the latest version of pqR (pqR-2014-09-30) available at pqR-project.org. This change involves only a few lines of code (though there are other changes to unserialization in pqR as well, to support its use of read-only constants).

METADATA

Comment author - Radford Neal
Timestamp - 2014-10-22 14:51:03 UTC

MichaelChirico commented 4 years ago

That is good news. Thanks, Radford. Performance is very important for big data mining. But I saw that the pqR website says "Windows system is not currently recommended"; is anyone using it on Windows in production? What is the relationship between pqR and R? Is there a plan to merge the pqR enhancements back into R?

METADATA

Comment author - oppokui
Timestamp - 2014-10-23 07:27:29 UTC

github-actions[bot] commented 4 years ago

NA

METADATA

Comment author - Luke Tierney
Timestamp - 2020-07-16 22:39:39 UTC

MichaelChirico / r-bugs

[BUGZILLA #16034] wish: use iterative unserialization for pairlists #5491
