albertosantini / node-rio

Integration with Rserve, a TCP/IP server for R framework
https://github.com/albertosantini/node-conpa
MIT License
176 stars 35 forks source link

Large returning dataset results in Invalid Response #31

Closed sanderd closed 8 years ago

sanderd commented 8 years ago

Hi,

For a recent project, I have a very large JSON result-set, calculated by R. It is being sent as string (explicitely converted in R using jsonlite + toString)

However, as soon as the resulting data (string) gets too big, RIO exits with the following:

Connected to Rserve
Supported capabilities --------------

Sending command to Rserve
00000000: 0300 0000 b400 0000 0000 0000 0000 0000  ....4...........
00000010: 04b0 0000 6c69 6272 6172 7928 276a 736f  .0..library('jso
00000020: 6e6c 6974 6527 2920 0a63 616c 6c6d 6520  nlite')..callme.
00000030: 3c2d 2066 756e 6374 696f 6e28 7029 207b  <-.function(p).{
00000040: 200a 2020 7361 6d70 6c65 4c69 7374 203c  ....sampleList.<
00000050: 2d20 6c69 7374 2829 200a 2020 7361 6d70  -.list()....samp
00000060: 6c65 4c69 7374 2476 616c 7565 7320 3c2d  leList$values.<-
00000070: 2073 6571 2830 2c20 3530 3030 3030 302c  .seq(0,.5000000,
00000080: 2031 2920 0a20 2072 6574 7572 6e28 746f  .1)....return(to
00000090: 5374 7269 6e67 2874 6f4a 534f 4e28 7361  String(toJSON(sa
000000a0: 6d70 6c65 4c69 7374 2929 2920 0a7d 200a  mpleList)))..}..
000000b0: 0a63 616c 6c6d 6528 277b 7d27 2900 0101  .callme('{}')...
000000c0: 0101 0101                                ....

Data Header
00000000: 0100 0100 e065 5102 0000 0000 0000 0000  ....`eQ.........

Invalid response (expecting SEXP)

The following (test-)code reproduces this scenario (please note, jsonlite should be installed @ R):

describe('can receive large json result', () => {
    it('should return correctly', function(done) {
      this.timeout(30000);

      Rio.bufferAndEval(
        "library('RJSONIO') \n" +

        "callme <- function(p) { \n" +
        "  sampleList <- list() \n" +
        "  sampleList$values <- seq(0, 5000000, 1) \n" +
        "  return(toJSON(sampleList)) \n" +
        "} \n",
        {
          entryPoint: 'callme',
          data: {},
          callback: function(err, res) {
            if(err) {
              console.log('error!');
              console.log(err);

              expect(true).to.equal(false);
            } else {
              console.log('result:');
              console.log(res)
              done();
            }
          }
        }
      );
    });
  });

I've uploaded the RServe debug log @ http://www.sanderd.be/tmp/rservelog.gz In this log, the following looks interesting to me (head dump + body dump, RIO seems to fail on head dump and skip body alltogether) - although I have no clue how RServe internally works. (SEXP etc is Chinese to me ;))

getStorageSize(00000000133EFC60,type=16,len=1) getStorageSize(00007FF5FD7B0010,type=9,len=38888910) = 38888920
= 38888928
result storage size = 48611160 bytes
Trying to allocate temporary send buffer of 48611328 bytes.
stored 00000000133EFC60 at 0000000013405048, 38888916 bytes
stored SEXP; length=38888928 (incl. DT_SEXP header)
OUT.sendRespData
HEAD DUMP [16]: 01 00 01 00 e0 65 51 02 00 00 00 00 00 00 00 00  |.....eQ.........
BODY DUMP [38888928]: 4a d8 65 51 02 00 00 00 62 d0 65 51 02 00 00 00 7b 22 76 61 6c 75 65     73 22 3a .....

edit - some additional info: OS: Windows 10, same issue reported @ OS X latest by colleague R: [1] "R version 3.2.2 (2015-08-14)" RServe: Rserve 1.7-3 ()

(Any pointers towards how I could potentially help, shoot!) Thanks, Sander

albertosantini commented 8 years ago

Thanks for the feedback.

Firstly, try to use the package RJSONIO instead of jsonlite.

sanderd commented 8 years ago

Hi @albertosantini

I've updated the code snippet to use RJSONIO. However, not the source of the problem. In R-studio (and logs), the JSON is correctly generated.

It's the transmission of the result, over RServe, to RIO, that's failing.

Thank you!

albertosantini commented 8 years ago

From R console:

> library("RJSONIO")
> 
> callme <- function(p) {
+     sampleList <- list()
+     sampleList$values <- seq(0, 5000000000, 1)
+     return(toJSON(sampleList))
+ }
> 
> x <- callme()
Error in seq.default(0, 5e+09, 1) : 'by' argument is much too small

Indeed no matter RJSONIO or jsonlite Sometimes I get that error with 5000000. With 5000000000 always.

Reference script:

#library("RJSONIO")
library("jsonlite")

callme <- function(p) {
    sampleList <- list()
    # sampleList$values <- seq(0, 5000000, 1)
    sampleList$values <- seq(0, 5000000000, 1)
    # return(toJSON(sampleList))
    return(toString(toJSON(sampleList)))
}

I give a try with RSclient to isolate the problem.

sanderd commented 8 years ago

You could always replace the sampleList-code with a JSON-file you read from disk. This seq-issue is because seq() as a protection built in to prevent memory explosion ;)

As long as the resulting output string is 'large' (~15mb at least?), I get the RServe+RIO error.

Off topic, but I think you figured that out already: jsonlite toJSON returns a 'json' object. Rjsonio returns a 'string'. Hence the toString() when using jsonlite.

albertosantini commented 8 years ago

After hours of investigation, I give you a feedback.

It should be if (responseType & 0x0F !== 10) {.

4a is the response type and we need to take only the first half a. Then there are three bytes for the length of the payload and, eventually, there is 02: it is the type of the packet.

https://github.com/albertosantini/node-rio/blob/master/lib/rio.js#L106

That kind of type is not implemented in rio. Weirdly, when the length of buffer changes, the type of packet is changed.

Firstly I need to merge dev-2.x branch, then I will give another try to this issue.

albertosantini commented 8 years ago

At last the issue is about supporting large packet feature. I added an example, ex7.js.

There is a last glitch before releasing a new version: JSON.parse fails, while json_parse is ok.

I need to investigate with Node.js, because JSON.parse doesn't work.

In the meantime, please, you may use master and give a try.

albertosantini commented 8 years ago

It seems there is a sort of bug (or edge case) in JSON.parse implementation. The array contains the legit number 4.e+05.

I found the offending number using jsonlint. I saved as file the rio response and used it as input for jsonlint.

$ jsonlint ex7.txt                                   
[Error: Parse error on line 1:                       
...5,  4e+05,  4e+05, 4.e+05, 4.0001e+05, 4          
-----------------------^                             
Expecting 'EOF', '}', ',', ']', got 'undefined']     

However in Node.js REPL or Chrome console the statement JSON.parse("[4.e+05]") raises the following message: SyntaxError: Unexpected token e or Uncaught SyntaxError: Unexpected token e.

So I was tempted to fill in issue in Chrome/V8 issue tracker, but, then, I tried it in Firefox: SyntaxError: JSON.parse: unterminated fractional number at line 1 column 4 of the JSON data.

I am afraid that implementation would not be changed very soon and I didn't fill any issue in Chrome, V8 or Firefox.

However you should not get that "bug", because you need to export the numbers with all the figures. Indeed, using scientific notation rounds the value when there is the string conversion. I think you should set options there before converting the R response to string. Otherwise you have the same number representation like ... 5e+06, 5e+06, 5e+06, 5e+06 ].

In a few hours I close the issue, publishing a new release.

sanderd commented 8 years ago

Hi @albertosantini - thanks for your work on this. Sorry for not getting back earlier.

The issue you are describing around the JSON format is indeed not something to be fixed in node-rio, but jsonlite/rjsonio.

Looking forward to the release / will get back to that part of the project real soon.

albertosantini commented 8 years ago

2.1.0 published with large packet support.

Feel free to open new issues. Thanks for the feedback.

I report here a few debugging details. Just as memo. :)

The most import thing is starting a Rserve debug instance. When rio calls Rserve, it displays a few info.

The valuable info is at the end of the debug dump. Indeed you added this info when you opened the issue. Great! If the packet is very large you need to wait a few minutes.

...
OUT.sendRespData
HEAD DUMP [16]: 01 00 01 00 d0 fc 84 03 00 00 00 00 00 00 00 00  |................
BODY DUMP [59047120]: 4a c8 fc 84 03 00 00 00 62 c0 fc 84 03 00 00 00 5b 20 20 20 20 20 20 30 2c 20 20 20 20 20 20 31 2c 20 20 20 20 20 20 32 2c 20 20 20 20 20 20 33 2c 20 20 20 20 20 20 34 2c 20 20 20 20 20 20 35 2c 20 20 20 20 20 20 36 2c 20 20 20 20 20 20 37 2c 20 20 20 20 20 20 38 2c 20 20 20 20 20 20 39 2c 20 20 20 20 20 31 30 2c 20 20 20 20 20 31 31 2c 20 20 20 20 20 31 32 2c 20 20 20 20 20 31 33 2c ...  |J.......b.......[      0,      1,      2,      3,      4,      5,      6,      7,      8,      9,     10,     11,     12,     13,
Releasing temporary sendbuf and restoring old size of 2097152 bytes.
reply sent.
Connection closed by peer.
done.

In large data packet the first 16 bytes of BODY DUMP are the header of the payload: 4a c8 fc 84 03 00 00 00 62 c0 fc 84 03 00 00 00.

yashdosi commented 8 years ago

Was able to reproduce it in 2.4.1. Node: v0.12.5 NPM - 2.11.2

Is this issue still open.

albertosantini commented 8 years ago

@yashdosi without any further detail, it is impossible to reproduce it.

Please, read the thread and when you collect new details, open a new issue.

You may give a look at the example 7. Consider also to update the stack, even if , I think, it is not relevant, but v0.12.5 is really old.