expressjs / body-parser

Node.js body parsing middleware
MIT License

use non blocking json parser #132

Open mderazon opened 9 years ago

mderazon commented 9 years ago

Is there an option to use a non-blocking (streaming?) JSON parser?

The problem with JSON.parse is that it blocks the event loop, and for parsing big JSON bodies that can be a problem

dougwilson commented 9 years ago

I'm not against it at all :) Typically everyone uses the default limit value in the parsers here, and JSON.parse at that limit takes well under 1ms, so it's usually not an issue.

If you want to put together a PR to implement this (and perhaps get urlencoded parser to be streaming too in that PR) that would be awesome, and the best way to move this forward :)

mderazon commented 9 years ago

I've kind of ditched the idea of parsing big JSON from an API and went with a different approach. That said, I hope I can find time to work on this.

In the meantime, maybe we can discuss it a little here for the sake of having some reference

  1. Any streaming JSON lib you like in particular?
  2. The overhead of using streams here for all JSON parsing will probably mean that small parsing jobs take a performance hit, as JSON.parse is faster for most cases.

mnpenner commented 9 years ago

  1. If they do take a performance hit, perhaps you can use Content-Length to choose a parser. Small payloads can continue to use JSON.parse.
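
A rough sketch of that suggestion (the threshold value and all names here are made up for illustration; none of this is body-parser API):

```javascript
// Rough sketch of choosing a parser from the declared body size.
// The threshold and names are illustrative, not body-parser API.
const STREAM_THRESHOLD = 64 * 1024; // 64 KiB, an arbitrary cutoff

function chooseParser(headers, threshold = STREAM_THRESHOLD) {
  const len = Number(headers['content-length']);
  // Missing or invalid Content-Length (e.g. chunked transfer encoding):
  // fall back to the streaming parser rather than buffering blindly.
  if (!Number.isFinite(len)) return 'streaming';
  return len <= threshold ? 'sync' : 'streaming';
}
```
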

dougwilson commented 8 years ago

Any streaming JSON lib you like in particular?

I've never used any, so I have no preference. If you can find one that is well-supported, that would be best :)

The overhead of using streams here for all JSON parsing will probably mean that small parsing jobs take a performance hit, as JSON.parse is faster for most cases.

Yea, it would probably be slower; what we should do would need to be determined by benchmarks. Also, be careful: simply keying a switch off Content-Length can penalize slow servers for being slow, when they could have benefited from a streaming parser even for small bodies.

ghassanmas commented 5 years ago

I have done some research on this, and it seems the topic has already been discussed on Node: https://github.com/nodejs/node/issues/2031#issuecomment-114944005. To my understanding, what is being suggested here is not feasible.

Did I miss something?

wesleytodd commented 5 years ago

Good find @ghassanmas, that comment gets to the point well, and IMO it is a good indication that this module should not take an opinion on this directly. Offering an interface to override the JSON parse implementation (and then supporting async versions of parse) is a good idea, but directly implementing a specific one (even if it is the one recommended in that comment) is probably not a good long-term solution.
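
An override interface like the one described might look roughly like this (the factory and parameter names are hypothetical, not body-parser's API):

```javascript
// Hypothetical shape of a pluggable parse implementation: the factory
// accepts a (possibly async) parse function instead of hard-coding
// JSON.parse. Names are illustrative, not body-parser's API.
function makeJsonBodyHandler(parse = (text) => JSON.parse(text)) {
  return async function handleBody(rawBody) {
    // `await` accepts both plain return values and promises, so callers
    // can plug in an async parser (e.g. one backed by a worker thread).
    return await parse(rawBody);
  };
}
```
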

gireeshpunathil commented 4 years ago

I would like to see what the recommended direction here is:

so the second point necessitates the first, and the first has already been discarded as an option.

So, should we conclude and close this as wontfix, and revisit later when the nature of the workload has shifted drastically and async JSON parsing becomes an imminent requirement?

fed135 commented 4 years ago

Just a bit of a left-field solution: I've stumbled upon https://github.com/bjouhier/i-json, which can handle JSON chunks directly from a stream instead of concatenating the text and then deciding what to do with it. It's not an easy lift, but it could be pretty interesting. I think I'll do a small POC to check feasibility and performance overhead.
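
The chunk-feeding pattern might look like the following. The `update()`/`result()` interface is modeled on the shape i-json exposes, but the stand-in below just buffers chunks and calls JSON.parse at the end, so it only illustrates the wiring, not the performance benefit:

```javascript
// Stand-in for an incremental parser with an update()/result() shape.
// A real incremental parser would consume each chunk as it arrives
// instead of retaining the whole body in memory.
function createBufferingParser() {
  const chunks = [];
  return {
    update(chunk) { chunks.push(chunk); },           // feed one stream chunk
    result() { return JSON.parse(chunks.join('')); } // finish and return value
  };
}

// Feeding a stream: each 'data' chunk goes straight into the parser
// rather than being concatenated first and parsed at the end.
function parseStream(stream) {
  return new Promise((resolve, reject) => {
    const parser = createBufferingParser();
    stream.on('data', (chunk) => parser.update(chunk.toString()));
    stream.on('end', () => {
      try { resolve(parser.result()); } catch (err) { reject(err); }
    });
    stream.on('error', reject);
  });
}
```
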

gireeshpunathil commented 4 years ago

While we are here: if you are going to do a POC, can I request that you test with yieldable-json as well?

fed135 commented 4 years ago

Update on my POC:

Tests were performed using ApacheBench against a basic hello-world API on Node 8 (due to the i-json package), creating 100k requests with JSON bodies of 10kb and 100kb at a concurrency of 100. I'm not sure if that's an indication that problems only appear with extremely large JSON bodies, which to me is a rare edge case.

I did notice a possible area for improvement, though: body parsing as middleware typically happens before routing, which means that POST requests that would result in a 404 still go through body parsing. @wesleytodd I'm not sure if that's significant.

wesleytodd commented 4 years ago

body parsing as a middleware typically happens before routing, which means that POST requests which would result in a 404 still go through body-parsing.

If you are writing something where this is an issue, you can always call body-parser only after a route has been matched. It requires adding the parser to every route, which is a downside. For example:

app.get('/', bodyParser.json(), routeHandler)

I have some ideas, and there are other implementations (for example Restify) where this is more directly supported, but today that is the only clear path to avoiding it.

dougwilson commented 4 years ago

Correct, and the README explicitly lists adding the parser to your routes as the recommended method. I have done that on every app I have built and never had a problem.

Using globals always ends up being an issue long term: inevitably I need some routes to accept larger payloads than others, on some routes I want authentication validated prior to body parsing (to 401 as early as possible, rather than parsing on an anonymous or login route), and I typically do not want multipart-only routes like file uploads to even attempt to parse a JSON payload. So different routes end up with different types of body parsers. I'm not sure any real-world app would function well in the long run with a bunch of global middleware piled at the top, as then nothing would be configurable on a per-route basis.

P.S. these two comments are marked as off-topic, as the topic of the issue is about a non blocking json parser, not how to organize the parsing middleware :)

renatoalencar commented 4 years ago

I was reading the Express docs and found this. I think this is at least a great discussion around an interesting topic, so let me try to add my small contribution.

First of all, we need to decide on an approach or strategy for implementing a non-blocking parser. My ideas are:

Limit the amount of time the parser can use

Start the parser, stopping it after a short period of time, and keep the state for the next cycle of parsing. Repeat this until the stream is completely parsed.
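
This "parse a little, then yield" idea can be illustrated with a toy time-sliced loop (all names here are hypothetical; a real incremental JSON parser would carry its tokenizer state across slices the same way):

```javascript
// Toy illustration of time-sliced work: process items under a per-turn
// time budget, yielding to the event loop via setImmediate between
// slices so other requests are not starved.
async function processInSlices(items, handleItem, budgetMs = 5) {
  let i = 0;
  while (i < items.length) {
    const sliceStart = Date.now();
    // Do bounded work for this turn of the event loop.
    while (i < items.length && Date.now() - sliceStart < budgetMs) {
      handleItem(items[i++]);
    }
    if (i < items.length) {
      // Yield so timers and I/O callbacks can run before the next slice.
      await new Promise((resolve) => setImmediate(resolve));
    }
  }
}
```
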

Parse it in a different thread or process

Start a different thread or process, parse the stream there, and then resolve the value back to the original caller. A single process with its own event loop can be used in this case, so no one needs to deal with several threads running at the same time.

The actual parser

The parser could be built by extending the actual Node.js JSON parser; if this is possible, no one needs to build a parser from scratch.

I personally think the incremental option is more viable, since no one needs to deal with a second running thread or process and IPC. The next step should be building PoCs of the different approaches and comparing them against each other, both in performance and maintainability.

rawpixel-vincent commented 1 year ago

I've published https://github.com/rawpixel-vincent/yieldable-json-body-parser, which uses https://github.com/ibmruntimes/yieldable-json and is based off 2.x at https://github.com/expressjs/body-parser/pull/66

It can be used as a drop-in replacement for the bodyParser.json() middleware.

This is fresh, so there may be issues that haven't been found yet; use with caution. Suggestions welcome.