PuruVJ commented 3 months ago

Changes

Replaces qs dependency with neoqs. https://github.com/puruvj/neoqs

Result:

CleanShot 2024-08-07 at 18 38 49

Graph:

CleanShot 2024-08-07 at 18 40 01

The entire branch of qs will be replaced with one node neoqs.

Support: Node 8+

wojtekmaj commented 3 months ago

Blocked by #66. Currently, Node.js 0.8 is supported.

wesleytodd commented 3 months ago

Can you explain the reason for this change? Size on disk after install has never been a thing this project set's as a goal. Stability is though. If you have more reasons to want this than .4mb then please do share them.

PuruVJ commented 3 months ago

Hi! Thanks for the quick reply!

There is no huge win with this. The PR is quite insignificant in the grand scheme of things. With that in mind, I'll list all the benefits this approach can have, within the small scope:

Every end user downloading body-parser, directly or indirectly, will have faster download times. Probably by just an extra second or two. Overall insignificant, however it can increase the overall DX in eastern nations with poorer network conditions(Like mine, for example)
qs's bundle size is 12.1kb, neoqs is 4.4KB.
And hence, there is less code to be executed, which(in this case means) slightly better overall perf. Insignificant yes, but possible.
Reduced bandwidth on all users and overall internet infra is rarely a bad thing.

My goals with neoqs: Always maintain backward compatibility with qs. This means all qs tests must pass, which is made sure of. Any updates to qs will be downstreamed as-is to neoqs, alongside source code and tests. And ofc to keep it 0-dependency forever. neoqs will never be drastically rewritten for better perf, as that can have an effect on existing apps even with all tests passing. Backward compatibility is of the utmost importance here over anything else.

That said, a random developer opening a PR to a famous package and putting his own project there can look suspicious, and I admit that. Hence I'm more than willing to close this PR and open one with other alternatives, like picoquery, qs-esm or fast-querystring. Anything to make the ecosystem faster(even if slightly 😄) helps.

Let me know your thoughts! Have a good day

wesleytodd commented 3 months ago

Hi! Thanks for the quick reply!

No problem! Thanks also for the quick reply and the great details.

I'll list all the benefits this approach can have, within the small scope:

This is great context to add. I totally hear you on these aspects.

I would love to have someone bring some proof to these claims in the form of a well studied benchmark (as I requested of @wojtekmaj in https://github.com/jshttp/content-type/pull/24#issuecomment-2231921373). I have worked on many large apps and support many teams at work with large monorepos, from that experience I have learned that micro optimizations are almost never worth the effort. If folks can show data that proves out the improvement I would happily start considering changes like this.

a random developer opening a PR to a famous package and putting his own project there can look suspicious

Not suspicious at all! How do you think all this got built in the first place? The entire OSS we have today is because "random developers" showed up to do great work. I am a random developer lol. We all are. I have a high degree of trust in people and the process we have to try and catch malicious actions before they ship to end application owners, so don't feel the need to do any additional work to use someone elses package since that is not the reason I am hesitant to jump onto this change.

PuruVJ commented 3 months ago

Thanks a lot the kind response 😄

I made a simple benchmark:

hyperfine --prepare 'pnpm store prune && rm -rf node_modules package-lock.json' 'pnpm install --force neoqs' 'pnpm install --force qs' --runs 10

Result:

CleanShot 2024-08-07 at 23 02 08

It seems to float between 3-8% faster than qs, but in all runs it's faster than qs. My internet speed is:

CleanShot 2024-08-07 at 23 05 16

(Which is better than the entire daytime 😅)

Lemme know if you'd like me to tweak the benchmark or anything.

benmccann commented 3 months ago

I did a quick benchmark on this out of curiosity on my laptop doing a clean npm install (i.e. no node_modules or lockfile present). Installing neoqs was about 100ms faster.

qs 1.209 s 0.769 s 0.938 s Average: 0.972 s

neoqs 1.104 s 0.719 s 0.759 s Average: .861 s

(For comparison, it took an average of 1.285s over three installs to install the deep-equal library being swapped out in the other PR and 0.917s to install the dequal alternative meaning you could save about 350ms there without a cache. I'm on a gigabit fiberoptic line, so it could be different for folks that don't have the same level of connectivity)

I don't know whether that would be considered enough to make a difference in the outcome here since it's a one-time operation and you can cache package installs.

There's also a security aspect (speaking generally and not necessarily about this PR) where fewer packages means there are fewer individuals with access to the supply chain. It's rare that there's a malicious maintainer or that a non-malicious maintainer is hacked, but it has happened and it's nice to reduce that risk when possible.

The main thing for me though (again generally speaking) is that packages with fewer dependencies tend to cause me less difficulty and are easier to fix when I need to. I run into issues all the time where after fixing a package I need to bump the version in its consumer and so on to be able to benefit from it and that means I have to PR three or four different packages, which can take months and sometimes never happens if one of the intermediary packages is no longer maintained. And just finding where the bug is takes longer in the first place when the code is split across more packages. Even for small battle tested libraries that are unlikely to be buggy, I've hit issues. E.g. I'm trying to fix some warnings caused by packages being deprecated right now in a couple of projects I maintain and it's been a months-long ordeal that's still moving slowly towards a resolution. Those dependencies didn't necessarily have to be deprecated, but it's far down the chain and I don't have any control over what those authors have decided to do.

wesleytodd commented 3 months ago

I don't have time right now to fully reply again, but I wanted to clarify something which I might have been less clear on. I don't think a micro benchmark of installing qs vs neoqs is helpful. It is clear you can get a perf win in this way, but it is much less clear what this looks like in end user repos. Most of the installs for body-parser come from express, so maybe a benchmark of installing express would be better? Even that though, I would almost prefer real apps be the benchmark since even just installing express is a synthetic proxy for a real end user impact.

joshuajaco commented 3 months ago

I don't have time right now to fully reply again, but I wanted to clarify something which I might have been less clear on. I don't think a micro benchmark of installing qs vs neoqs is helpful. It is clear you can get a perf win in this way, but it is much less clear what this looks like in end user repos. Most of the installs for body-parser come from express, so maybe a benchmark of installing express would be better? Even that though, I would almost prefer real apps be the benchmark since even just installing express is a synthetic proxy for a real end user impact.

I don't really understand the reasoning to want a real app benchmark. I highly doubt this will have a significant impact on any app. But I don't think that is relevant at all. This PR reduces the package size, which will speed up installs by some small amount. So if you no longer plan on supporting older node versions (as mentioned in https://github.com/expressjs/body-parser/pull/66), there would be no reason not to swap.

benmccann commented 3 months ago

If the desire is simply to benchmark body-parser inside a large project containing other dependencies, I'd be happy to do that. If it's a requirement that express be part of the benchmark I'm not sure it's worth doing as it pretty clearly won't have an impact. If express doesn't make the same move, you won't see a savings from this PR as you'll still need to download qs and all its dependencies anyway.

I just found and read through the related discussion in the express repo. It sounds like the proposed solution there would be to make the query string parser pluggable in v6 and that's when I imagine you would see an improvement from this PR. Perhaps making the parser pluggable is a solution that should be explored here as well?

wesleytodd commented 3 months ago

I highly doubt this will have a significant impact on any app.

If it's a requirement that express be part of the benchmark I'm not sure it's worth doing as it pretty clearly won't have an impact.

Yes this is my point in asking for a benchmark to prove this improves end user install times. Feel free to also do a benchmark where express or other libraries in this most common install path also update to neoqs (that way we could see the end vision y'all have). I personally do not care to take on the work or risk of swapping a dependency for a functionally equivalent one unless it is directly improving the end user experience. I hope that makes sense.

PuruVJ commented 3 months ago

I published @puruvj/body-parser and @puruvj/express, both using neoqs/legacy. Here are the benchmarks of installing express in one directory and @puruvj/express in other one.

Command:

hyperfine --prepare 'pnpm store prune && rm -rf node_modules pnpm-lock.yaml' 'pnpm store prune && cd neoqs && pnpm install --force @puruvj/express' 'pnpm store prune && cd qs && pnpm install --force express' --runs 10

CleanShot 2024-08-08 at 21 40 52

Varies all the way from 1% to 29% more.

Project is quite simple

CleanShot 2024-08-08 at 21 44 42

I don't think there's a lot of merit in benchmarking real world apps installing this version of express.

PuruVJ commented 3 months ago

And while we're at it, the dependency graph comparison:

CleanShot 2024-08-08 at 21 48 53

13 less dependency in the version with neoqs

wesleytodd commented 3 months ago

Thank you so much for doing this! This is exactly the kind of thing I wanted to see. Just starting my work day, but I can likely try to read these results in detail later today.

joshuajaco commented 3 months ago

Yes this is my point in asking for a benchmark to prove this improves end user install times. Feel free to also do a benchmark where express or other libraries in this most common install path also update to neoqs (that way we could see the end vision y'all have). I personally do not care to take on the work or risk of swapping a dependency for a functionally equivalent one unless it is directly improving the end user experience. I hope that makes sense.

That does makes sense, and I understand where you're coming from. I'm just here to advocate for a leaner, faster, and overall more modern JavaScript/Node ecosystem. And I don't think this is possible without a broader community effort. This PR may not have a significant impact on its own, nor will the next one, but cumulatively, they will make a difference. I fear that blocking PRs like this one for being insignificant individually misses the broader picture.

wojtekmaj commented 3 months ago

I personally do not care to take on the work or risk of swapping a dependency for a functionally equivalent

That's the key. You're not swapping a dependency. You're swapping 14 dependencies for one. This discussion drifted towards performance, which although important, is not really the point of this PR. The point is, in my opinion, to stop dependency madness, to gain back control over the dependencies we use, to only include these ones we actually use and need, to limit surface of a potential ecosystem attack.

wesleytodd commented 3 months ago

I think it is likely we disagree on what "to stop dependency madness, to gain back control over the dependencies we use, to only include these ones we actually use and need, to limit surface of a potential ecosystem attack" means.

stop dependency madness

This is not a form of dep madness, this is correctly leveraging the system as designed. There are obviously optimizations available within that system which rely on reducing transitive deps, but calling it "madness" is over-stating the problem and is a tactic used to cause FUD. I would appreciate in the future if we could use language not designed to alienate alternate viewpoints please.

to gain back control over the dependencies we use

You do have that, by design and in real terms.

You can help maintain express and the packages this project owns enough that we trust you will be around to answer the user issues and help with bug fixes introduced by changes like these. This is an open group and the folks who show up since recommitting to the governance have all had great input and helped form the direction of decision making here. You can be involved in that work!

If you are not able to do that there are alternative tools which may not include these dependencies you do not need. And if that does not exist you can fork the necessary components to achieve your goals.

On a more philosophical note, I would caution against trying to "control" the OSS ecosystem. It is not really the "goal". If you need control, then you need to give up some of the best things about distributed decision making and the progress that comes with a large global network of developers working on different ideas.

to only include these ones we actually use and need

You may not need these, but they were added because someone does. The express project historically has committed to stability, and right now we are attempting to keep that for v4 and v5 lines. This means making some conservative decisions, but IMO that is what the ecosystem has come to expect from express and I have little interest in breaking too many expectations without good reason. This is nearly certain to change in the future once we get past this initial revitalization effort.

to limit surface of a potential ecosystem attack

Switching a dependency is actually introducing new risk, which is an increase in potential attack surface. As I said above, I am not concerned that anyone in this thread is a bad actor, but just to be clear on this: if someone does turn into a bad actor or is targeted in an attack and their account compromised, having the fewest trusted individual people in the dependency tree is the only way to reduce risk here.

AFAIK this package and all of it's transitives have the following:

1 highly trusted person in the ecosystem who we have regularly worked with for years on this stuff
1 historically prolific person who has been a good actor for many years
1 person I did not recognize (again, not that that is a bad indication, it is good this ecosystem is so big you cannot recognize everyone's profile)

To me this is very acceptable risk. While I do not have any concern that someone in this thread is a bad actor, but introducing a new person is more risky, especially when that new person is not someone who has worked closely with this project in the past (again, we welcome you working to become a trusted partner along side us).

Honestly, if we are going to work on improving ecosystem attack surface then we have much better targets than this package and it's transitives. If this is something you care about, please attend some of our security WG's or other ecosystem efforts to improve this. Happy to provide links if this is something you are interested in working on.

In closing, if we removed a dependency because the platform (node.js) took over for a userland concern, or the language takes over a node.js concern, those are great changes. I am happy to see things like what https://github.com/expressjs/express/pull/5731 does. I would LOVE if y'all took this great energy and put it toward the goal of having great query string parsing in the platform. Or if maybe there is opportunity to improve the existing qs library to make it more suitable for your needs, that way everyone wins.

In the mean time, I think we should keep this discussion alive until we can decide if this is a direction we could pursue in the future in all the express libraries. But unless one of my fellow members of the project stands up to champion this, I do not see us taking this action for the upcoming 5v release so it will be no sooner than a few months that we would circle back on this.

43081j commented 3 months ago

Just my opinion on this:

I think express and body-parser should actually use URLSearchParams under the hood for query string parsing, or node's built in querystring module (which supports arrays I think?)

I would then expose an option to let the user pass in a custom query string parser.

In the docs I would recommend picoquery (much faster and lighter than qs), or neoqs (lighter than qs, same API). qs is very slow in comparison to most modern alternatives, I would no longer recommend it for that reason

wesleytodd commented 3 months ago

@43081j have you looked at what express does support? You might want to check out the docs. I am going to mark these two comments as off topic.

43081j commented 3 months ago

Indeed I have. What do you think I missed? The comment certainly was not off topic, so it would be good to understand what you think was missed.

Any change here would be breaking unless you use neoqs (since that tries to provide exactly the same functionality as qs). So my suggestion was assuming one day we would be open to such a breaking change, to allow us to make it pluggable as suggested in express too

On the topic of moving off qs (which is the topic of this pr), I would suggest this and express move off all query parsing libraries and let you provide one instead (defaulting to node's built in library). That is what my last comment was trying to suggest

wesleytodd commented 3 months ago

Sorry, I thought you were one of these people who drop in to give an opinion and then never come back, it happens in like half of these longs thread issues. If that was a poor assessment I am sorry.

I think express and body-parser should actually use URLSearchParams under the hood for query string parsing, or node's built in querystring module (which supports arrays I think?)

Your runtime needs are all already covered by the express api. Express v5 will land a PR to use URLSearchParams for query strings, we support the querystring module both for query and in body-parser. If you want to use URLSearchParams in this package then please open that PR, but that is unrelated to this PR.

I would then expose an option to let the user pass in a custom query string parser.

This could also be a separate PR here. But just to clarify, when I replied I was thinking you meant parsing query strings, not url encoded bodies. And for that use case, express already allows you to pass your own parser.

In the docs I would recommend

Our docs are not going to recommend picoquery or any other when we use qs, so it seemed like an off topic comment.

EDIT: so can I mark all these as off topic with your approval so we can keep focused on the proposal to replace the existing module?

43081j commented 3 months ago

Sorry, to be clear this package currently allows you to specify "extended" mode which will pull qs in to support nested syntax

My suggestion is that this option is false or you must pass a function in. Currently, we pull qs in even if you don't use it. Such a change would put that burden on the user instead (to install a nested query string library and pass it in).

I agree with the change this pr is doing if we don't want to do the above

expressjs / body-parser

feat: qs -> neoqs #530

Changes