expressjs / body-parser

Node.js body parsing middleware
MIT License
5.44k stars 728 forks source link

iso-8859-1/windows-1252 support? #194

Open papandreou opened 8 years ago

papandreou commented 8 years ago

Hi,

A while ago I developed a web app to replace an old cgi-bin script from the nineties, which means that it had to assume that POST requests with a Content-Type of application/x-www-urlencoded are in iso-8859-1 (and also support "smart quotes" etc., aka. windows-1252). At the same time it had to be possible to switch to utf-8 mode.

I accomplished that by adding support for a defaultCharset option, which can either be utf-8 or iso-8859-1, and for explicitly switching to utf-8 mode by adding utf8=%E2%9C%93 (utf8=✓ as percent-encoded utf-8 octets) to the query string, a trick that I've seen others use as well.

Here are the diffs between my forks of body-parser and the qs module:

https://github.com/expressjs/body-parser/compare/master...papandreou:iso-8859-1 https://github.com/ljharb/qs/compare/master...papandreou:interpretNumericEntities

Would anyone be interested in this work? If so, I can clean it up and open PRs.

Xtrem65 commented 6 years ago

@dougwilson Any update on this ?

I'm interested in this feature but as @papandreou 's branch is way behind the official master. Seems like it's going to be hard to pull...

Is there any proper other way to handle ISO-8859-1 encoded data ?

dougwilson commented 6 years ago

Hi @xtrem65 I haven't even commented on here, let alone done any work on it :) I don't have any updates.

papandreou commented 6 years ago

@Xtrem65, I guess the only update is that you're expressing interest, which is also a good thing :)

If @dougwilson and @ljharb think the work looks good and they want to adopt this feature, I'd be willing to bring the branches up to date.

papandreou commented 6 years ago

@Xtrem65, have you tried switching to my body-parser-papandreou fork to see if it solves your problem? It should still work with current versions of express.

ljharb commented 6 years ago

I’d be happy to review a PR.

papandreou commented 6 years ago

@ljharb, that sounds great. Before I try to rebase it on qs master, does this approach look good? https://github.com/papandreou/qs/compare/9250c4cda5102fcf72441445816e6d311fc6813d...interpretNumericEntities

ljharb commented 6 years ago

I'd need to review it when rebased :-) it looks like a good direction tho.

papandreou commented 6 years ago

Rebased branch: https://github.com/papandreou/body-parser/tree/feature/iso-8859-1/take2

Requires https://github.com/ljharb/qs/pull/268 to be npm linked in.

Xtrem65 commented 6 years ago

Wow guys ! so much reactivity :D

I'd be happy to help if needed

papandreou commented 6 years ago

@Xtrem65, you can help by trying it out :)

  1. Make a clone of https://github.com/papandreou/qs
  2. Check out the feature/iso8859-1 branch
  3. npm link
  4. Make a clone of https://github.com/papandreou/body-parser
  5. Check out the feature/iso-8859-1/take2 branch
  6. npm link qs
  7. npm link
  8. cd to your project
  9. npm link body-parser

... Then see if you can solve your problem. Depending on the requests that you need to support, you might have to specify the defaultCharset option.

jonchurch commented 6 months ago

Closing as niche interest. Looks like some code was written to get folks who want this started. If someone wants to revive this and open a PR, we can get the defibrillator out on this issue.

ljharb commented 6 months ago

indeed, the qs code for this is in v6.6.0+, so only body-parser would need changes.

papandreou commented 6 months ago

@jonchurch, it's implemented by this PR, which has been sitting there for almost 5 years, waiting for a new major release that ditches support for node.js 0.10 and below :)

jonchurch commented 6 months ago

apologies @papandreou I missed that PR link while triaging on mobile. I thought a PR never came through, will open this back up as there's a PR attached.

Getting up to speed on this request, as I have never had to deal with this problem. But Im all for better supporting common bodies on the internet via config options.

Reopening