expressjs / multer

Node.js middleware for handling `multipart/form-data`.
MIT License

Issue with UTF-8 characters in filename #1104

Open CleyFaye opened 2 years ago

CleyFaye commented 2 years ago

Hi,

I recently found that something changed in the handling of filenames containing UTF-8 characters: they now seem to be passed through as-is, which was not the case before.

After investigating a bit I could reproduce the issue with the minimal code in https://github.com/CleyFaye/test-multer

I found that the browser side just passes the name as-is in the "filename" part of the header. I've seen another issue related to using "filename*", but there are two problems with that: the browser's FormData does not use it, and RFC 7578 actually says it should not be used.

What would be the proper way to handle this? It is obviously possible, server side, to convert the content of originalname by putting all its characters as bytes in an array and then interpreting that as a UTF-8 string (it does work), but since I never had this issue with older versions, I suspect something changed in the way multer handles this.
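For illustration, the server-side conversion described above could look roughly like this. It is a minimal sketch: `reinterpretAsUtf8` is a hypothetical helper name, not part of multer or busboy, and the misdecoded name is simulated here rather than taken from a real upload.

```javascript
// Hypothetical helper: treat each character of the received string as one
// byte, then decode those bytes as UTF-8.
function reinterpretAsUtf8(name) {
  const bytes = Uint8Array.from(name, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

// Simulate the problem: UTF-8 bytes sent by the browser, read as latin1.
const sentName = 'invalid £$ file.txt';
const receivedName = Buffer.from(sentName, 'utf8').toString('latin1');

const repairedName = reinterpretAsUtf8(receivedName);
console.log(repairedName === sentName);
```

This only makes sense when the received string really is UTF-8 that was read as latin1; applying it to a correctly decoded name would corrupt it.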

CleyFaye commented 2 years ago

The small test provided returned the expected filename with multer@1.4.4, and changed with multer@1.4.4-lts.1.

dvantage commented 2 years ago

Same problem after updating from 1.4.4 to 1.4.5-lts.1.

dvantage commented 2 years ago

Multer has nothing to do with it, Busboy has changed something. https://github.com/mscdex/busboy/issues/20

This solved my problem:

    file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')

CleyFaye commented 2 years ago

Multer does have something to do with this, since its behavior definitely changed in an arguably incompatible way in what looks like a patch release.

What to do about it, however, I'm not sure. Either way would be fine (interpreting the name as UTF-8 to stay consistent with the previous behavior, or passing the raw string to avoid making assumptions about encoding), but I believe this kind of change in a patch release is troublesome for users.

ghost commented 2 years ago

Multer has nothing to do with it, Busboy has changed something. mscdex/busboy#20

This solved my problem:

    file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')

God bless you.

I've managed to make a bodge in my app

    const fileName = Buffer.from(el.originalname, 'latin1').toString('utf8');

because in my case invalid £$ file.txt was becoming invalid Â£$ file.txt. Ideally this gets fixed properly once busboy fixes it on their end. Thanks a lot.
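One concern with the workaround above is that it would corrupt names that are already decoded correctly, for example if the upstream behavior changes later. A guarded variant could be sketched as follows. `repairOriginalName` is a hypothetical name, and the heuristic is an assumption on my part, not something multer or busboy provides.

```javascript
// Hypothetical guarded version of the latin1 -> utf8 workaround.
function repairOriginalName(name) {
  // A character above U+00FF cannot come from latin1-decoded bytes,
  // so the name was decoded some other way: leave it alone.
  if (/[^\u0000-\u00ff]/.test(name)) return name;

  const bytes = Buffer.from(name, 'latin1');
  try {
    // fatal: true makes decode() throw on byte sequences that are not valid UTF-8.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    // Not valid UTF-8, so this is probably a genuine latin1 name.
    return name;
  }
}

console.log(repairOriginalName('invalid Â£$ file.txt'));
```

The heuristic is not perfect: a genuine latin1 name that happens to form valid UTF-8 byte sequences would still be converted.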

sominlee74 commented 2 years ago

Hi, I faced the same issue with filenames in Korean. I found out that the issue is related to "busboy", specifically its "defParamCharset" config property. The default value of that property is 'latin1', which means that parameters such as a non-Latin filename in a form input are misdecoded on the Node.js side unless it is configured otherwise. However, "multer" gives us no option to change busboy's config properties.

I hope line 28 in '/lib/make-middleware.js' will be changed to something like:

    busboy = Busboy({ headers: req.headers, limits: limits, preservePath: preservePath, defParamCharset: 'utf8' })

At the very least, some way to configure busboy through the multer module is needed.
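To show why the default parameter charset matters, here is a simplified model of decoding a Content-Disposition header under each charset. This is not busboy's actual parsing code; `decodeFilename` and the regex are illustrative assumptions only.

```javascript
// Raw header bytes roughly as a browser would send them (filename in UTF-8).
const headerBytes = Buffer.from(
  'form-data; name="file"; filename="한글.txt"',
  'utf8'
);

// Simplified stand-in for a parser: decode the bytes with the given
// charset, then pull out the filename parameter.
function decodeFilename(bytes, charset) {
  const text = bytes.toString(charset);
  const match = /filename="([^"]*)"/.exec(text);
  return match ? match[1] : undefined;
}

const asUtf8 = decodeFilename(headerBytes, 'utf8');
const asLatin1 = decodeFilename(headerBytes, 'latin1');

console.log(asUtf8 === '한글.txt');
console.log(asLatin1 === '한글.txt');
```

With 'latin1' every byte becomes its own character, so the original name is only recoverable by re-encoding the string as latin1 and decoding it as UTF-8.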

bf commented 2 years ago

This issue is still relevant. Multer should not deviate from the UTF-8 default. A multer option should be added so that we can control busboy's defParamCharset.

jhpung commented 1 year ago

I published a multer-utf8 package on npm that reads files with the utf8 charset by default.

https://www.npmjs.com/package/multer-utf8

lujijiang commented 1 year ago

The problem still exists, please fix it quickly

LinusU commented 1 year ago

Just to clarify, in Multer 1.4.4 the name was parsed as utf-8, and in Multer 1.4.5-lts.1 it's parsed as latin1?

In that case it seems straightforward to add defParamCharset: 'utf8' so that the new version behaves the same as the previous one...

Doc999tor commented 1 year ago

1. Tried both defParamCharset and defCharset; neither has any effect:

    multer({
      storage,
      defParamCharset: 'utf8',
      defCharset: 'utf8',
    })

2. As far as I can see, only selected options are passed from the config to busboy:
   https://github.com/expressjs/multer/blob/25794553989a674f4998b32a061dfc9287b23188/index.js#LL11C1-L23C2

ngovanduy0908 commented 1 year ago

@CleyFaye @dvantage thank you very much

TiuBen commented 1 year ago

Why does Postman get it right by default?

starnayuta commented 1 year ago

@TiuBen

Postman uses “filename*”, so filename problems do not occur.
But browsers do not use it.

see https://github.com/expressjs/multer/issues/1104#issue-1266094642

I've seen another issue related to using "filename*", but there are two problems with that: the browser's FormData does not use it
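For reference, a "filename*" parameter carries a charset label and a percent-encoded value. The sketch below builds one for a non-browser client. `encodeExtValue` is a hypothetical helper, and whether the server honors "filename*" depends on the parser in use.

```javascript
// Hypothetical helper: percent-encode a value for the extended
// parameter syntax (charset'language'value). encodeURIComponent leaves
// ' ( ) * unescaped, so those are escaped by hand.
function encodeExtValue(value) {
  return encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const extHeader =
  `form-data; name="file"; filename*=UTF-8''${encodeExtValue('한글.txt')}`;

console.log(extHeader);
```

Because the value is plain ASCII on the wire, it is not affected by the latin1 default discussed in this thread. As noted above, though, browsers do not produce it for form submissions.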

TiuBen commented 1 year ago

Where can I use filename*?

stouch commented 6 months ago

Is this solved? I still have this issue, and Buffer.from(file.originalname, 'latin1').toString('utf8') solves it...
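If the workaround has to be applied in many routes, it could be wrapped in a small Express-style middleware that runs after multer. This is only a sketch: `fixMulterFilenames` is a hypothetical name, and it assumes every originalname was misdecoded as latin1.

```javascript
// Hypothetical middleware: re-decode originalname on every uploaded file.
// Must be registered after the multer middleware has populated req.file(s).
function fixMulterFilenames(req, _res, next) {
  const fix = (file) => {
    file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8');
  };

  if (req.file) fix(req.file);

  if (Array.isArray(req.files)) {
    // Shape produced by upload.array() and upload.any()
    req.files.forEach(fix);
  } else if (req.files) {
    // Shape produced by upload.fields(): { fieldname: [file, ...] }
    Object.values(req.files).flat().forEach(fix);
  }

  next();
}
```

Usage would be something like `app.post('/upload', upload.single('file'), fixMulterFilenames, handler)`, where `upload` and `handler` are whatever the application already defines.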

Braste commented 4 months ago

This is definitely NOT fixed! For TWO years now!!

Doc999tor commented 4 months ago

There is an open PR fixing all UTF8 issues in headers - https://github.com/expressjs/multer/pull/1210

dbs-tuan commented 2 months ago

[Client]

    const formData = new FormData();
    formData.append('fileName', encodeURI(dto.file.name))

then call axios.post(....)

[Backend]

    const fileName = decodeURI(req.file.originalname);

It works.
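The reason this approach survives the charset problem can be sketched as follows: a URI-encoded name contains only ASCII characters, so decoding its bytes as latin1 or as UTF-8 gives the same string. The variable names are illustrative, and the latin1 step simulates the server side rather than calling multer.

```javascript
const uploadName = '한글 £.txt';

// Client side: percent-encode so only ASCII goes over the wire.
const wireName = encodeURI(uploadName);

// Simulated server side: bytes read as latin1 (the problematic default).
const seenByServer = Buffer.from(wireName, 'utf8').toString('latin1');

// ASCII is identical in both charsets, so decodeURI restores the name.
const decodedName = decodeURI(seenByServer);

console.log(decodedName === uploadName);
```

This requires control over the client code that sends the name, so it does not help with a plain HTML form submission.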

CleyFaye commented 2 months ago

That is not an option when sending an actual form, which I assume is a common case.