expressjs / multer

Node.js middleware for handling `multipart/form-data`.
MIT License
11.56k stars 1.05k forks

Wrong encoding in originalname containing unicode characters #962

Open truemogician opened 3 years ago

truemogician commented 3 years ago

Version : 1.4.2

System : Windows 10

When uploading a file whose name contains Unicode characters, file.originalname comes out garbled, which suggests something has gone wrong with the encoding. Maybe the problem isn't with multer, but I cannot find a way to get the encoding right. Any explanation or solution would be appreciated ❤️

ongiao commented 3 years ago

Version : 1.4.2, System : Windows 10

I have the same problem. I am using Postman to upload a file with a Chinese name:

export const uploadHandler = multer({ storage: iTwinStorage({
  filename: (_req, file, cb) => {
    console.log("filename: ", file.originalname);

    cb(null, file.originalname);
  },
}) }).any();

and file.originalname gives me some garbled code (such as Л�bentley�revit!�-@r7.rvt).

keliq commented 3 years ago

It's a problem with Postman, not multer. You can always get the correct originalname with curl or axios. For example:

curl --location --request POST 'http://localhost:8080/files' \
--header 'Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6Ik...' \
--form 'files=@"/Users/keliq/Pictures/截图/小白菜.jpeg"'

erguotou520 commented 2 years ago

I'm using curl but still get the wrong encoding...

[image]

ElderlyBoy commented 1 year ago

You may need this:

req.files[0].originalname = Buffer.from(req.files[0].originalname, 'latin1').toString('utf-8');

YICHUNLIN commented 1 year ago

You can try updating the multer package from 1.4.2 to ^1.4.5-lts.1. It worked for me on 2023.3.14.

MohamedClio commented 5 months ago

@ElderlyBoy I've seen a lot of people adding just this line and saying to put it in the multer configuration. Can you please tell me exactly where I should add it? I'm fairly new to this. req.files[0].originalname = Buffer.from(req.files[0].originalname, 'latin1').toString('utf-8');

ElderlyBoy commented 5 months ago

@MohamedClio See the multer docs. Or you can handle the file name separately in each handler:

//example
router.post('/example', (req, res) => {
  req.files[0].originalname = Buffer.from(req.files[0].originalname, 'latin1').toString('utf-8');
  //...your code
})

m1h43l commented 5 months ago

Just a note: the code above assumes that originalname is encoded in LATIN1 / ISO-8859-1. But this assumption may be wrong as often as it is right. It is just an assumption. Unless you take the actual encoding into account and act accordingly, you may get a wrong result.
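One way to make the per-handler workaround less of a blind guess is sketched below. It is only a sketch under stated assumptions: `decodeOriginalName` and `fixOriginalNames` are hypothetical names, not part of multer, and the client is assumed to have sent the filename as UTF-8. The helper re-decodes a name only if every character fits in one byte (as a latin1-decoded string must) and the resulting bytes are valid UTF-8. Otherwise it leaves the name untouched.

```javascript
// Hypothetical helper (not part of multer): undo latin1 mis-decoding of a
// UTF-8 filename, but only when the string looks like such a mis-decoding.
function decodeOriginalName(name) {
  // A string produced by latin1 decoding can only contain code points <= U+00FF.
  for (let i = 0; i < name.length; i++) {
    if (name.charCodeAt(i) > 0xff) return name;
  }
  // Turn each character back into the single byte it came from.
  const bytes = Buffer.from(name, 'latin1');
  try {
    // fatal: true makes decode() throw if the bytes are not valid UTF-8.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    // Not valid UTF-8, so the name was probably decoded correctly already.
    return name;
  }
}

// Apply the helper to one multer file object, if it has a string name.
function fixFile(file) {
  if (file && typeof file.originalname === 'string') {
    file.originalname = decodeOriginalName(file.originalname);
  }
}

// Hypothetical Express-style middleware, meant to run after the multer middleware.
// It covers req.file, req.files as an array, and req.files as an object of arrays.
function fixOriginalNames(req, res, next) {
  fixFile(req.file);
  if (Array.isArray(req.files)) {
    req.files.forEach(fixFile);
  } else if (req.files && typeof req.files === 'object') {
    Object.values(req.files).forEach((list) => list.forEach(fixFile));
  }
  next();
}
```

Assuming `upload` is your multer instance, this could be registered right after it, for example `router.post('/example', upload.any(), fixOriginalNames, handler)`. A name that really was latin1 and happens to form valid UTF-8 bytes would still be converted wrongly, so this narrows the guess rather than removing it.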

Doc999tor commented 5 months ago

That's not entirely correct: multer by default decodes header values as latin1. It's an old, well-known bug in the last stable version of multer. If you expect the encoding to be utf8, this hack will translate the header values to utf8.

This PR https://github.com/expressjs/multer/pull/1210 provides you a straightforward way to encode the headers as you expect them to be
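To see why the re-decoding hack recovers the name, here is a small standalone sketch that uses only Node's built-in Buffer. It simulates the mis-decoding in isolation and does not call multer or busboy. The sample filename is taken from the curl example earlier in the thread.

```javascript
// The name as the client intended it.
const original = '小白菜.jpeg';

// The client puts the name on the wire as UTF-8 bytes.
const wireBytes = Buffer.from(original, 'utf8');

// Decoding those bytes as latin1 maps every byte to one character,
// which is the garbled originalname people see.
const garbled = wireBytes.toString('latin1');

// Encoding the garbled string as latin1 yields the original bytes again,
// and decoding them as UTF-8 recovers the intended name.
const recovered = Buffer.from(garbled, 'latin1').toString('utf8');

console.log(recovered === original); // true
```

The round trip is lossless because latin1 maps each byte value to exactly one character and back, so no information is lost in the wrong decoding step.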

m1h43l commented 5 months ago

multer uses busboy, and if I understand the busboy code correctly, it more or less supports latin1, utf8 and utf16. But there are dozens of other encodings.

https://github.com/mscdex/busboy/blob/master/lib/utils.js#L384

Doc999tor commented 5 months ago

Multer has a bug, so it uses busboy incorrectly. Instead of supporting these 3 encodings, multer mistakenly converts all header values to latin1.