ApelegHQ / ts-multipart-parser

TypeScript streaming parser for MIME multipart messages
ISC License

Does not work with utf-8 headers #15

Open brupxxxlgroup opened 1 week ago

brupxxxlgroup commented 1 week ago

Given the following simple HTTP multipart/form-data upload

----------------------------695807392213189643524668
Content-Disposition: form-data; name="äasdasd"; filename="СЕРГЕЕНКО.rtf"; filename*=UTF-8''%D0%A1%D0%95%D0%A0%D0%93%D0%95%D0%95%D0%9D%D0%9A%D0%9E.rtf
Content-Type: application/rtf

{\rtf1}
----------------------------695807392213189643524668--

the lib will fail with

TypeError: Cannot convert argument to a ByteString because the character at index 38 has a value of 1057 which is greater than 255.
    at webidl.converters.ByteString (node:internal/deps/undici/undici:3661:17)
    at node:internal/deps/undici/undici:3550:20
    at Object.sequence> (node:internal/deps/undici/undici:3550:20)
    at webidl.converters.HeadersInit (node:internal/deps/undici/undici:8693:69)
    at new Headers (node:internal/deps/undici/undici:8523:36)
    at z (file:///C:/Repos/api.accountbalance.files.its/node_modules/@exact-realty/multipart-parser/dist/index.mjs:5:330)
    at v (file:///C:/Repos/api.accountbalance.files.its/node_modules/@exact-realty/multipart-parser/dist/index.mjs:6:903)

What can be done about it? Thank you so much for your help.

brupxxxlgroup commented 1 week ago

Can multipart parts have UTF-8 headers?

brupxxxlgroup commented 1 week ago

Even if I change the name= value to an ASCII one, it still fails.
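
The error message reports a code unit of 1057, which is the Cyrillic "С" (U+0421) in the filename. The "ä" in name= is U+00E4 (228) and fits within the 0-255 range, which would explain why changing only name= makes no difference. Below is a minimal sketch of that check, assuming the Headers constructor rejects any UTF-16 code unit above 255, as the error text suggests. `findNonByteStringChar` is a hypothetical helper for illustration and is not part of the library or of undici.

```typescript
// Hypothetical helper: returns the first UTF-16 code unit above 255, if any.
// This mirrors the ByteString restriction named in the error message.
function findNonByteStringChar(
  value: string,
): { index: number; code: number } | undefined {
  for (let index = 0; index < value.length; index++) {
    const code = value.charCodeAt(index);
    if (code > 0xff) return { index, code };
  }
  return undefined;
}

// "ä" is 228, so a non-ASCII name= alone stays within the allowed range.
const nameOnly = 'form-data; name="äasdasd"';
console.log(findNonByteStringChar(nameOnly));

// The Cyrillic filename is what exceeds the range, even with an ASCII name=.
const withFilename = 'form-data; name="asdasd"; filename="СЕРГЕЕНКО.rtf"';
console.log(findNonByteStringChar(withFilename)?.code); // 1057
```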

brupxxxlgroup commented 1 week ago

It seems the issue is here

https://github.com/ApelegHQ/ts-multipart-parser/blob/598fb44db61bd666d019688d91ab0478d27790c2/src/parseMessage.ts#L62

corrideat commented 1 week ago

Thank you for the report and the documentation. I'll take a look and see if / how this can be resolved.

It is my understanding that UTF-8 is generally not allowed in headers, although the link you provide seems to show that it's ambiguous.

If the issue is with the Headers constructor, then I'm not sure what the solution would be (I'd rather not implement a custom headers handler, if at all avoidable).

> What can be done about it? Thank you so much for your help.

Well, what can be done right now is to avoid non-ASCII values, but obviously that's not ideal. I'll investigate this and try to find a longer-term solution.

brupxxxlgroup commented 1 week ago

I think your library does exactly what it should do. It is the client's responsibility to encode those values properly.

There is https://datatracker.ietf.org/doc/html/rfc2231 and https://datatracker.ietf.org/doc/html/rfc6266#section-4.3
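
The filename* parameter in the request above uses the extended-value syntax from those RFCs (charset, optional language tag, then percent-encoded bytes). Here is a rough sketch of how a consumer might decode it and fall back to the plain filename parameter. Both function names are hypothetical, the regular expressions are simplified, and only the UTF-8 and ISO-8859-1 charsets are handled.

```typescript
// Decode an extended parameter value such as UTF-8''%D0%A1%D0%95.rtf
// (charset ' optional-language ' percent-encoded value).
function decodeExtValue(extValue: string): string | undefined {
  const match = /^([A-Za-z0-9!#$%&+\-^_`{}~]+)'([A-Za-z0-9-]*)'(.*)$/.exec(extValue);
  if (!match) return undefined;
  const charset = (match[1] ?? "").toLowerCase();
  const encoded = match[3] ?? "";
  if (charset === "utf-8") {
    try {
      return decodeURIComponent(encoded);
    } catch {
      return undefined; // malformed percent-encoding or invalid UTF-8
    }
  }
  if (charset === "iso-8859-1") {
    // Each %XX is one Latin-1 byte, which maps directly to one code point.
    return encoded.replace(/%([0-9A-Fa-f]{2})/g, (_m, hex: string) =>
      String.fromCharCode(parseInt(hex, 16)),
    );
  }
  return undefined; // other charsets are out of scope for this sketch
}

// Extract a filename from a Content-Disposition value, preferring filename*.
function extractFilename(contentDisposition: string): string | undefined {
  const ext = /;\s*filename\*\s*=\s*([^;]+)/i.exec(contentDisposition);
  if (ext) {
    const decoded = decodeExtValue((ext[1] ?? "").trim());
    if (decoded !== undefined) return decoded;
  }
  const plain = /;\s*filename\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^;\s]+))/i.exec(
    contentDisposition,
  );
  if (!plain) return undefined;
  const quoted = plain[1];
  const token = plain[2];
  // Undo backslash escapes inside a quoted-string.
  return quoted !== undefined ? quoted.replace(/\\(.)/g, "$1") : token;
}

console.log(
  decodeExtValue("UTF-8''%D0%A1%D0%95%D0%A0%D0%93%D0%95%D0%95%D0%9D%D0%9A%D0%9E.rtf"),
); // СЕРГЕЕНКО.rtf
```

This only helps once the header value has been obtained as a string, so the raw non-ASCII filename="..." parameter would still need to be dealt with before the value reaches the Headers constructor.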

It seems many clients do not behave properly.

UTF-8 per se is not allowed in headers.

There is also this nice test page with all kinds of combinations

http://test.greenbytes.de/tech/tc2231/#encoding-2231-fb

I am not sure whether the lib should fix wrong client behaviour, but it would be nice if we could somehow preprocess the header and hand it back to the lib. That way, as a consumer, I could try to fix the header and return a correct value.

Another way would be for the lib to convert all header values to latin1; as a consumer, I would then need to convert them back to UTF-8.
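
A minimal sketch of that latin1 round trip, assuming the raw header bytes are available as a Uint8Array. Both function names are hypothetical and not part of the library. Mapping each byte to one code unit keeps every character at or below 255, and the consumer can later recover the original UTF-8 text from the same bytes.

```typescript
// Library side (hypothetical): map each raw header byte to one code unit.
// The result never contains a code unit above 255.
function bytesToLatin1(bytes: Uint8Array): string {
  let out = "";
  bytes.forEach((byte) => {
    out += String.fromCharCode(byte);
  });
  return out;
}

// Consumer side (hypothetical): reinterpret the latin1 string as UTF-8 bytes.
// Percent-encoding plus decodeURIComponent keeps the sketch dependency-free;
// a TextDecoder would be the more usual choice where it is available.
function latin1ToUtf8(latin1: string): string {
  let percentEncoded = "";
  for (let i = 0; i < latin1.length; i++) {
    const code = latin1.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`Not a latin1 string: code ${code} at index ${i}`);
    }
    percentEncoded += "%" + code.toString(16).padStart(2, "0");
  }
  // Throws URIError if the bytes are not valid UTF-8.
  return decodeURIComponent(percentEncoded);
}

// UTF-8 bytes for "С" (U+0421) are 0xD0 0xA1.
const rawBytes = Uint8Array.from([0xd0, 0xa1]);
const headerSafe = bytesToLatin1(rawBytes); // two code units, both <= 255
console.log(latin1ToUtf8(headerSafe)); // С
```

The drawback is that the intermediate string is unreadable for non-ASCII input, and a consumer cannot tell from the value alone whether the client actually sent UTF-8 or genuine latin1.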

I am also not sure which way is best.