IndySockets / Indy

Indy - Internet Direct
https://www.indyproject.org

TIdMultiPartFormDataStream.AddFile complications #174

Closed: regs01 closed this issue 1 year ago.

regs01 commented 7 years ago

I've run into some complications with AddFile. Latin filenames work without problems, but non-Latin ones get encrypted.

Take the filename "Новый текстовый документ.txt". By default I get "X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?=", which I can't decode at all.

If I set

  with MultiData.AddFile ('file', FileName) do
  begin
    HeaderEncoding := 'B';
    HeaderCharSet := 'UTF-8';
  end;

Then I get "=?UTF-8?B?0J3QvtCy0YvQuSDRgtC10LrRgdGC0L7QstGL0Lkg0LTQvtC60YPQvNC10L3R?= =?UTF-8?B?gi50eHQ=?=". That looks promising, but iconv_mime_decode() fails to decode it with the error "Detected an incomplete multibyte character in input string". mb_decode_mimeheader() reports no errors, but decodes it incorrectly as "Новый текстовый докумен??.txt". Neither function seems to like where the string is split between the two encoded-words.

Finally, if I set

  with MultiData.AddFile ('file', FileName) do
  begin
    HeaderEncoding := 'Q';
    HeaderCharSet := 'UTF-8';
  end;

Then I get "=?UTF-8?Q?=D0=9D=D0=BE=D0=B2=D1=8B=D0=B9=D1=82=D0=B5=D0=BA=D1=81=D1=82?= =?UTF-8?Q?=D0=BE=D0=B2=D1=8B=D0=B9=D0=B4=D0=BE=D0=BA=D1=83=D0=BC=D0=B5?= =?UTF-8?Q?=D0=BD=D1=82.txt?=". This time iconv_mime_decode() has some success, but mb_decode_mimeheader() still doesn't.

Is there a preferred way to handle non-Latin filenames? Or is there a way to send plain UTF-8 text instead?

rlebeau commented 7 years ago

That is not encryption. The filenames are simply being encoded according to RFC 2047, "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text".

On systems where ISO-8859-1 is the default charset, TIdFormDataField encodes the filename using MIME's quoted-printable format. On other systems, filenames are encoded using MIME's base64 format instead (and in Unicode versions of Delphi/FreePascal, the filenames are first encoded as UTF-8, and the resulting octets are then encoded in base64).
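
To make the quoted-printable form concrete, here is a minimal Delphi sketch of a single RFC 2047 "Q" encoded-word. `EncodeWordQ` is a hypothetical helper, not Indy's encoder. It assumes a Unicode Delphi with `TEncoding.GetEncoding` (Windows code page 28591 for ISO-8859-1, 65001 for UTF-8), leaves only letters, digits, `.` and `-` unescaped, writes a space as `_`, and does no line splitting.

```pascal
program QEncodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: builds one RFC 2047 "Q" encoded-word,
// =?charset?Q?text?=, without splitting long output.
// Assumption: only letters, digits, '.' and '-' pass through unchanged;
// a space becomes '_' and every other octet becomes =XX (uppercase hex).
function EncodeWordQ(const S, CharsetName: string; CodePage: Integer): string;
var
  Enc: TEncoding;
  Bytes: TBytes;
  B: Byte;
begin
  Enc := TEncoding.GetEncoding(CodePage); // new instance, we must free it
  try
    Bytes := Enc.GetBytes(S); // charset conversion happens here
  finally
    Enc.Free;
  end;
  Result := '=?' + CharsetName + '?Q?';
  for B in Bytes do
    case B of
      Ord('A')..Ord('Z'), Ord('a')..Ord('z'), Ord('0')..Ord('9'),
      Ord('.'), Ord('-'):
        Result := Result + Char(B);
      Ord(' '):
        Result := Result + '_';
    else
      Result := Result + '=' + IntToHex(B, 2);
    end;
  Result := Result + '?=';
end;

const
  // "Café menu.txt"; #$E9 is U+00E9, so the source file encoding does not matter
  Sample = 'Caf'#$E9' menu.txt';

begin
  Writeln(EncodeWordQ(Sample, 'ISO-8859-1', 28591)); // =?ISO-8859-1?Q?Caf=E9_menu.txt?=
  Writeln(EncodeWordQ(Sample, 'UTF-8', 65001));      // =?UTF-8?Q?Caf=C3=A9_menu.txt?=
end.
```

The same text produces different octets, and therefore a different encoded-word, depending on the charset named in the header, which is why a decoder has to honor that charset label.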

"Новый текстовый документ.txt" is using Russian characters. On a Russian system, TIdFormDataField uses KOI8-R to encode the filename before base64 encoding it. That produces the following base64 output:

"7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA=="

The complete MIME header should thus look like this:

Content-Disposition: form-data; name="file"; filename="=?KOI8-R?B?7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?="
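
The "B" form can be sketched the same way: convert the text to octets in the named charset, then base64 those octets. `EncodeWordB` below is a hypothetical helper, not Indy's code; it assumes Delphi XE7 or later (for `System.NetEncoding`) and Windows code pages 20866 (KOI8-R) and 65001 (UTF-8). The filename is spelled with code points so the example does not depend on the source file's encoding.

```pascal
program BEncodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

// Hypothetical helper: builds one RFC 2047 "B" encoded-word,
// =?charset?B?base64?=, without splitting long output.
function EncodeWordB(const S, CharsetName: string; CodePage: Integer): string;
var
  Enc: TEncoding;
  B64: TBase64Encoding;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  B64 := TBase64Encoding.Create(0); // 0 = never insert line breaks
  try
    // Charset conversion first, then base64 over the resulting octets
    Result := '=?' + CharsetName + '?B?' +
      B64.EncodeBytesToString(Enc.GetBytes(S)) + '?=';
  finally
    B64.Free;
    Enc.Free;
  end;
end;

const
  // "Новый текстовый документ.txt" written as UTF-16 code points
  FileName = #$041D#$043E#$0432#$044B#$0439' '
    + #$0442#$0435#$043A#$0441#$0442#$043E#$0432#$044B#$0439' '
    + #$0434#$043E#$043A#$0443#$043C#$0435#$043D#$0442'.txt';

begin
  // prints =?KOI8-R?B?7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?=
  Writeln(EncodeWordB(FileName, 'KOI8-R', 20866));
  // prints the same UTF-8 octets as the two-word output in the first post,
  // but as one unsplit encoded-word
  Writeln(EncodeWordB(FileName, 'UTF-8', 65001));
end.
```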

If iconv_mime_decode() cannot decode that filename, then that is an issue in PHP, not in Indy.

UTF-8 with base64 is the best option. However, as you noticed, although Indy produces the correct base64 output, its MIME encoder does not split the encoded text correctly, because it mishandles multi-byte characters. That is a known issue that has not been fixed yet: https://github.com/IndySockets/Indy/issues/157
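
RFC 2047 requires each encoded-word to represent a whole number of characters, so a multi-octet character must not be split across adjacent encoded-words. The Delphi sketch below shows one way an encoder can honor that rule; it is not Indy's code and not the actual fix for issue #157. `EncodeWordsB` is a hypothetical helper that assumes Delphi XE7 or later and a limit of 45 octets per word (60 base64 characters, which keeps each encoded-word under RFC 2047's 75-character cap and matches the length of the first word in the output from the first post).

```pascal
program SplitEncodedWords;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

// Hypothetical helper: encodes S as UTF-8 "B" encoded-words, starting a new
// word before any character whose octets would push the current word past
// MaxOctets, so no multi-byte character is split between two words.
// Assumes MaxOctets >= 4 (the longest UTF-8 sequence).
function EncodeWordsB(const S: string; MaxOctets: Integer): string;
var
  B64: TBase64Encoding;
  Chunk, CharBytes: TBytes;
  Output: string;
  I, Len, Old: Integer;

  procedure FlushChunk;
  begin
    if Length(Chunk) = 0 then
      Exit;
    if Output <> '' then
      Output := Output + ' '; // words separated by a space, as in the thread
    Output := Output + '=?UTF-8?B?' + B64.EncodeBytesToString(Chunk) + '?=';
    SetLength(Chunk, 0);
  end;

begin
  Output := '';
  B64 := TBase64Encoding.Create(0); // 0 = never insert line breaks
  try
    I := 1;
    while I <= Length(S) do
    begin
      // A UTF-16 surrogate pair is one character: keep both halves together
      if (S[I] >= #$D800) and (S[I] <= #$DBFF) and (I < Length(S)) then
        Len := 2
      else
        Len := 1;
      CharBytes := TEncoding.UTF8.GetBytes(Copy(S, I, Len));
      if Length(Chunk) + Length(CharBytes) > MaxOctets then
        FlushChunk;
      // Append this character's octets to the current word
      Old := Length(Chunk);
      SetLength(Chunk, Old + Length(CharBytes));
      Move(CharBytes[0], Chunk[Old], Length(CharBytes));
      Inc(I, Len);
    end;
    FlushChunk;
  finally
    B64.Free;
  end;
  Result := Output;
end;

const
  // "Новый текстовый документ.txt" written as UTF-16 code points
  FileName = #$041D#$043E#$0432#$044B#$0439' '
    + #$0442#$0435#$043A#$0441#$0442#$043E#$0432#$044B#$0439' '
    + #$0434#$043E#$043A#$0443#$043C#$0435#$043D#$0442'.txt';

begin
  // prints =?UTF-8?B?0J3QvtCy0YvQuSDRgtC10LrRgdGC0L7QstGL0Lkg0LTQvtC60YPQvNC10L0=?= =?UTF-8?B?0YIudHh0?=
  Writeln(EncodeWordsB(FileName, 45));
end.
```

With this filename the split now falls between "докумен" and "т.txt" instead of between the two octets of "т", so each word decodes to valid UTF-8 on its own. The cost is that a word may carry slightly fewer than MaxOctets octets.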

regs01 commented 7 years ago

Yeah, that's what I mean. Ah, that seems to be the problem: I'm getting only X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?=, not 7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==, and not even =?KOI8-R?B?7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?=

Anyway, KOI8-R died out on the web some 15 years ago. It is still common only in email, primarily in Microsoft apps (Outlook, UWP Mail, etc.), which still fail to let go of the past. UWP Mail still can't even parse UTF-8 base64 headers. But that's another story.

Still, is it possible to use plain UTF-8 text in headers, like browsers do? It's just more universal to work with, since it's... universal :)

rlebeau commented 7 years ago

Using the current SVN rev, I cannot reproduce what you claim. This is the exact header that TIdFormDataField.FormatHeader() produces for me when using HeaderCharset='KOI8-R' and HeaderEncoding='B':

Content-Disposition: form-data; name="file"; filename="=?KOI8-R?B?7s/X2cog1MXL09TP19nKIMTPy9XNxc7ULnR4dA==?="

If you want to send raw UTF-8 octets without encoding them to base64, set HeaderCharset='UTF-8' and HeaderEncoding='8', which would then produce this header:

Content-Disposition: form-data; name="file"; filename="Новый текстовый документ.txt"
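
For completeness, here is a minimal Delphi sketch of the raw UTF-8 option, posting the form with TIdHTTP. `AddUtf8File` is a hypothetical helper; the URL and the local file path are taken from the command line and are placeholders. Whether a given server accepts raw UTF-8 in the `filename` parameter is an assumption to check on the receiving side.

```pascal
program UploadWithUtf8FileName;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdHTTP, IdMultipartFormData;

// Hypothetical helper: adds a file field whose filename is sent as raw
// UTF-8 octets (HeaderCharSet = 'UTF-8', HeaderEncoding = '8') instead of
// an RFC 2047 encoded-word.
function AddUtf8File(MultiData: TIdMultiPartFormDataStream;
  const FieldName, FileName: string): TIdFormDataField;
begin
  Result := MultiData.AddFile(FieldName, FileName);
  Result.HeaderCharSet := 'UTF-8';
  Result.HeaderEncoding := '8';
end;

var
  HTTP: TIdHTTP;
  MultiData: TIdMultiPartFormDataStream;
begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: UploadWithUtf8FileName <url> <file>');
    Exit;
  end;
  HTTP := TIdHTTP.Create(nil);
  MultiData := TIdMultiPartFormDataStream.Create;
  try
    AddUtf8File(MultiData, 'file', ParamStr(2));
    Writeln(HTTP.Post(ParamStr(1), MultiData)); // server's response body
  finally
    MultiData.Free;
    HTTP.Free;
  end;
end.
```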

regs01 commented 7 years ago

Ah, the very last option, which I didn't even try.

Yeah, it seems to be a PHP bug: http://i.imgur.com/jhuaFQF.png