IndySockets / Indy

Indy - Internet Direct
https://www.indyproject.org

DecodeHeader bug #152

[Open] rlebeau opened this issue 7 years ago

rlebeau commented 7 years ago

DecodeHeader('=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= =?utf-8?B?t9C10L3RgtCw0YbQuNC4?=')

Expected: АПЕКСОФТ: Заказ презентации
Actual: АПЕКСОФТ: Заказ пре??ентации
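To see where the two "?" characters come from, the octets can be inspected directly. The Python sketch below is illustrative only and does not use Indy. It decodes the two base64 encoded-texts from the first sample and assumes a lenient UTF-8 decoder that substitutes U+FFFD for invalid sequences, shown here as "?":

```python
import base64

# Encoded-text of the two adjacent B-encoded words in the first sample.
WORD1 = "0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ"
WORD2 = "t9C10L3RgtCw0YbQuNC4"

octets1 = base64.b64decode(WORD1)
octets2 = base64.b64decode(WORD2)

# Word 1 ends with a lone UTF-8 lead byte; word 2 starts with the matching
# continuation byte. Together, D0 B7 encodes "з" (U+0437).
print(octets1[-1:].hex(), octets2[:1].hex())  # d0 b7

# Converting each word separately turns each half into U+FFFD.
per_word = (octets1.decode("utf-8", errors="replace")
            + octets2.decode("utf-8", errors="replace"))
print(per_word.replace("\ufffd", "?"))  # АПЕКСОФТ: Заказ пре??ентации

# Converting the concatenated octets keeps the character intact.
joined = (octets1 + octets2).decode("utf-8")
print(joined)  # АПЕКСОФТ: Заказ презентации
```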

Here is another example of a header that does not decode correctly:

Subject: =?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F =D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=

Both failures are due to malformed encoded data. In the first sample, the UTF-8 octets of one character ("з", D0 B7) are split between the two MIME encoded-words, which violates RFC 2047: each encoded-word must contain an integral number of characters. In the second sample, the encoded-word contains unencoded spaces, which RFC 2047 forbids inside encoded-text.
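Both rules can be checked mechanically. The sketch below is a hypothetical Python checker (the names `rfc2047_problems` and `word_octets` are made up for illustration). It uses a deliberately loose pattern so that malformed words are still located, and it tests only the two rules discussed here, not all of RFC 2047:

```python
import base64
import re

# Deliberately loose: anything (even SPACE) is accepted between "=?cs?enc?" and "?=",
# so malformed encoded-words are still found and can be reported.
LOOSE_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?(.*?)\?=")

def word_octets(encoding, text):
    """Decode one word's encoded-text to raw octets (no charset conversion)."""
    if encoding.upper() == "B":
        return base64.b64decode(text)
    # Q encoding: "_" means 0x20 and "=XX" is a hex-escaped octet.
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)

def rfc2047_problems(header):
    """List violations of the two RFC 2047 rules discussed in this issue."""
    problems = []
    for n, m in enumerate(LOOSE_WORD.finditer(header), start=1):
        charset, encoding, text = m.groups()
        # encoded-text must not contain SPACE; only that part of the rule is checked here.
        if " " in text:
            problems.append(f"word {n}: unencoded SPACE in encoded-text")
        # Each encoded-word must hold an integral number of characters.
        try:
            word_octets(encoding, text).decode(charset)
        except UnicodeDecodeError:
            problems.append(f"word {n}: splits a multi-octet character")
    return problems

SAMPLE1 = ("=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= "
           "=?utf-8?B?t9C10L3RgtCw0YbQuNC4?=")
SAMPLE2 = ("=?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F "
           "=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=")
print(rfc2047_problems(SAMPLE1))
# ['word 1: splits a multi-octet character', 'word 2: splits a multi-octet character']
print(rfc2047_problems(SAMPLE2))
# ['word 1: unencoded SPACE in encoded-text']
```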

The above samples were originally tested with Delphi 7. In D2009+, the first sample produces a completely blank string instead. This is because Embarcadero's SysUtils.TUTF8Encoding class passes the MB_ERR_INVALID_CHARS flag to MultiByteToWideChar(), which then fails on the split character octets. In Delphi 7, Indy uses its own TIdUTF8Encoding class, which does not use the MB_ERR_INVALID_CHARS flag. Indy's parser needs to be updated to use TIdUTF8Encoding in all Delphi versions.
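Python's codecs offer a rough analogy for the difference between the two encodings: `errors="strict"` rejects invalid input, much like a conversion that uses MB_ERR_INVALID_CHARS, while `errors="replace"` substitutes U+FFFD. The `convert()` helper below is hypothetical; it assumes the caller discards the result on failure, which reproduces the blank-string symptom rather than modelling the actual Delphi code:

```python
import base64

# Octets of the first encoded-word from the first sample (ends with a lone 0xD0).
OCTETS1 = base64.b64decode("0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ")

def convert(octets, strict):
    """Hypothetical charset-conversion step: strict mode fails on invalid
    UTF-8, lenient mode substitutes U+FFFD for each bad sequence."""
    try:
        return octets.decode("utf-8", errors="strict" if strict else "replace")
    except UnicodeDecodeError:
        # Treating any failure as "no text" loses the whole word, not just one octet.
        return ""

print(repr(convert(OCTETS1, strict=True)))                    # ''
print(convert(OCTETS1, strict=False).replace("\ufffd", "?"))  # АПЕКСОФТ: Заказ пре?
```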

rlebeau commented 7 years ago

SVN rev 4900 updates the IdHeaderCoderIndy.pas unit to use the CharsetToEncoding() function, which uses TIdUTF8Encoding for UTF-8. As a result, the first example no longer returns a blank string on failure, although the output is still not 100% correct because of the split code units.
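Getting fully correct output for the first example requires buffering the decoded octets of adjacent encoded-words that share a charset and converting them to text only once the run ends. The following Python sketch shows that approach under simplifying assumptions (only B and Q encodings, no RFC 2231 language tags, lenient replacement for genuinely bad octets). `decode_header_joined` is a hypothetical name and is not Indy's DecodeHeader():

```python
import base64
import re

# Strict RFC 2047 encoded-word: encoded-text may not contain "?" or SPACE.
WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^? ]*)\?=")

def word_octets(encoding, text):
    """Decode one word's encoded-text to raw octets (no charset conversion yet)."""
    if encoding.upper() == "B":
        return base64.b64decode(text)
    # Q encoding: "_" means 0x20 and "=XX" is a hex-escaped octet.
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)

def decode_header_joined(header):
    """Decode a header, joining octets of adjacent same-charset encoded-words."""
    out = []
    pending, pending_charset = b"", None

    def flush():
        # Convert the buffered run of octets to text in one step.
        nonlocal pending, pending_charset
        if pending_charset is not None:
            out.append(pending.decode(pending_charset, errors="replace"))
        pending, pending_charset = b"", None

    pos = 0
    for m in WORD.finditer(header):
        gap = header[pos:m.start()]
        # RFC 2047: whitespace between two adjacent encoded-words is ignored;
        # any other gap ends the current run and is copied as-is.
        if gap and not (gap.isspace() and pending_charset is not None):
            flush()
            out.append(gap)
        charset, encoding, text = m.groups()
        if pending_charset is not None and charset.lower() != pending_charset:
            flush()
        pending_charset = charset.lower()
        pending += word_octets(encoding, text)
        pos = m.end()
    flush()
    out.append(header[pos:])
    return "".join(out)

SAMPLE1 = ("=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= "
           "=?utf-8?B?t9C10L3RgtCw0YbQuNC4?=")
print(decode_header_joined(SAMPLE1))  # АПЕКСОФТ: Заказ презентации
```

Because conversion happens per run rather than per word, a character split across words is reassembled before the UTF-8 decoder ever sees it.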

The second example still fails: DecodeHeader() validates whitespace while extracting the MIME encoded-word, so the embedded spaces prevent it from detecting that the data is encoded, and it skips the data undecoded.
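A strict encoded-word pattern, which excludes SPACE from the encoded-text, never matches the second example, so the header passes through undecoded. If tolerating such senders were desired, one option is a fallback pattern that accepts literal spaces inside Q-encoded text. The Python sketch below is hypothetical and deliberately non-conforming; both pattern and function names are illustrative assumptions:

```python
import re

# Strict pattern: encoded-text may not contain "?" or SPACE (RFC 2047).
STRICT_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^? ]*)\?=")
# Hypothetical tolerant fallback (NOT RFC 2047): allows SPACE in Q encoded-text.
LOOSE_Q_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")

def q_octets(text):
    """Decode Q encoded-text: "_" means 0x20 and "=XX" is a hex-escaped octet."""
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)

def decode_q_tolerant(header):
    """Replace each loosely matched Q-encoded word with its decoded text."""
    def repl(m):
        charset, text = m.groups()
        # Literal spaces pass through q_octets unchanged.
        return q_octets(text).decode(charset, errors="replace")
    return LOOSE_Q_WORD.sub(repl, header)

SAMPLE2 = ("=?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F "
           "=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=")
print(STRICT_WORD.search(SAMPLE2))  # None
print(decode_q_tolerant(SAMPLE2))   # Sape.ru: Новостная рассылка №11
```

A tolerant pattern like this can also match text that merely looks like an encoded-word, so it is probably best applied only after strict parsing has found nothing.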