Open rlebeau opened 7 years ago
SVN rev 4900 updates the IdHeaderCoderIndy.pas unit to use the CharsetToEncoding() function, which uses TIdUTF8Encoding for UTF-8. As a result, the first example no longer returns a blank string on failure, though the output is still not 100% correct because of the split code units.
The second example still fails, because DecodeHeader() validates whitespace while extracting the MIME encoding, so it does not detect that the data is encoded and skips it.
DecodeHeader('=?utf-8?B?0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ?= =?utf-8?B?t9C10L3RgtCw0YbQuNC4?=')
Expected:
АПЕКСОФТ: Заказ презентации
Actual:
АПЕКСОФТ: Заказ пре??ентации
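The mismatch can be reproduced outside Indy. The Python sketch below is only an analogy, not Indy's code: it Base64-decodes the two encoded-words from the header above and compares three strategies (strict per-word decoding, lenient per-word decoding, and joining the raw bytes before charset decoding). Joining adjacent words first is a tolerant workaround that goes beyond what RFC 2047 requires, and all variable names are made up for illustration.

```python
import base64

# Base64 payloads of the two encoded-words from the header above.
payloads = [
    "0JDQn9CV0JrQodCe0KTQojog0JfQsNC60LDQtyDQv9GA0LXQ",
    "t9C10L3RgtCw0YbQuNC4",
]
chunks = [base64.b64decode(p) for p in payloads]

# Strict decoding of the first word alone fails, because its payload ends
# with a lone UTF-8 lead byte (0xD0) whose continuation byte is in word two.
try:
    chunks[0].decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient per-word decoding substitutes U+FFFD for each orphaned byte,
# which resembles the two '?' characters in the actual output above.
per_word = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# Joining the raw bytes first reunites the split character.
joined = b"".join(chunks).decode("utf-8")

print(strict_failed)
print(per_word)
print(joined)
```

Whether a decoder should repair such input at all is a design choice. The sketch only shows that the original text is still recoverable when the bytes are concatenated before conversion.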
Here is another example of a header that does not decode correctly:
Subject: =?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F =D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=
Both failures are caused by incorrectly encoded data. The first sample contains a UTF-8 encoded character that is split between the two MIME encoded-words, which violates RFC 2047. The second sample contains unencoded whitespace inside the encoded-word, which RFC 2047 forbids.
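To illustrate the second problem, here is a small Python sketch (again an analogy, not Indy's parser) that contrasts a strict encoded-word pattern, which rejects whitespace inside the encoded text, with a lenient one that accepts it. Both regular expressions and the q_to_bytes helper are assumptions written for this example.

```python
import re

# The Subject value from the second sample, with raw spaces inside the word.
header = ("=?UTF-8?Q?Sape.ru: =D0=9D=D0=BE=D0=B2=D0=BE=D1=81=D1=82=D0=BD=D0=B0=D1=8F "
          "=D1=80=D0=B0=D1=81=D1=81=D1=8B=D0=BB=D0=BA=D0=B0 =E2=84=9611?=")

# Strict: encoded text may not contain '?' or whitespace.
STRICT = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
# Lenient (hypothetical): tolerate whitespace inside the encoded text.
LENIENT = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?]*)\?=")


def q_to_bytes(text: str) -> bytes:
    """Decode Q-encoding: '_' means space, '=XX' is a hex-encoded byte."""
    data = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


strict_match = STRICT.search(header)    # None: whitespace breaks the match
lenient_match = LENIENT.search(header)  # matches the whole encoded-word

charset, _encoding, text = lenient_match.groups()
decoded = q_to_bytes(text).decode(charset)

print(strict_match)
print(decoded)
```

The strict pattern finds no encoded-word at all, so a parser built on it would pass the header through undecoded, while the lenient pattern recovers the subject text.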
The above samples were originally tested with Delphi 7. In D2009+, the first sample produces a completely blank string instead. This is because Embarcadero's SysUtils.TUTF8Encoding class uses the MB_ERR_INVALID_CHARS flag when calling MultiByteToWideChar(), which fails because of the split character octets. In Delphi 7, Indy uses its own TIdUTF8Encoding class, which does not use the MB_ERR_INVALID_CHARS flag. Indy's parser needs to be updated to use TIdUTF8Encoding in all Delphi versions.