IndySockets / Indy

Indy - Internet Direct
https://www.indyproject.org

How is DecompressRtf used in IdCoderTNEF.pas? #221

Closed mynameking closed 1 year ago

mynameking commented 6 years ago

Hello!

How is DecompressRtf used in IdCoderTNEF.pas?

I received a message with "text/rtf" that cannot be decoded.

Thank you all! 123

rlebeau commented 6 years ago

In what way exactly is it not decoded? Please be more specific. I don't understand what your screenshot is trying to show. Can you provide the actual TNEF file?

mynameking commented 6 years ago

The attachment is an eml file with the application/ms-tnef type. Its text content is of type 'text/rtf', and I can't display that content properly.

{\rtf1\ansi\ansicpg936\fromhtml1 \fbidis \deff0{\fonttbl

{\f0\fswiss\fcharset134 Simsun;}

{\f1\fmodern Simsun;}

{\f2\fnil\fcharset2 Symbol;}

{\f3\fmodern\fcharset0 Courier New;}} ……

mynameking commented 6 years ago

procedure TForm1.Button1Click(Sender: TObject);
var
  i, v: Integer;
  msg, mms: TIdMessage;
  BodySL: TStringList;
  Part: TIdMessagePart;
  t, AttContentType: string;
  IdCoderTNEF: TIdCoderTNEF;
begin
  BodySL := TStringList.Create;
  msg := TIdMessage.Create(nil);
  mms := TIdMessage.Create(nil);
  try
    msg.LoadFromFile('d:\6.eml');

    for i := 0 to msg.MessageParts.Count - 1 do
    begin
      // Message parts are owned by msg, so they must not be created or freed here.
      Part := msg.MessageParts.Items[i];
      AttContentType := LowerCase(Part.ContentType);

      if Part is TIdText then
      begin
        if (AttContentType = 'text/plain') or (AttContentType = 'text/html') then
          t := TIdText(Part).Body.Text;
      end
      else if (Part is TIdAttachment) and (AttContentType = 'application/ms-tnef') then
      begin
        IdCoderTNEF := TIdCoderTNEF.Create;
        try
          // Decode the TNEF attachment into a second message object.
          IdCoderTNEF.Parse(TIdAttachment(Part), mms);
          for v := 0 to mms.MessageParts.Count - 1 do
          begin
            if mms.MessageParts.Items[v] is TIdText then
            begin
              AttContentType := LowerCase(mms.MessageParts.Items[v].ContentType);
              if AttContentType = 'text/rtf' then
              begin
                BodySL.Add(TIdText(mms.MessageParts.Items[v]).Body.Text);
                Memo1.Lines.Add(BodySL.Text);
              end;
            end;
          end;
        finally
          IdCoderTNEF.Free;
        end;
      end;
    end;
  finally
    mms.Free;
    msg.Free;
    BodySL.Free;
  end;
end;

mynameking commented 6 years ago

In addition, DecodeHeader cannot parse the Subject correctly.

Please check the attachment ErrorEML.

Foxmail and Thunderbird can parse it normally.

procedure TForm1.Button1Click(Sender: TObject);
var
  strSubject: string;
begin
  Memo1.Clear;

  IdMessage1.LoadFromFile('d:\Error1.eml');
  strSubject := IdMessage1.Headers.Values['Subject'];

  Memo1.Lines.Add(DecodeHeader(strSubject));
  Memo1.Lines.Add((IdMessage1.Subject));

  IdMessage1.LoadFromFile('d:\Error2.eml');
  strSubject := IdMessage1.Headers.Values['Subject'];

  Memo1.Lines.Add(DecodeHeader(strSubject));
  Memo1.Lines.Add((IdMessage1.Subject));
end;

Screenshots of Foxmail parsing the content correctly:

image

image

ErrorEML.zip

rlebeau commented 6 years ago

The reason you can't display the RTF correctly has nothing to do with Indy. TIdCoderTNEF is decoding the RTF correctly (I could open the decoded RTF in WordPad). RTF is not plain text, but you are storing it in a TStringList and adding it to a TMemo, neither of which knows anything about RTF. To display RTF visually, you need to use TRichEdit instead of TMemo. Set the TRichEdit.PlainText property to False, and then load the RTF via one of the TRichEdit.Lines.LoadFrom...() methods.
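A minimal sketch of that approach, not a definitive implementation. ShowRtfPart is a hypothetical helper name, and wrapping the decoded text in a TStringStream is just one possible way to feed Lines.LoadFromStream(). It assumes the RTF markup is 7-bit ASCII, as RTF normally is:

```pascal
uses
  Classes, ComCtrls, IdText;

// Hypothetical helper: shows the RTF held by a decoded TIdText part
// in a TRichEdit instead of a TMemo.
procedure ShowRtfPart(ARtfPart: TIdText; ARichEdit: TRichEdit);
var
  RtfStream: TStringStream;
begin
  // Assumption: the RTF markup is 7-bit ASCII, so converting the
  // string to bytes for the stream does not lose data.
  RtfStream := TStringStream.Create(ARtfPart.Body.Text);
  try
    ARichEdit.PlainText := False; // parse the data as RTF rather than plain text
    ARichEdit.Lines.LoadFromStream(RtfStream);
  finally
    RtfStream.Free;
  end;
end;
```

In the code posted earlier, this would be called in place of the TStringList/TMemo lines, e.g. ShowRtfPart(TIdText(mms.MessageParts[v]), RichEdit1), where RichEdit1 is assumed to be a TRichEdit placed on the form.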

rlebeau commented 6 years ago

As for DecodeHeader(), it cannot decode the Subject header of the 1st email because there is nothing for it to decode. That Subject header is not encoded in a format that DecodeHeader() is designed to parse (RFC 2047), so it returns the header data as-is. That is by design.

The Subject header of the 1st email is encoded in raw UTF-8 form, which in and of itself is illegal by modern email standards (only ASCII is allowed in email headers, which is why RFC 2047 exists at all), and there is nothing in the rest of the email headers to inform an email reader that the Subject is using raw UTF-8. I don't care that other email readers can parse this header. They have access to the raw bytes of the email, and can analyze those bytes to detect UTF-8 dynamically. But when Indy reads an email from a file (or a socket), the UTF-8 bytes have already been lost as soon as they are read into memory, since Indy doesn't read emails as UTF-8 by default. It reads them as ASCII and decodes according to what the headers say to decode, per established Internet standards. So, the header data for the 1st email is already corrupted before it ever reaches DecodeHeader(). That is not the fault of DecodeHeader() itself. It is the fault of the email being malformed to begin with. MOST emails in the world follow proper encoding practices. Whoever sent that email did not.

If you have a problem with this assessment, feel free to submit a separate issue ticket for it, as it has nothing to do with this ticket regarding TIdCoderTNEF. The only option currently available to tell Indy to read an email header's raw bytes as UTF-8 instead of ASCII is to skip the TIdMessage.LoadFrom...() method altogether and call the TIdMessageClient.ProcessMessage() method directly, using an IOHandler whose DefStringEncoding property has been set to IndyTextEncoding_UTF8, e.g.:

// IdMessage1.LoadFromFile('d:\Error1.eml');
Stream := TIdReadFileExclusiveStream.Create('d:\Error1.eml');
try
  IdMessage1.Clear;
  MsgClient := TIdMessageClient.Create;
  try
    IOHandler := TIdIOHandlerStreamMsg.Create(nil, Stream);
    try
      IOHandler.FreeStreams := False;
      IOHandler.DefStringEncoding := IndyTextEncoding_UTF8; // <-- HERE
      MsgClient.IOHandler := IOHandler;
      try
        IOHandler.Open;
        MsgClient.ProcessMessage(IdMessage1, false);
      finally
        MsgClient.IOHandler := nil;
      end;
    finally
      IOHandler.Free;
    end;
  finally
    MsgClient.Free;
  end;
finally
  Stream.Free;
end;

DecodeHeader() is able to properly decode the Subject header of the 2nd email, because that header is encoded in RFC 2047 format, as it should be.
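For comparison, a small sketch of the kind of input DecodeHeader() is designed for. The encoded-word below is a made-up example, not taken from either attached email: it is the Base64 form of the UTF-8 bytes for "你好". The second call passes a header containing no encoded-words, which should come back as-is. Button2Click is a hypothetical handler name:

```pascal
uses
  IdCoderHeader;

procedure TForm1.Button2Click(Sender: TObject);
begin
  // RFC 2047 encoded-word layout: =?charset?encoding?encoded-text?=
  // Here: charset UTF-8, encoding B (Base64).
  Memo1.Lines.Add(DecodeHeader('=?UTF-8?B?5L2g5aW9?='));

  // No encoded-word present, so there is nothing for DecodeHeader() to decode.
  Memo1.Lines.Add(DecodeHeader('Plain ASCII subject'));
end;
```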