HiraokaHyperTools / msgreader


Bad RTF produced? #41

Closed: eduardomb08 closed this issue 9 months ago

eduardomb08 commented 11 months ago

Hi,

I'm working with both msgreader and rtf-stream-parser and am encountering an issue with certain .msg files.

Although msgreader does produce a result (meaning it doesn't just blow up), the result differs significantly between two attempts I made to produce a shareable .msg that reproduces the issue.

Could you please take a look at the issue reported below and share your thoughts on whether the msgreader results look as expected?

Error reported: Not encapsulated HTML or text file

Thanks in advance,

Eduardo Monteiro de Barros

kenjiuno commented 11 months ago

Hi.

There is a demo site that combines msgreader and rtf-stream-parser:

https://hiraokahypertools.github.io/msgreader_demo3/

Is the same error reproducible with this version too?

eduardomb08 commented 11 months ago

Yes:

[screenshot]

kenjiuno commented 11 months ago

OK, next step: can Microsoft Word open the decoded .rtf file?

First, obtain file.rtf by clicking the Download link in the RTF section in your web browser:

[screenshot]

Open file.rtf with Microsoft Word:

[screenshot]

If the RTF file is valid, Word will open and render it.

[screenshot]

eduardomb08 commented 11 months ago

Ok. I've downloaded the RTF. Here is what it looks like in Word:

[screenshot]

kenjiuno commented 11 months ago

It looks like plain HTML was pasted into the RTF file. Is that expected?

eduardomb08 commented 11 months ago

That's what's returned by msgreader (even using demo3):

[screenshot]

kenjiuno commented 11 months ago

Given that input, the error "Error: Not encapsulated HTML or Text file" makes sense.

Encapsulated HTML represents the original HTML tags as embedded destinations such as {\*\htmltag <html>}.

If the \htmltag control word is missing, the document is just a plain text document, not encapsulated HTML.

I suspect that the msg file wasn't generated by Microsoft Outlook.

If the content really is plain text, the \fromtext control word is inserted instead.

[screenshot]
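To check which case a given message falls into, one can scan the decompressed RTF for the control words discussed above. The sketch below is a heuristic string scan, not a real RTF parser; it assumes the RTF has already been decompressed into a string, and the function name classifyRtf and its result labels are hypothetical. The \fromhtml1 marker for encapsulated HTML comes from Microsoft's RTF encapsulation format ([MS-OXRTFEX]), alongside the \fromtext marker mentioned above.

```javascript
// Hypothetical helper: heuristically classify a decompressed RTF string.
// It scans text with regular expressions instead of parsing RTF groups, so a
// control word appearing inside embedded or escaped data could mislead it.
function classifyRtf(rtf) {
  // An RTF document should begin with the {\rtf header group.
  if (!rtf.startsWith("{\\rtf")) {
    return { kind: "not-rtf", htmlTagCount: 0 };
  }
  // Count {\*\htmltag ...} destinations, which carry the original HTML tags.
  const htmlTagCount = (rtf.match(/\\\*\\htmltag/g) || []).length;
  // \fromhtml1 marks encapsulated HTML; the lookahead avoids matching e.g. \fromhtml10.
  if (/\\fromhtml1(?!\d)/.test(rtf)) {
    return { kind: "encapsulated-html", htmlTagCount };
  }
  // \fromtext marks encapsulated plain text; the lookahead rejects longer control words.
  if (/\\fromtext(?![a-zA-Z])/.test(rtf)) {
    return { kind: "encapsulated-text", htmlTagCount };
  }
  // No marker: ordinary RTF, which cannot be de-encapsulated back into HTML or text.
  return { kind: "native-rtf", htmlTagCount };
}

// Example inputs (shortened and synthetic, not taken from a real message):
console.log(classifyRtf("{\\rtf1\\ansi\\fromhtml1 {\\*\\htmltag19 <html>}}"));
// → { kind: 'encapsulated-html', htmlTagCount: 1 }
console.log(classifyRtf("{\\rtf1\\ansi\\fromtext Hello}"));
// → { kind: 'encapsulated-text', htmlTagCount: 0 }
// Mimics plain HTML pasted into RTF without any \htmltag destinations:
console.log(classifyRtf("{\\rtf1\\ansi <html><body>Hi</body></html>}"));
// → { kind: 'native-rtf', htmlTagCount: 0 }
```

A file in the last category is the kind of input for which a de-encapsulator has nothing to extract.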

Can Microsoft Outlook display the msg file content as expected?

eduardomb08 commented 11 months ago

Yes, it can:

[screenshot]

kenjiuno commented 11 months ago

Hmm. I'm sorry, but for now I can't determine which part is malfunctioning.

[screenshot]

eduardomb08 commented 11 months ago

I'll see if I can find time this weekend to clone the repo and take a look. Do you have any tips?

eduardomb08 commented 11 months ago

I got it running locally. Any tips on how to debug it would be really helpful:

[screenshot]

kenjiuno commented 11 months ago

For comparison, it would be better to look for another open-source implementation of an msg file reader, rather than msgreader and JavaScript...

[screenshot]

FROGGS commented 11 months ago

Hi, I'm not sure this is a bug at all. I'm using msgreader to render emails as HTML so they can be displayed on a webpage in a document management system.

My code looks like this:

    this.msgReader = new MsgReader(buf);
    var fileData  = this.msgReader.getFileData();

    if (fileData.compressedRtf && fileData.compressedRtf.length > 8) {
      // mail is rtf-encoded
    }

    else if (fileData.bodyHtml) {
      // mail is already in html format
    }

    else if(fileData.body) {
      // mail is plain text only
    }

And why do I do this? Because Outlook saves emails in one of three formats (RTF, HTML, or plain text):

[screenshot]

This means that while Outlook can display emails with HTML-in-RTF, Word cannot. Word needs properly RTF-encoded data, not just HTML wrapped in an RTF envelope.
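The branching in FROGGS's snippet can be restated as a small pure function, which makes the decision easy to test with plain objects instead of real .msg files. This is only a sketch: pickBodyFormat and the "none" fallback are hypothetical, and the field names (compressedRtf, bodyHtml, body) are taken from the snippet above rather than checked against msgreader's typings.

```javascript
// Hypothetical restatement of the snippet's branching as a pure function.
// `fileData` stands in for the object returned by getFileData(); only the
// three fields used in the snippet are read.
function pickBodyFormat(fileData) {
  // Same threshold as the snippet: a compressedRtf of 8 bytes or fewer is treated as absent.
  if (fileData.compressedRtf && fileData.compressedRtf.length > 8) {
    return "rtf";
  }
  if (fileData.bodyHtml) {
    return "html";
  }
  if (fileData.body) {
    return "text";
  }
  // None of the three bodies is present.
  return "none";
}

// Plain objects are enough to exercise each branch without a real .msg file.
console.log(pickBodyFormat({ compressedRtf: new Uint8Array(32), bodyHtml: "<p>Hi</p>" })); // → rtf
console.log(pickBodyFormat({ bodyHtml: "<p>Hi</p>", body: "Hi" })); // → html
console.log(pickBodyFormat({ body: "Hi" })); // → text
```

Because compressedRtf is checked first, a message that carries both an RTF body and an HTML body goes down the RTF branch, as in the first example; whether to prefer bodyHtml instead is a design choice the thread does not settle.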

eduardomb08 commented 11 months ago

@FROGGS, the image I attached in this comment above shows the compressedRtf value for my msg. It would satisfy your first if condition for RTF-encoded e-mails, but the library isn't able to process it.

I'm not saying this means it is a bug. Maybe it's just some edge case or exception to what the library is capable of processing. But it would be nice if there were a way to improve it to handle this case too.
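To start debugging the compressedRtf value itself, one option is to look at its header before decompressing anything. According to Microsoft's [MS-OXRTFCP] specification, compressed RTF begins with four little-endian 32-bit fields: COMPSIZE, RAWSIZE, COMPTYPE ("LZFu" for compressed, "MELA" for uncompressed) and CRC. The sketch below only reads those fields; the function name readCompressedRtfHeader is hypothetical, and it assumes the value is available as a Uint8Array or a plain array of byte values.

```javascript
// Hypothetical helper: read the 16-byte header at the start of compressed RTF
// as described in [MS-OXRTFCP]. It does not decompress or verify the CRC.
const COMPTYPE_COMPRESSED = 0x75465a4c;   // bytes "LZFu" read as little-endian uint32
const COMPTYPE_UNCOMPRESSED = 0x414c454d; // bytes "MELA" read as little-endian uint32

function readCompressedRtfHeader(bytes) {
  // Accept a Uint8Array or a plain array of byte values.
  const u8 = bytes instanceof Uint8Array ? bytes : Uint8Array.from(bytes);
  if (u8.length < 16) {
    throw new Error(`Need at least 16 bytes, got ${u8.length}`);
  }
  const view = new DataView(u8.buffer, u8.byteOffset, u8.byteLength);
  const compSize = view.getUint32(0, true); // byte count after this 4-byte field
  const rawSize = view.getUint32(4, true);  // byte count of the decompressed RTF
  const compType = view.getUint32(8, true);
  const crc = view.getUint32(12, true);
  let kind = "unknown";
  if (compType === COMPTYPE_COMPRESSED) {
    kind = "compressed (LZFu)";
  } else if (compType === COMPTYPE_UNCOMPRESSED) {
    kind = "uncompressed (MELA)";
  }
  return {
    compSize,
    rawSize,
    kind,
    crc,
    // COMPSIZE excludes its own 4 bytes, so a complete buffer has length compSize + 4.
    sizeMatches: compSize === u8.length - 4,
  };
}

// Synthetic example: a header-only "MELA" buffer (12 bytes follow COMPSIZE).
const toAscii = (s) => Array.from(s, (c) => c.charCodeAt(0));
const headerOnly = [12, 0, 0, 0, 0, 0, 0, 0, ...toAscii("MELA"), 0, 0, 0, 0];
console.log(readCompressedRtfHeader(headerOnly));
```

A "kind" of "unknown" or a false "sizeMatches" would suggest the bytes are not a well-formed compressed RTF stream, which narrows down where to look next.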

FROGGS commented 11 months ago

Maybe this depends on the Outlook version used? Could you perhaps produce another .msg file that you can share?

kenjiuno commented 11 months ago

If you're interested, you can test this alternative implementation for comparison with msgreader:

https://hiraokahypertools.github.io/msgreader_net_demo/

It is built on ASP.NET Core Blazor WebAssembly, so there is no backend server involved.

If HTML output is available, BodyHtml may appear.

[screenshot]

eduardomb08 commented 11 months ago

Looks the same:

[screenshot]

I'll look for another .msg that I can share.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.