crisp-oss / email-forward-parser

🐛 Parses forwarded emails and extracts original content.
https://www.npmjs.com/package/email-forward-parser
MIT License

Mails with blocks added after underscore are not correctly managed #13

Closed. jbaranguan closed this issue 1 year ago.

jbaranguan commented 1 year ago

Hi,

Your lib is great! Thank you!

Nevertheless, I have an issue when parsing a forwarded message that contains an automatically inserted block, appended at the end after a line of multiple "_" characters.

A reproducer:

I transfer you that mail.

De : Jorge BARANGUAN <baranguan@hotmail.com>
Envoyé : jeudi 6 avril 2023 16:17
À : Jorge BARANGUAN <jorge.baranguan@iwecloud.com>
Objet : ***URGENT** 9673155358 nos réf

MY body email...
  ________________________________
  This email (including any attachments) is intended for the designated recipient(s) only, and may be confidential, non-public, proprietary, and/or protected by the attorney-client or other privilege. Unauthorized reading, distribution, copying or other use of this communication is prohibited and may be unlawful. Receipt by anyone other than the intended recipient(s) should not be deemed a waiver of any privilege or protection. If you are not the intended recipient or if you believe that you have received this email in error, please notify the sender immediately and delete all copies from your computer system without reading, saving, printing, forwarding or using it in any manner. Although it has been checked for viruses and other malicious software ("malware"), we do not warrant, represent or guarantee in any way that this communication is free of malware or potentially damaging defects. All liability for any actual or alleged loss, damage, or injury arising out of or resulting in any way from the receipt, opening or use of this email is expressly disclaimed.

When I call new EmailForwardParser().read(mailBody, "***URGENT** 9673155358 nos réf"), the library detects the part after the underscores ("This email (including any attachments) is intended for the designated recipient(s) only...") as the forwarded email, so I cannot extract the From/To information.

Do you think it could be fixed by removing these groups of _ characters before parsing?
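A minimal JavaScript sketch of that pre-processing idea, assuming a genuine forward separator made of underscores is always immediately followed by a header line such as "From:" or "De :". The names `stripFooterSeparators`, `HEADER_LINE` and `UNDERSCORE_LINE` are hypothetical and not part of email-forward-parser. Removing every underscore run blindly would also erase the real Outlook 365 separator, so the sketch only drops underscore lines whose next non-empty line is not a header.

```javascript
// Hypothetical pre-processing helper (not part of email-forward-parser).
// Assumption: a real underscore separator is directly followed by a header
// line such as "From:" (English) or "De :" (French); extend for other locales.
const HEADER_LINE = /^\s*(From|De)\s*:/i;
const UNDERSCORE_LINE = /^\s*_{2,}\s*$/;

function stripFooterSeparators(body) {
  const lines = body.split("\n");
  const kept = [];
  for (let i = 0; i < lines.length; i++) {
    if (UNDERSCORE_LINE.test(lines[i])) {
      // Find the next non-empty line to see what this separator introduces
      let j = i + 1;
      while (j < lines.length && lines[j].trim() === "") j++;
      const introducesHeaders = j < lines.length && HEADER_LINE.test(lines[j]);
      // Footer separator (e.g. before a legal disclaimer): drop only this line
      if (!introducesHeaders) continue;
    }
    kept.push(lines[i]);
  }
  return kept.join("\n");
}
```

The cleaned body would then be passed to new EmailForwardParser().read(...) as before. The disclaimer text itself is kept; only the underscore line that made it look like an Outlook separator is removed.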

eliottvincent commented 1 year ago

Hey Jorge! Can you please provide the full export of that email, including headers (e.g. an .EML file, or even a .txt)? Also, which email client was the email forwarded from?

jbaranguan commented 1 year ago

I'm afraid I cannot provide the exact full export of the email because it contains our clients' personal data. I did a first round of anonymization to remove personal data from the content.

The email was forwarded from Outlook 2019 to our platform, and the body-plain field is provided by Mailgun, our mail provider (through mailgun.js). Below is the JSON file our web service saved when it received the email from Mailgun. I reproduced the problem using this anonymized email.

email.txt

jbaranguan commented 1 year ago

The email body contains a thread of forwarded messages, and I cannot tell which mail client is used by the person whose messages get the automatic "This email (including..." block.

eliottvincent commented 1 year ago

Thanks for the anonymized email!

Could you please send me a screenshot showing the specific version of Outlook? I think it's the "new" Outlook 2019. That version no longer inserts a separator, which makes parsing really difficult, especially in a long chain of replies and forwards (as in your case).

What happens is that the ________________________________ line at the end acts as a false positive, since it is exactly the separator used by Outlook 365 / Outlook Live, and this library "prefers" an exact separator over no separator at all.

If we delete it, parsing succeeds. One issue remains with recipients that have a comma in their name (e.g. "C,A" or "LBRN, NFZ"): they are parsed incorrectly because I never anticipated this format. I will update the library to fix this.
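A hedged sketch of comma-tolerant recipient splitting, assuming recipients are separated by ";" (as Outlook does) or by ",", and that a comma only ends a recipient once an email address has been seen. `splitRecipients` is a hypothetical helper, not the library's actual fix.

```javascript
// Hypothetical helper: split a forwarded "To:"/"Cc:" value into recipients.
// Assumptions: ";" always separates recipients; "," separates them only
// after an address ("@") has been seen, and never inside quotes or <...>.
function splitRecipients(value) {
  const recipients = [];
  let current = "";
  let inQuotes = false;
  let inAngle = false;
  for (const ch of value) {
    if (ch === '"' && !inAngle) inQuotes = !inQuotes;
    else if (ch === "<" && !inQuotes) inAngle = true;
    else if (ch === ">" && !inQuotes) inAngle = false;

    const isSeparator = (ch === "," || ch === ";") && !inQuotes && !inAngle;
    if (isSeparator && (ch === ";" || current.includes("@"))) {
      // End of one recipient: store it and start a new one
      if (current.trim()) recipients.push(current.trim());
      current = "";
    } else {
      // Commas inside names like "LBRN, NFZ" end up here and are kept
      current += ch;
    }
  }
  if (current.trim()) recipients.push(current.trim());
  return recipients;
}
```

The trade-off is that, with comma separators, a recipient written as a bare name without an address gets merged into the next recipient.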

As for the ________________________________ line, I need to find a way to avoid detecting it as a false positive.

jbaranguan commented 1 year ago

Thanks for the quick response!

I cannot get a screenshot of the Outlook version, as the sender is a user at one of our clients' clients.

~One possibility, in a best-effort mode, would be to discard the matched email text when it does not contain a proper forwarded email (no From, To, Subject, etc.) and iterate over the remaining body until a well-formed email is found. What do you think?~

EDIT: This approach does not work with a thread like my example either. Parsing should follow the order of the emails in the body; otherwise you will always pick up emails in the middle of the thread whenever their separator is handled with a higher priority, right?

eliottvincent commented 1 year ago

Exactly: the higher in the body, the better. This is in fact already enforced, but there is an edge case when the topmost email has no separator at all.

I can definitely improve things. I'll have a look at this in the coming days!
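The ordering argument can be illustrated with a sketch comparing two ways of choosing where the forwarded email starts: taking the first marker in a fixed priority list that matches anywhere, versus taking whichever marker appears earliest in the body. The two regular expressions in `MARKERS` are simplified, hypothetical stand-ins for the library's real patterns, and this is not necessarily how v1.4.0 resolves the issue.

```javascript
// Simplified, hypothetical markers; the real library uses many more patterns.
const MARKERS = [
  // Exact Outlook 365 / Outlook Live separator line (32 underscores)
  { name: "outlook-separator", regex: /^[ \t]*_{32}[ \t]*$/m },
  // Header line with no separator above it (e.g. Outlook 2019)
  { name: "header-block", regex: /^[ \t]*(From|De)[ \t]*:/im },
];

// Strategy A: the first marker in priority order that matches anywhere wins.
function pickByPriority(body) {
  for (const marker of MARKERS) {
    const match = marker.regex.exec(body);
    if (match) return { name: marker.name, index: match.index };
  }
  return null;
}

// Strategy B: among all markers that match, the one closest to the top wins.
function pickByPosition(body) {
  let best = null;
  for (const marker of MARKERS) {
    const match = marker.regex.exec(body);
    if (match && (best === null || match.index < best.index)) {
      best = { name: marker.name, index: match.index };
    }
  }
  return best;
}
```

On a body shaped like the reproducer above (a French header block at the top, a 32-underscore footer at the bottom), the priority strategy picks the footer while the position strategy picks the header block; on a regular Outlook 365 forward, both pick the separator.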

eliottvincent commented 1 year ago

Hey! I have improved support for nested emails; v1.4.0 will fix your issues.

Let me know ;)

jbaranguan commented 1 year ago

Hello!

Thank you very much for the fix. I think I will be able to test it next week; I have a lot of work at the moment. I'll let you know!

Have a nice day!


(Replying by email to Eliott Vincent's comment of Wed, 10 May 2023 at 22:28.)

eliottvincent commented 1 year ago

Hey there! Were you able to test?

jbaranguan commented 1 year ago

Hey! Yes, I did, and it works much better :)

Today we're releasing a new version to production that includes your fix; I hope it resolves all our support tickets about this! :+1:

I'm closing the ticket.

Thank you very much!