jbaranguan closed this issue 1 year ago.
Hey Jorge! Could you please send me the full export of that email, including headers, e.g. as an .eml file or even a .txt? Also, which email client was the email forwarded from?
I'm afraid I cannot provide the full export of the email because it contains our clients' personal data. I did a first round of anonymization to remove personal data from the content.
The email was forwarded from Outlook 2019 to our platform, and the `body-plain` field is provided by Mailgun, our mail provider (through mailgun.js). You can find the JSON file that our web service (WS) saved when it received the message from Mailgun. I reproduced the problem using this transformed email.
The email body contains a thread of forwarded messages, and I cannot tell which mail client is used by the user whose messages carry the automatic block "This email (including..."
Thanks for the anonymized email!
Could you please send me a screenshot showing the exact version of Outlook? I think it's the "new" Outlook 2019. That version no longer has a separator, which makes parsing really difficult, especially with a long chain of replies and forwards, as in your case.
What happens is that the `________________________________` line at the end is a false positive: it is exactly the separator used by Outlook 365 / Outlook Live, and this library "prefers" an exact separator over no separator at all.
If we delete it, parsing succeeds. One issue remains with recipients that have a comma in their name (e.g. "C,A" or "LBRN, NFZ"): they are parsed incorrectly because I never expected this format. I will update the library to fix this.
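For illustration only, here is one way to split a recipient line when display names themselves contain commas (as in "LBRN, NFZ"). In a real RFC 5322 header such names would have to be quoted, but the forwarded header block inside a message body is plain text, so they may appear unquoted. `splitRecipients` is a hypothetical helper, not the library's actual fix, and it assumes every recipient is written as `Name <address>`; bare addresses without angle brackets would need extra handling.

```javascript
// Hypothetical helper (not part of email-forward-parser): split a recipient
// line such as `C,A <ca@example.com>, LBRN, NFZ <lbrn@example.com>` into
// { name, address } pairs. Commas inside display names are tolerated because
// each entry is anchored on its `<address>` part rather than split on ",".
function splitRecipients(line) {
  const recipients = [];
  // Lazily capture everything up to the next `<...>` group as the name.
  const entry = /([^<>]*?)\s*<([^<>\s]+)>/g;
  let match;
  while ((match = entry.exec(line)) !== null) {
    const name = match[1]
      .replace(/^[\s,;]+/, "") // drop the separator left over from the previous entry
      .replace(/^"|"$/g, "") // drop surrounding quotes, if any
      .trim();
    recipients.push({ name, address: match[2] });
  }
  return recipients;
}
```

Anchoring on the address keeps the comma inside "LBRN, NFZ" from being treated as a recipient boundary; the trade-off is that entries without an `<address>` part are silently skipped.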
As for the `________________________________` line, I need to find a way to avoid detecting it as a false positive.
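One possible heuristic, sketched here under stated assumptions and not taken from the library: accept an underscore-only line as an Outlook 365 / Outlook Live separator only if a "From:" header follows within the next few non-empty lines. A trailing disclaimer such as "This email (including any attachments)..." then no longer qualifies. The minimum of 10 underscores, the lookahead of 3 lines, and the French "De :" label are all illustrative assumptions.

```javascript
// Hypothetical heuristic (not email-forward-parser's actual logic): treat a
// line made only of underscores as a forward separator only when a "From:"
// header appears within the next few non-empty lines.
const UNDERSCORE_LINE = /^_{10,}\s*$/; // assumption: 10+ underscores count as a rule line
const FROM_HEADER = /^\s*(From|De)\s*:/i; // assumption: English/French header labels only

function isLikelyForwardSeparator(lines, index, lookahead = 3) {
  if (!UNDERSCORE_LINE.test(lines[index])) return false;
  let seen = 0;
  for (let i = index + 1; i < lines.length && seen < lookahead; i++) {
    if (lines[i].trim() === "") continue; // blank lines do not count toward the lookahead
    if (FROM_HEADER.test(lines[i])) return true;
    seen++;
  }
  return false;
}
```

A heuristic like this trades precision for recall: a genuine forward whose header block starts more than `lookahead` lines below the rule would be missed.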
Thanks for the quick response!
I cannot send a screenshot of the specific Outlook version, as the sender is a user at one of our clients' clients.
~I was thinking that, in a best-effort mode, one option would be to discard the detected email text if it is not a proper forwarded email (no from, to, subject, etc.) and keep iterating over the remaining body until a well-formatted email is found. What do you think?~
EDIT: This approach does not work with a thread like my example either. Parsing should follow the order of the emails; otherwise you will always end up matching emails in the middle of the thread whenever their separator is handled with a higher priority, right?
Exactly: the higher up in the thread, the better. This is in fact already enforced, but there is an edge case when the topmost email has no separator at all.
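The "follow the thread order" rule can be sketched as: run every separator pattern, then keep the match that appears first in the body, regardless of the order in which the patterns are listed. The pattern names and regular expressions below are simplified placeholders, not the ones email-forward-parser actually uses.

```javascript
// Hypothetical sketch: return the separator match that appears highest in the
// body. The patterns are illustrative placeholders only.
const SEPARATORS = [
  { name: "outlook_live", regex: /^_{32}$/m },
  { name: "original_message", regex: /^-{2,}\s*Original Message\s*-{2,}$/im },
  { name: "forwarded_message", regex: /^-{2,}\s*Forwarded message\s*-{2,}$/im },
];

function findTopmostSeparator(body) {
  let best = null;
  for (const { name, regex } of SEPARATORS) {
    const match = regex.exec(body); // non-global regex: first match only
    if (match && (best === null || match.index < best.index)) {
      best = { name, index: match.index, text: match[0] };
    }
  }
  return best; // null when no separator is found at all
}
```

This alone does not cover the edge case mentioned above: when the topmost forwarded email has no separator, the function simply returns the next separator further down, such as the disclaimer's underscore line.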
I can definitely improve things. I'll have a look at this in the coming days!
Hey! I have improved the support for nested emails, v1.4.0 will fix your issues.
Let me know ;)
Hello!
Thank you very much for the fix. I think I'll be able to test it next week; I have a lot of work at the moment. I'll let you know!
Have a nice day!
Jorge BARANGUAN, Engineering Lead, www.iwecloud.com
(Reply sent by email to Eliott Vincent's comment of Wed, 10 May 2023 at 22:28.)
Hey there! Were you able to test?
Hey! Yes, I did, and it works much better :)
We're deploying a new version containing your fix to production today; I hope it resolves all our support tickets about this! :+1:
I'm closing the ticket.
Thank you very much!
Hi,
Your lib is great! Thank you!
Nevertheless, I have an issue when I parse a forwarded message that contains an automatically inserted block appended at the end, after a line of multiple "_" characters.
A reproducer:
When calling `new EmailForwardParser().read(mailBody, "***URGENT** 9673155358 nos réf")`, the library detects the part after the `____` ("This email (including any attachments) is intended for the designated recipient(s) only...") as the forwarded email, so I cannot extract the from/to information. Do you think it could be fixed by removing these groups of `_` characters before parsing?
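A minimal sketch of the pre-processing idea suggested above, assuming the body is plain text and that a "group of `_` characters" means a line containing nothing but underscores (the 10-character threshold is an arbitrary assumption). `stripUnderscoreRules` is a hypothetical helper and has not been tested against the library.

```javascript
// Hypothetical pre-processing step: drop lines made only of underscores before
// handing the body to the parser. Trade-off: this also removes genuine
// Outlook 365 / Outlook Live separators, so forwards that rely on them may no
// longer be detected. Line endings are normalized to "\n" as a side effect.
function stripUnderscoreRules(body, minLength = 10) {
  const rule = new RegExp(`^_{${minLength},}[ \\t]*$`);
  return body
    .split(/\r?\n/)
    .filter((line) => !rule.test(line))
    .join("\n");
}
```

It would be applied before parsing, e.g. `new EmailForwardParser().read(stripUnderscoreRules(mailBody), subject)`, accepting that threads which genuinely use the Outlook Live separator lose that cue.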