Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.29k stars 2.73k forks source link

Formatting of email responses in NewDot isn't pretty #8088

Closed mvtglobally closed 2 years ago

mvtglobally commented 2 years ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

@greeny was sending messages back/forth to a business partner

Expected Result:

When the partner replied via email that the formatting would like nice and there wouldn't be extra line breaks

Actual Result:

Formatting is a bit janky, inc. extra line breaks

Workaround:

unknown

Platform:

Where is this issue occurring?

Version Number: 1.1.41-3 Reproducible in staging?: need repro Reproducible in production?: Y Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation

image - 2022-03-11T000521 922

Expensify/Expensify Issue URL: Issue reported by: @mallenexpensify Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1646931708047189

View all open jobs on GitHub

MelvinBot commented 2 years ago

Triggered auto assignment to @MariaHCD (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

MariaHCD commented 2 years ago

I think this could be an issue on the backend when parsing the response (and thus, possibly not something an external contributor could work on)

@zsgreenwald Would you mind DM'ing me more details (eg. the user's email) so I can investigate further via the logs and possibly on Mailgun as well?

zsgreenwald commented 2 years ago

On it!

MariaHCD commented 2 years ago

Looks like we ran into something similar here: https://github.com/Expensify/Expensify/issues/142335

Looking at the logs for the user that @zsgreenwald sent over via DM, looks like we defaulted to mailgun's stripped HTML content instead of parsing it ourselves:

Email request is either using VML/XML namespaces or is from an unknown provider, relying on mailgun's stripped HTML content

cc: @thienlnam Not sure if we can do anything here?

thienlnam commented 2 years ago

Yeah, not much we can do here unless we try to strip it ourselves which leads to endless things we'd have to account for. Since mailgun email stripping has been unreliable we can look into using another library for this except we'd need to test multiple platforms to make sure it works and isn't stripping out text content. We can also strip out whitespace after stripped HTML but that also leads to some unintended changes in other email providers

MariaHCD commented 2 years ago

Thanks, @thienlnam! So it sounds like:

Option 1: Strip the content ourselves (which would be tricky due to the endless number of mail providers that we'd have to account for)

Option 2: Investigate usage of another library (which would entail lots of tests across multiple platforms to ensure the text content isn't being stripped)

Option 3:

We can also strip out whitespace after stripped HTML but that also leads to some unintended changes in other email providers

Could you elaborate on those unintended changes? Maybe we can trim any whitespaces before and after the stripped HTML?

thienlnam commented 2 years ago

Yup those options are correct, I think option 2 would actually be worthwhile to do

Could you elaborate on those unintended changes?

It's been a while so I might not be remembering correctly, but IIRC stripping whitespace messed up the email formatting inside the email and made (yahoo?) emails look especially bad. I think we had this enabled at one point but decided to revert it.

Maybe we can trim any whitespaces before and after the stripped HTML?

Yeah this could help a bit, though it looks like in the linked issue the spacing is bad between text and before/after the text too and so it would still look bad

MariaHCD commented 2 years ago

Great, thanks, @thienlnam! So moving forward, sounds like the best option would be to investigate usage of another library and this is something that can be handled internally as it is a backend change.

melvin-bot[bot] commented 2 years ago

Triggered auto assignment to @NicMendonca (Internal) because issue was reported by a contributor who needs to be compensated if this issue is fixed

MariaHCD commented 2 years ago

Triggered auto assignment to @NicMendonca (Internal) because issue was reported by a contributor who needs to be compensated if this issue is fixed

This issue was not reported by a contributor.

aldo-expensify commented 2 years ago

I worked in this in the past in this PR. I added a function to trim the html, but it missed to consider cases where there are spaces using &nsbp;. I'll do a PR to fix that.

melvin-bot[bot] commented 2 years ago

This issue has not been updated in over 15 days. @aldo-expensify eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

zsgreenwald commented 2 years ago

@aldo-expensify @MariaHCD what's the latest with this? I'm still noticing some heavily spaced formatting on incoming messages from email. Some screenshots below:

image image
aldo-expensify commented 2 years ago

Hey @zsgreenwald , I made a PR, but it seems like it didn't work in this case. I'll have a look and see what it may be

zsgreenwald commented 2 years ago

Thanks! Let me know if there's anything i can do to help

aldo-expensify commented 2 years ago

@zsgreenwald do you get these messages also in your email inbox? or just new dot? Having the email to test with would help :)

zsgreenwald commented 2 years ago

Just NewDot. I haven't received any email summaries from these Chats specifically

zsgreenwald commented 2 years ago

Hey @aldo-expensify what else do you need from me to push this forward? Thanks!

aldo-expensify commented 2 years ago

I'll put some time tomorrow to see if I can figure it out.

quinthar commented 2 years ago

Sorry for missing something -- what is the exact problem? Is it extra whitespace? Can someone describe specifically what they would like changed?

zsgreenwald commented 2 years ago

Yes, the exact problem is the extra white space. I'm suggesting we tighten it so it looks more natural when we receive email responses.

image

Not pressing, but I figured it'd be a quick fix.

aldo-expensify commented 2 years ago
image
thienlnam commented 2 years ago

Ah this has been a PITA in the past, we've had a lot of trouble parsing / cleaning specially formatted emails from outlook which I believe this is - for these types of emails we delegate the html stripping to mailgun. In the past, we've just stripped all line breaks to solve this problem but it removed any line breaks intended within the email which also didn't look great

@aldo-expensify You could try removing all break tags before text content starts, but there's a lot of email formatting overhead that you would have to figure out.

aldo-expensify commented 2 years ago

It is indeed a PITA, but I think there is room from improvement.

I'm testing locally with a message with the following html (this is what we end up with after our API processes the email):

<div class="WordSection1"> <p class="MsoNormal">Hi Zach, </p><p></p> <p class="MsoNormal"></p><p> </p> <p class="MsoNormal">Will tomorrow at 3 est. work?</p><p></p> <p class="MsoNormal"></p><p> </p> <p class="MsoNormal">Thanks, Steve</p><p></p> <p class="MsoNormal"></p><p> </p> </div>Notice required by law: This e-mail may constitute an advertisement or solicitation under U.S. law, if its primary purpose is to advertise or promote a commercial product or service. You may choose not to receive advertising and promotional messages from Crowe LLP (except for the crowe.com website, which tracks e-mail preferences through a separate process) at this e-mail address by forwarding this message to CroweUnsubscribe@crowe.com. If you do so, the sender of this message will be notified promptly. Our principal postal address is 225 W Wacker Drive, Suite 2600, Chicago, IL 60602-4903. This email message is from Crowe LLP or one of its subsidiaries and may contain privileged or confidential information or other information exempt from disclosure under applicable law. If you are not the intended recipient, please notify the sender by reply email immediately and delete this message without reading further or forwarding to others. This email is not intended to be a contract or other legally binding obligation, and any tax advice expressed in this email should not be construed as a formal tax opinion unless expressly stated. Visit www.crowe.com/disclosure for more information about Crowe LLP and its subsidiaries.

image

The problem seems to be that whitespaces before the text "Hi Zach" are getting wrapped in divs in the front end. These are the white lines we see.

image

become these divs:

image

I think this is a problem to fix in the front end, but I'm not sure where the html gets transformed yet.

On the backend, we have some code that tries to trim the html by removing tags that don't have text content and that don't have text content after or before them. Applying the same logic, I can remove the TEXT nodes that are just blank spaces in the borders of the email that are getting converted to divs (see this draft PR). This works, but only for the white lines that are added at the top and bottom... but I think there are still some white lines added in between because of the spaces being wrapped in divs.

Result:

image
aldo-expensify commented 2 years ago

Changing this prop dangerouslyDisableWhitespaceCollapsing to false seems to fix the problem of whitespaces getting wrapped in divs:

image
aldo-expensify commented 2 years ago

@parasharrajat this PR: https://github.com/Expensify/App/pull/6975 changed dangerouslyDisableWhitespaceCollapsing to true, which seems to be causing the problem here: unnecessary extra empty lines. Maybe we should look for another solution to the problem that PR was trying to solve.

parasharrajat commented 2 years ago

We are removing that flag in another issue. I will add the PR link as soon as I find it.

parasharrajat commented 2 years ago

That PR is #9025

aldo-expensify commented 2 years ago

The solution in that PR seems to cause the exact same problem for this case:

https://user-images.githubusercontent.com/87341702/168906255-19b9f684-09a2-4a57-87ee-c60c3f83a171.mov

I don't think we should be changing how the whitespace works when rendering html outside a <pre>. If there is some indentation problems with lists.. maybe there is something to do with padding or margin?

zsgreenwald commented 2 years ago

@aldo-expensify I noticed this PR merged. I'm waiting on some email responses to come in, but should that merge fix the issue?

aldo-expensify commented 2 years ago

@aldo-expensify I noticed https://github.com/Expensify/App/pull/9025 merged. I'm waiting on some email responses to come in, but should that merge fix the issue?

I tested locally that PR and it seems to cause the same problem.

aldo-expensify commented 2 years ago

I investigate this a bit more today and I know better now when it happens, but I haven't been able to come up with a solution. I have a clearer way to reproduce it here: https://github.com/Expensify/App/pull/9195#issuecomment-1146486983

aldo-expensify commented 2 years ago

From what I can see, we have the following cases:

With white-space: pre Without `white-space: pre` Data
1 "0 spaces<br /> 1 spaces<br /> 2 spaces<br /> 3 spaces<br /> 4 spaces"
2 "0 spaces<br /> 1 spaces<br /> 2 spaces<br /> 3 spaces<br /> 4 spaces<br /> <pre>Test</pre>"
3 " <div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi SomeName, </p><p></p> <p class=\"MsoNormal\"></p><p> </p> <p class=\"MsoNormal\">Will tomorrow at 34 est. work?</p><p></p> <p class=\"MsoNormal\"></p><p> </p> <p class=\"MsoNormal\">Thanks, Test</p><p></p> <p class=\"MsoNormal\"></p><p> </p> </div>xxxxxx xxxxxxxx xx xxx: xxxx x-xxxx xxx xxxxxxxxxx xx xxxxxxxxxxxxx xx xxxxxxxxxxxx xxxxx x.x. xxx, xx xxx xxxxxxx xxxxxxx xx xx xxxxxxxxx xx xxxxxxx x xxx..."
  1. This case is NOT rendered as HTML because of this exception in handling it, so it works in both cases.
  2. and 3. There are other html tags than <br />, so we render these cases using the react-native-render-html library.

I think it makes sense to add the white-space: pre when we are rendering html messages typed by the user in our App because that makes them render as close as possible to what the user initially typed (preserving spacing). The problem appears when the message came from an email. The email client has a different way of rendering the HTML of the message, they use white-space: normal and &nbsp; to preserve spacing.

We could try one of the following fixes:

  1. Flag the messages that came from the email intake, don't force white-space: pre for the flagged messages
  2. Check if the message contains a &nbsp;, if it does, don't force white-space: pre
  3. ~Replace &nbsp; by normal space and keep forcing white-space: pre~. This didn't work: https://github.com/Expensify/App/pull/9195#issuecomment-1150279213
  4. Do heavier processing of the email in the backend to avoid the amount of weird structured html in them, but that sounds like an endless amount of work.
aldo-expensify commented 2 years ago

Just mentioning that if you edit the message that came from the email reply, it "breaks" like this:

image

The result is the same with and without white-space:pre

aldo-expensify commented 2 years ago

I did some more testing today and found out that outlook likes to add a bunch of \n in the email's html. The \n with whitespace: pre makes a lot of empty lines appear.

The proposed solution of avoid using whitespace: pre for message comings from emails handles well this case, so I'm sticking with that.

I updated the PR's descriptions and made them ready for review.

melvin-bot[bot] commented 2 years ago

This issue has not been updated in over 15 days. @aldo-expensify eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!