Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.58k stars 2.92k forks source link

[HOLD for payment 2023-03-09] [$4000] Inconsistent formatting on multi-line formatted auto-email #14695

Closed kavimuru closed 1 year ago

kavimuru commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Open staging.new.expensify.com
  2. Open a chat and send this :
    ~firstemail@gmail.com
    secondemail@gmail.com~
  3. Send this one
    ~firstwebsite.com
    secondwebsite.com~

    Expected Result:

    All of the text (both email and website) should be on strikethrough.

Actual Result:

While the website is all on strikethrough, the second email (secondemail@gmail.com) doesn't got strikethrough.

Workaround:

unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

Version Number: v1.2.62-1 Reproducible in staging?: Y Reproducible in production?: Y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos:

staging new expensify com_r_7203133188320812 (1)

https://user-images.githubusercontent.com/43996225/215822124-98b2cef9-b40b-48e3-b903-12ea230db46b.mp4

https://user-images.githubusercontent.com/43996225/215822312-63ac8273-dc70-4fae-aed8-d1d9c674faff.mp4

Expensify/Expensify Issue URL: Issue reported by: @kerupuksambel Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1675115210540239

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~0133c0accb82a12340
  • Upwork Job ID: 1620489634218385408
  • Last Price Increase: 2023-03-14
JmillsExpensify commented 1 year ago

Alright, I'm able to reproduce this issue. Given that it's a markdown parsing issues I'm going to open up to External now.

Screenshot 2023-01-31 at 12 29 07

melvin-bot[bot] commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~0133c0accb82a12340

melvin-bot[bot] commented 1 year ago

Current assignee @JmillsExpensify is eligible for the External assigner, not assigning anyone new.

melvin-bot[bot] commented 1 year ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @thesahindia (External)

melvin-bot[bot] commented 1 year ago

Triggered auto assignment to @NikkiWines (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

JmillsExpensify commented 1 year ago

I should also note, one outstanding question is whether multi-line markdown should be supported. Clearly the issue right now is that we do support it, we just support it inconsistently. That said, I think it's equally valid to question, should we stick to markdown within one line, before the break? I can only imagine this is one of several edge cases we're going to face because we allow multi-line.

mirraashid commented 1 year ago

@JmillsExpensify/ @NikkiWines - I have identified the issue -

In AnchorRenderer.js component, for the first email we are getting 1 child (native HTML span) in props.tnode and subsequently getting textDecorationLine: "line-through" style in nativeTextRet object, while in case of second email we don't get any child. I will draft the proposal soon.

marktoman commented 1 year ago

Proposal

Problem

The regular pattern does not separate ~ from the first email because it matches it erroneously with the email text.

Solution

Match them separately by removing ~ and related from the email pattern and keeping them in the first and second groups.

Remove the line below: https://github.com/Expensify/expensify-common/blob/aaf49708420187a13ebe3bdb97b67104e6a6bcfc/lib/ExpensiMark.js#L99 and add:

+ regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*)([a-zA-Z0-9.!#$%&'+/=?^`{|}-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,

I didn't realize these are technically valid email characters. The solution is to make sure the email doesn't start with them and allow them inside:

+ regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,

Fix the italic rule by allowing the same range of the valid characters as the bold rule, while keeping the invalid character specific to the italic rule.

Replace https://github.com/Expensify/expensify-common/blob/aaf49708420187a13ebe3bdb97b67104e6a6bcfc/lib/ExpensiMark.js#L134 with:

+ regex: /(?!_blank")\b_((?=\S)(([^\s*]|\s)+?))_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,

Another way to deal with emails is to let the italic/bold/strikethrough rule run before the autoEmail rule. This supports the emails starting with ~_* and multiline at the same time. In this case, the autoEmail regex remains the same. The italic rule still needs to be fixed, but that is unrelated to autoEmail.

Move this here.

mirraashid commented 1 year ago

@marktoman - Actually, the issue is that email is not getting formatted like the normal url is getting. If you check modifyTextForUrlLinks function, there we are adding ['bold', 'strikethrough', 'italic'] styles for the normal link.. but same isn't done for email. As a result we get different html for link and email -

For Email - <a href="mailto:~firstemail@gmail.com"><del>firstemail@gmail.com</del></a><br /><a href="mailto:secondemail@gmail.com">secondemail@gmail.com</a>

For Link - <del><a href="https://firstwebsite.com" target="_blank" rel="noreferrer noopener">firstwebsite.com</a><br /><a href="https://secondwebsite.com" target="_blank" rel="noreferrer noopener">secondwebsite.com</a></del>

Please note the difference in <del> tag for both the cases

marktoman commented 1 year ago

@mirraashid Yes, this is the effect of the regex putting ~ inside <a>~... instead of before. It is subsequently replaced by <del>, but that doesn't change its position.

thesahindia commented 1 year ago

I think we can combine https://github.com/Expensify/App/issues/14675 in this issue.

cc: @NikkiWines @JmillsExpensify

marktoman commented 1 year ago

I've updated the proposal. Now all uses of ~*_ around the regular text or emails are supported for multiline. I should mention that the original expressions can be refactored to be cleared while retaining their tested functionality.

NikkiWines commented 1 year ago

I should also note, one outstanding question is whether multi-line markdown should be supported. Clearly the issue right now is that we do support it, we just support it inconsistently. That said, I think it's equally valid to question, should we stick to markdown within one line, before the break? I can only imagine this is one of several edge cases we're going to face because we allow multi-line.

Yeah, interesting point @JmillsExpensify. FWIW, neither Slack nor Whatsapp support multiline strikethrough. Github does support it. Personally, I agree that this'll probably open up several edge-case scenarios for a pretty niche function. I think strikethrough should be delineated by a ~ on either end of sentence or phrase, without a space or newline to break it.

marktoman commented 1 year ago

@NikkiWines Currently, multiline strikethrough is not properly ignored or supported for emails, but it is supported for regular text and links. Reverting the support across the board may open more problems, given what the users are used to.

JmillsExpensify commented 1 year ago

Given that our app is still largely beta, we don't need to worry too much about what users are used to. That said, this is a good one to bring out to discussion in our Slack community. I'll circle back on that and CC those in this issue I can find using similar usernames to GH.

marktoman commented 1 year ago

@JmillsExpensify Thanks for clarifying. Do you benchmark against specific apps or any noteworthy ones? E.g., Notion supports multiline. Slack doesn't, but it at least lets the user select the lines and make them bold/italic/strikethrough via the toolbar.

s77rt commented 1 year ago

@marktoman Emails that start with ~ are valid. Example: ~firstemail@gmail.com is valid. However with your regex this email will turn into another email firstemail@gmail.com

marktoman commented 1 year ago

@s77rt I know and I mentioned it in those two solutions, but this is not the case with the third one.

s77rt commented 1 year ago

@marktoman Sorry my bad, I thought you meant not to support strike-through for emails that starts with those characters but the emails would remain as they are. (i.e. we won't be able to strike-though ~firstemail@gmail.com but the mailto target would still be ~firstemail@gmail.com)

I have tried your latest solution (changing order of rules) it does seem to fix the issue however it fails parsing ~~firstemail@gmail.com~ (the mailto target should be ~firstemail@gmail.com)

allroundexperts commented 1 year ago

Proposal

Problem

The issue arises due to inconsistent HTML being generated for the emails. The email rule in the formatter runs first, creating an html like: <a href="~...">~...</a><br/><a href="...">...</a>~ When the strikethrough rule runs, it takes the above HTML and strikethrough's the first email (with the ~ character removed inside the anchor text but present in the href section. The generated HTML after this rule looks like: <a href="~..."><del>...</a><br/><a href="...">...</a></del>

Moving the strikethrough rule above email would cause the auto link to break. Thus we can not use it.

The second solution here https://github.com/Expensify/App/issues/14695#issuecomment-1411090815 does not result in correct href value of the anchor tag (Ignores ~ no matter how many ~ are added to the email).

Solution

We can fix the inconsistency for STRIKETHROUGH (as mentioned in the bug) by changing the following rule:

https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L97-L101

to

{
                name: 'autoEmail',
                process: (textToProcess, replacement) => {
                    const [match, c1, rawText] = textToProcess.match(/([~]?)([\S\s]+)\1/);
                    const regex = /(?![^<]*>|[^<>]*<\\)([_*]*)([_*]*)([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim;
                    if (regex.test(rawText)) {
                        return `${c1 ? '<del>': ''}${rawText.replace(regex, replacement)}${c1 ? '</del>': ''}`;
                    }
                    return textToProcess;
                },
                replacement: (match, g1, g2, g3) => `${g1}${g2}<a href="mailto:${g3}">${g3}</a>${g2}${g1}`,
            },

Result

https://user-images.githubusercontent.com/30054992/216838421-e0303999-ca1c-4979-988b-e3634397970b.mov

thesahindia commented 1 year ago

That said, this is a good one to bring out to discussion in our Slack community

Is there any discussion going @JmillsExpensify ?

JmillsExpensify commented 1 year ago

Ah no, do you mind getting that going? I'm working on some other R&D projects at the moment and I haven't been able to circle back around to this.

JmillsExpensify commented 1 year ago

@JmillsExpensify Thanks for clarifying. Do you benchmark against specific apps or any noteworthy ones? E.g., Notion supports multiline. Slack doesn't, but it at least lets the user select the lines and make them bold/italic/strikethrough via the toolbar.

@marktoman great question. I would say not particularly, though Slack and Discord are good examples to include in the conversation, especially since they tend to differ in their approaches.

thesahindia commented 1 year ago

Started the thread. Feel free to share your opinions.

thesahindia commented 1 year ago

Hey @JmillsExpensify can you please bump the discussion? We still need some opinions.

JmillsExpensify commented 1 year ago

Done, thank you!

MelvinBot commented 1 year ago

@JmillsExpensify, @NikkiWines, @thesahindia Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot commented 1 year ago

@JmillsExpensify, @NikkiWines, @thesahindia Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

JmillsExpensify commented 1 year ago

We just aligned in the linked Slack discussion. @thesahindia I think we're good to go on this one, thanks for keeping it moving.

allroundexperts commented 1 year ago

@thesahindia Updated my proposal which works even better.

MelvinBot commented 1 year ago

@JmillsExpensify @NikkiWines @thesahindia this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

thesahindia commented 1 year ago

Will review it soon! (just got back, completing the pending work right now)

thesahindia commented 1 year ago

@allroundexperts, after implementing your proposal when I send a link with strikethrough it shows as normal text.

Screenshot 2023-02-15 at 3 42 55 AM
thesahindia commented 1 year ago

@marktoman Emails that start with ~ are valid. Example: ~firstemail@gmail.com is valid.

@s77rt, I am not sure about it. Slack or github doesn't support these emails. I also tried creating an email with ~ and it wasn't allowed

Screenshot 2023-02-15 at 4 11 36 AM
s77rt commented 1 year ago

@thesahindia They may have their own reasons not to parse it but per the RFC 5322 it's valid. If we consider emails that contain (or better; that start/end with) _*~ an edge case then we can simply omit them.

allroundexperts commented 1 year ago

@thesahindia I've updated my proposal to handled your mentioned case.

allroundexperts commented 1 year ago

@thesahindia If we can ignore ~ in the email, then the solution becomes quite simple. We just need to replace the regex in autoEmail with:

                regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*)([a-zA-Z0-9.!#$%&'*+/=?^_`{|}-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,
thesahindia commented 1 year ago

Hey @NikkiWines, should we take ~abc@xyz.com as a valid email? I am seeing that slack, GitHub and WhatsApp aren't recognising this as a valid email. Even google doesn't allow creating an email with ~

allroundexperts commented 1 year ago

Also, IMO, we shouldn't be going with RFC format for the email here. ~ in the email isn't allowed by big players which means that eventually RFC will change as well. Allowing for ~ in the email adds a lot of complexity to our regex.

JmillsExpensify commented 1 year ago

@NikkiWines Thoughts on the path forward?

s77rt commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Unable to apply italic/bold/strikethrough styles to multi line text that starts with an email.

What is the root cause of that problem?

The italic/bold/strikethrough styling delimiters _*~ are being captured by the email regex.

What changes do you think we should make in order to solve the problem?

Prevent the styling delimiters from being captured by the email regex. This can be achieved easily if we consider emails that start with those characters as invalid emails (although technically they are valid). This is a trade-off that's also adopted by Slack.

Moving forward with this approach would mean to modify the autoEmail rule as follow:

What alternative solutions did you explore? (Optional)

None that are worth mentioning


Credits go to @marktoman The above regex only fixes a tiny regression (a@test.com was no being recognised as a valid email). Technical details and code-diffs are discouraged but since this was brought up before the new contribution rules and to help avoid going with the wrong regex. I'm posting this comment. Please hire @marktoman for that solution.

Apologises if this is a redundant comment.

NikkiWines commented 1 year ago

Hey @NikkiWines, should we take ~abc@xyz.com as a valid email? I am seeing that slack, GitHub and WhatsApp aren't recognising this as a valid email. Even google doesn't allow creating an email with ~

Sorry for the delay here @thesahindia. I agree with what you've said above. I don't think we should accept emails beginning with ~ as valid. There's no compelling reason to allow it, especially if other major platforms don't support emails with that format.

thesahindia commented 1 year ago

I like @s77rt's proposal because it supports ~, *, _ in the middle of email.

Screenshot 2023-02-17 at 9 49 14 PM

Credits go to @marktoman The above regex only fixes a tiny regression (a@test.com was no being recognised as a valid email). Technical details and code-diffs are discouraged but since this was brought up before the new contribution rules and to help avoid going with the wrong regex. I'm posting this comment. Please hire @marktoman for that solution.

@s77rt, mentioned that they have improved @marktoman's proposal and addressed a regression ( I think it could have been missed, thanks @s77rt )so I think @s77rt' proposal is not a dupe and it's valid. I am not sure who should be assigned here. It's just my thought but maybe we can split the compensation if they both agree

C+ reviewed 🎀👀🎀

cc: @NikkiWines

NikkiWines commented 1 year ago

@thesahindia, agreed that @s77rt's proposal looks good. @JmillsExpensify, not sure how we'd handle the compensation here given the collaborative efforts here. What are your thoughts?

marktoman commented 1 year ago

@thesahindia It looks like you were testing my first regex. The second one supports those cases. The tiny regression for ~a@b.com~ that @s77rt mentions is fixable by replacing + with * here:

/(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim
                                                                                                  ^

But I found problems when the email contains the character in multiline.

~abc~def@gmail.com
abc~def@gmail.com~

Fix (replace here):

+ regex: /\B~((?=\S)((~~(?!~)|[^\s~]|\s(?!~)|\S~{1,2}\S)+?))~\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,

The fix extends the allowed sequences for ~ preceded and followed by a non-whitespace character. It allows up to 2 consecutive ~ characters, because the current rule allows that, but prevents 3. When editing these rules, my intention is to honor the current rule set as much as possible to avoid unforeseen regressions.

*abc*def@gmail.com
abc*def@gmail.com*

Fix (replace here):

+ regex: /\B\*((?=\S)(([^\s*]|\s(?!\*)|\S\*+\S)+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,

Similarly to the previous rule, it honors the ruleset while extending the allowed sequences for an inline * that is not delimited by a space.


Update

_a_b@b.com
a_b@b.com_

Fix (replace here)

+ regex: /(?!_blank")\b_((?=\S)[\s\S]+?)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,

This is a technical fix that solves a regex feature called greediness and does not change the intended behavior.

With all the rules in place, this is supported:

_~*a_*~b@b.com
a~*_b@b.com*~_
JmillsExpensify commented 1 year ago

Going to hold on commenting so that @thesahindia can reply to @marktoman's most recent comment first.

thesahindia commented 1 year ago

@marktoman, the page becomes unresponsive in similar cases like the one shown below

Screenshot 2023-02-18 at 3 49 22 PM

I think the formatting for emails like ~abc~abc@gmail.com~ is out of scope for this issue. But let's confirm first if we wanna include them in this issue.

@NikkiWines, this is how the above email will look -

Screenshot 2023-02-18 at 4 16 48 PM

Should we add support for the similar cases as shown above or just should we just focus at the issue mentioned in the OP?

The new regex that @marktoman' has suggested will fix main issue and will not cause the regression that @s77rt mentioned. So we can move forward with it. We just have to change the autoEmail regex to

/(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]*@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim
marktoman commented 1 year ago

@thesahindia Nice catch, I've fixed the regexes. But I see how it can be considered out of scope. These emails are not common and it makes sense to keep the possibility of unforeseen regressions low even if there is an effort to mitigate them.

JmillsExpensify commented 1 year ago

I agree with @marktoman that these emails should be considered out of scope.

thesahindia commented 1 year ago

Cool, then we can move forward with @marktoman's proposal to only fix the current issue. @marktoman, would you mind posting your proposal (only with the necessary change) as a new comment so that @NikkiWines can also look at it ?