Closed kavimuru closed 1 year ago
Alright, I'm able to reproduce this issue. Given that it's a markdown parsing issues I'm going to open up to External
now.
Job added to Upwork: https://www.upwork.com/jobs/~0133c0accb82a12340
Current assignee @JmillsExpensify is eligible for the External assigner, not assigning anyone new.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @thesahindia (External
)
Triggered auto assignment to @NikkiWines (External
), see https://stackoverflow.com/c/expensify/questions/7972 for more details.
I should also note, one outstanding question is whether multi-line markdown should be supported. Clearly the issue right now is that we do support it, we just support it inconsistently. That said, I think it's equally valid to question, should we stick to markdown within one line, before the break? I can only imagine this is one of several edge cases we're going to face because we allow multi-line.
@JmillsExpensify/ @NikkiWines - I have identified the issue -
In AnchorRenderer.js component, for the first email we are getting 1 child (native HTML span) in props.tnode
and subsequently getting textDecorationLine: "line-through"
style in nativeTextRet object, while in case of second email we don't get any child. I will draft the proposal soon.
The regular pattern does not separate ~ from the first email because it matches it erroneously with the email text.
Match them separately by removing ~ and related from the email pattern and keeping them in the first and second groups.
Remove the line below: https://github.com/Expensify/expensify-common/blob/aaf49708420187a13ebe3bdb97b67104e6a6bcfc/lib/ExpensiMark.js#L99 and add:
+ regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*)([a-zA-Z0-9.!#$%&'+/=?^`{|}-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,
I didn't realize these are technically valid email characters. The solution is to make sure the email doesn't start with them and allow them inside:
+ regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,
Fix the italic rule by allowing the same range of the valid characters as the bold rule, while keeping the invalid character specific to the italic rule.
Replace https://github.com/Expensify/expensify-common/blob/aaf49708420187a13ebe3bdb97b67104e6a6bcfc/lib/ExpensiMark.js#L134 with:
+ regex: /(?!_blank")\b_((?=\S)(([^\s*]|\s)+?))_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
Another way to deal with emails is to let the italic/bold/strikethrough rule run before the autoEmail rule. This supports the emails starting with ~_*
and multiline at the same time.
In this case, the autoEmail regex remains the same. The italic rule still needs to be fixed, but that is unrelated to autoEmail.
@marktoman - Actually, the issue is that email is not getting formatted like the normal url is getting. If you check modifyTextForUrlLinks
function, there we are adding ['bold', 'strikethrough', 'italic'] styles for the normal link.. but same isn't done for email. As a result we get different html for link and email -
For Email - <a href="mailto:~firstemail@gmail.com"><del>firstemail@gmail.com</del></a><br /><a href="mailto:secondemail@gmail.com">secondemail@gmail.com</a>
For Link - <del><a href="https://firstwebsite.com" target="_blank" rel="noreferrer noopener">firstwebsite.com</a><br /><a href="https://secondwebsite.com" target="_blank" rel="noreferrer noopener">secondwebsite.com</a></del>
Please note the difference in <del>
tag for both the cases
@mirraashid Yes, this is the effect of the regex putting ~
inside <a>~...
instead of before. It is subsequently replaced by <del>
, but that doesn't change its position.
I think we can combine https://github.com/Expensify/App/issues/14675 in this issue.
cc: @NikkiWines @JmillsExpensify
I've updated the proposal. Now all uses of ~*_
around the regular text or emails are supported for multiline. I should mention that the original expressions can be refactored to be cleared while retaining their tested functionality.
I should also note, one outstanding question is whether multi-line markdown should be supported. Clearly the issue right now is that we do support it, we just support it inconsistently. That said, I think it's equally valid to question, should we stick to markdown within one line, before the break? I can only imagine this is one of several edge cases we're going to face because we allow multi-line.
Yeah, interesting point @JmillsExpensify. FWIW, neither Slack nor Whatsapp support multiline strikethrough. Github does support it. Personally, I agree that this'll probably open up several edge-case scenarios for a pretty niche function. I think strikethrough should be delineated by a ~ on either end of sentence or phrase, without a space or newline to break it.
@NikkiWines Currently, multiline strikethrough is not properly ignored or supported for emails, but it is supported for regular text and links. Reverting the support across the board may open more problems, given what the users are used to.
Given that our app is still largely beta, we don't need to worry too much about what users are used to. That said, this is a good one to bring out to discussion in our Slack community. I'll circle back on that and CC those in this issue I can find using similar usernames to GH.
@JmillsExpensify Thanks for clarifying. Do you benchmark against specific apps or any noteworthy ones? E.g., Notion supports multiline. Slack doesn't, but it at least lets the user select the lines and make them bold/italic/strikethrough via the toolbar.
@marktoman Emails that start with ~
are valid.
Example: ~firstemail@gmail.com
is valid. However with your regex this email will turn into another email firstemail@gmail.com
@s77rt I know and I mentioned it in those two solutions, but this is not the case with the third one.
@marktoman Sorry my bad, I thought you meant not to support strike-through for emails that starts with those characters but the emails would remain as they are. (i.e. we won't be able to strike-though ~firstemail@gmail.com
but the mailto
target would still be ~firstemail@gmail.com
)
I have tried your latest solution (changing order of rules) it does seem to fix the issue however it fails parsing ~~firstemail@gmail.com~
(the mailto
target should be ~firstemail@gmail.com
)
The issue arises due to inconsistent HTML being generated for the emails. The email
rule in the formatter runs first, creating an html like:
<a href="~...">~...</a><br/><a href="...">...</a>~
When the strikethrough
rule runs, it takes the above HTML and strikethrough's the first email (with the ~ character removed inside the anchor text but present in the href
section. The generated HTML after this rule looks like:
<a href="~..."><del>...</a><br/><a href="...">...</a></del>
Moving the strikethrough rule above email would cause the auto link to break. Thus we can not use it.
The second solution here https://github.com/Expensify/App/issues/14695#issuecomment-1411090815 does not result in correct href
value of the anchor tag (Ignores ~
no matter how many ~
are added to the email).
We can fix the inconsistency for STRIKETHROUGH (as mentioned in the bug) by changing the following rule:
https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L97-L101
to
{
name: 'autoEmail',
process: (textToProcess, replacement) => {
const [match, c1, rawText] = textToProcess.match(/([~]?)([\S\s]+)\1/);
const regex = /(?![^<]*>|[^<>]*<\\)([_*]*)([_*]*)([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim;
if (regex.test(rawText)) {
return `${c1 ? '<del>': ''}${rawText.replace(regex, replacement)}${c1 ? '</del>': ''}`;
}
return textToProcess;
},
replacement: (match, g1, g2, g3) => `${g1}${g2}<a href="mailto:${g3}">${g3}</a>${g2}${g1}`,
},
That said, this is a good one to bring out to discussion in our Slack community
Is there any discussion going @JmillsExpensify ?
Ah no, do you mind getting that going? I'm working on some other R&D projects at the moment and I haven't been able to circle back around to this.
@JmillsExpensify Thanks for clarifying. Do you benchmark against specific apps or any noteworthy ones? E.g., Notion supports multiline. Slack doesn't, but it at least lets the user select the lines and make them bold/italic/strikethrough via the toolbar.
@marktoman great question. I would say not particularly, though Slack and Discord are good examples to include in the conversation, especially since they tend to differ in their approaches.
Started the thread. Feel free to share your opinions.
Hey @JmillsExpensify can you please bump the discussion? We still need some opinions.
Done, thank you!
@JmillsExpensify, @NikkiWines, @thesahindia Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
@JmillsExpensify, @NikkiWines, @thesahindia Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
We just aligned in the linked Slack discussion. @thesahindia I think we're good to go on this one, thanks for keeping it moving.
@thesahindia Updated my proposal which works even better.
@JmillsExpensify @NikkiWines @thesahindia this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
Will review it soon! (just got back, completing the pending work right now)
@allroundexperts, after implementing your proposal when I send a link with strikethrough it shows as normal text.
@marktoman Emails that start with ~ are valid. Example: ~firstemail@gmail.com is valid.
@s77rt, I am not sure about it. Slack or github doesn't support these emails. I also tried creating an email with ~
and it wasn't allowed
@thesahindia They may have their own reasons not to parse it but per the RFC 5322 it's valid.
If we consider emails that contain (or better; that start/end with) _*~
an edge case then we can simply omit them.
@thesahindia I've updated my proposal to handled your mentioned case.
@thesahindia If we can ignore ~
in the email, then the solution becomes quite simple. We just need to replace the regex in autoEmail
with:
regex: /(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*)([a-zA-Z0-9.!#$%&'*+/=?^_`{|}-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim,
Hey @NikkiWines, should we take ~abc@xyz.com
as a valid email? I am seeing that slack, GitHub and WhatsApp aren't recognising this as a valid email. Even google doesn't allow creating an email with ~
Also, IMO, we shouldn't be going with RFC format for the email here. ~
in the email isn't allowed by big players which means that eventually RFC will change as well. Allowing for ~
in the email adds a lot of complexity to our regex.
@NikkiWines Thoughts on the path forward?
Unable to apply italic/bold/strikethrough styles to multi line text that starts with an email.
The italic/bold/strikethrough styling delimiters _*~
are being captured by the email regex.
Prevent the styling delimiters from being captured by the email regex. This can be achieved easily if we consider emails that start with those characters as invalid emails (although technically they are valid). This is a trade-off that's also adopted by Slack.
Moving forward with this approach would mean to modify the autoEmail rule as follow:
(match, g1) => `<a href="mailto:${g1}">${g1}</a>`
None that are worth mentioning
Credits go to @marktoman The above regex only fixes a tiny regression (a@test.com was no being recognised as a valid email). Technical details and code-diffs are discouraged but since this was brought up before the new contribution rules and to help avoid going with the wrong regex. I'm posting this comment. Please hire @marktoman for that solution.
Apologises if this is a redundant comment.
Hey @NikkiWines, should we take ~abc@xyz.com as a valid email? I am seeing that slack, GitHub and WhatsApp aren't recognising this as a valid email. Even google doesn't allow creating an email with ~
Sorry for the delay here @thesahindia. I agree with what you've said above. I don't think we should accept emails beginning with ~
as valid. There's no compelling reason to allow it, especially if other major platforms don't support emails with that format.
I like @s77rt's proposal because it supports ~
, *
, _
in the middle of email.
Credits go to @marktoman The above regex only fixes a tiny regression (a@test.com was no being recognised as a valid email). Technical details and code-diffs are discouraged but since this was brought up before the new contribution rules and to help avoid going with the wrong regex. I'm posting this comment. Please hire @marktoman for that solution.
@s77rt, mentioned that they have improved @marktoman's proposal and addressed a regression ( I think it could have been missed, thanks @s77rt )so I think @s77rt' proposal is not a dupe and it's valid. I am not sure who should be assigned here. It's just my thought but maybe we can split the compensation if they both agree
C+ reviewed 🎀👀🎀
cc: @NikkiWines
@thesahindia, agreed that @s77rt's proposal looks good. @JmillsExpensify, not sure how we'd handle the compensation here given the collaborative efforts here. What are your thoughts?
@thesahindia It looks like you were testing my first regex. The second one supports those cases. The tiny regression for ~a@b.com~
that @s77rt mentions is fixable by replacing +
with *
here:
/(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim
^
But I found problems when the email contains the character in multiline.
~abc~def@gmail.com
abc~def@gmail.com~
Fix (replace here):
+ regex: /\B~((?=\S)((~~(?!~)|[^\s~]|\s(?!~)|\S~{1,2}\S)+?))~\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
The fix extends the allowed sequences for ~
preceded and followed by a non-whitespace character. It allows up to 2 consecutive ~
characters, because the current rule allows that, but prevents 3.
When editing these rules, my intention is to honor the current rule set as much as possible to avoid unforeseen regressions.
*abc*def@gmail.com
abc*def@gmail.com*
Fix (replace here):
+ regex: /\B\*((?=\S)(([^\s*]|\s(?!\*)|\S\*+\S)+?))\*\B(?![^<]*(<\/pre>|<\/code>|<\/a>))/g,
Similarly to the previous rule, it honors the ruleset while extending the allowed sequences for an inline *
that is not delimited by a space.
_a_b@b.com
a_b@b.com_
Fix (replace here)
+ regex: /(?!_blank")\b_((?=\S)[\s\S]+?)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
This is a technical fix that solves a regex feature called greediness and does not change the intended behavior.
With all the rules in place, this is supported:
_~*a_*~b@b.com
a~*_b@b.com*~_
Going to hold on commenting so that @thesahindia can reply to @marktoman's most recent comment first.
@marktoman, the page becomes unresponsive in similar cases like the one shown below
I think the formatting for emails like ~abc~abc@gmail.com~
is out of scope for this issue. But let's confirm first if we wanna include them in this issue.
@NikkiWines, this is how the above email will look -
Should we add support for the similar cases as shown above or just should we just focus at the issue mentioned in the OP?
The new regex that @marktoman' has suggested will fix main issue and will not cause the regression that @s77rt mentioned.
So we can move forward with it. We just have to change the autoEmail
regex to
/(?![^<]*>|[^<>]*<\\)([_*~]*)([_*~]*?)([a-zA-Z0-9.!#$%&'+/=?^`{|}-][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]*@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)\2\1(?![^<]*(<\/pre>|<\/code>|<\/a>))/gim
@thesahindia Nice catch, I've fixed the regexes. But I see how it can be considered out of scope. These emails are not common and it makes sense to keep the possibility of unforeseen regressions low even if there is an effort to mitigate them.
I agree with @marktoman that these emails should be considered out of scope.
Cool, then we can move forward with @marktoman's proposal to only fix the current issue. @marktoman, would you mind posting your proposal (only with the necessary change) as a new comment so that @NikkiWines can also look at it ?
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Action Performed:
Expected Result:
All of the text (both email and website) should be on strikethrough.
Actual Result:
While the website is all on strikethrough, the second email (secondemail@gmail.com) doesn't got strikethrough.
Workaround:
unknown
Platforms:
Which of our officially supported platforms is this issue occurring on?
Version Number: v1.2.62-1 Reproducible in staging?: Y Reproducible in production?: Y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos:
https://user-images.githubusercontent.com/43996225/215822124-98b2cef9-b40b-48e3-b903-12ea230db46b.mp4
https://user-images.githubusercontent.com/43996225/215822312-63ac8273-dc70-4fae-aed8-d1d9c674faff.mp4
Expensify/Expensify Issue URL: Issue reported by: @kerupuksambel Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1675115210540239
View all open jobs on GitHub
Upwork Automation - Do Not Edit