Closed kavimuru closed 8 months ago
Triggered auto assignment to @sakluger (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
Platforms
in OP are ✅)@sakluger Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
@sakluger Huh... This is 4 days overdue. Who can take care of this?
@sakluger Still overdue 6 days?! Let's take care of this!
I was able to reproduce this issue, feels like a bug worth solving!
Job added to Upwork: https://www.upwork.com/jobs/~013659e025d6ce0e4f
Current assignee @sakluger is eligible for the External assigner, not assigning anyone new.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @0xmiroslav (External
)
Triggered auto assignment to @johnmlee101 (External
), see https://stackoverflow.com/c/expensify/questions/7972 for more details.
I couldn't reproduce this issue. I am not sure how exactly you are copying the url. Can you provide more details, or elaborate on copying step a little bit more
Deep links which contain an email address are incorrectly parsed thereby resulting in broken links in the chat history.
The ExpensiMark parser is processing the autoEmail
rule before the autoLink
rule, which results in a broken link, because the email part of the deep link is parsed first, hence the full link will never be recognised.
The currently defined order is like so: https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L95-L130
Update the autoEmail
rule like so:
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)(^| |(?<!http(s?):\\S*))${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim',
),
replacement: (match, g1, g2, g3) => {
let repl = `<a href="mailto:${g3}">${g3}</a>`;
if (match.startsWith(' ')) {
repl = ' ' + repl;
}
return repl;
}
}
This will make sure that it only matches an email that starts at the beginning of the line, or if there's a space preceding the email, thereby excluding emails that are already in links. The second capture group contains the matching email, so we use that instead of match, which may include a space. The replacement string is conditionally updated to be prefixed with a space based on the match.
This has been tested and works well with the following scenarios:
test@expensify.com https://www.expensify.com
bruh test@expensify.com https://www.expensify.com
https://www.expensify.com test@expensify.com
https://staging.new.expensify.com/details?login= testing@gmail.com bacon antoher@gmail.com
multiple spaces https://www.expensify.com test@expensify.com
[test ]](test@expensify.com)
another scenario [test@expensify.com]
checking boundaries \test@expensify.com
https://staging.new.expensify.com/details?name=test&email= testing@gmail.com
final@expensify.com last check
testing@gmail.com google.com@gmail.com tests fine
https://www.expensify.com?name=test&email=testing@gmail.com
random@gmail.com https://www.expensify.com?name=test&email=testing@gmail.com random2@gmail.com
Please note that the faulty parsed HTML result (with the broken links) occurs at the composer end, before the data is sent to the API, so previous messages that were already malformed will still remain the same way even after updating the ExpensiMark class implementation. A user would have to edit these messages after the changes have been applied in order to fix the broken links.
None.
App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and the rest is in mailto protocol => we link incorrectly.
Context: We're using an internal library to parse a message from text to html in this line
The issue described in GH issue happens only when users send a plain-text url that contain email as a param. For more details, with a plain-text url string, when it's passed into our parser, it will reach this parser's rule autoEmail. As described in description, this rule tries to "automatically links emails that are not in a link". But it only works with HTML tag but not raw url string.
i.e Given input is https://staging.new.expensify.com/details?login=testing@gmail.com
=> matching text of this rule will be //staging.new.expensify.com/details?login=testing@gmail.com
=> output after this rule is <a href="mailto://staging.new.expensify.com/details?login=testing@gmail.com">//staging.new.expensify.com/details?login=testing@gmail.com</a>
=> it leads to the out of whole parser is https:<a href="mailto://staging.new.expensify.com/details?login=testing@gmail.com">//staging.new.expensify.com/details?login=testing@gmail.com</a>
To fix this issue, we need to:
((?:\\S*=)?)
replacement
of autoEmail rule so that, if the matching string is an URL, we don't need to format to an email link.App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.
The root cause is here https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 where we're parsing the autoEmail
before the autoLink
.
So when we're trying to parse https://staging.new.expensify.com/details?login=testing@gmail.com
, it will parse //staging.new.expensify.com/details?login=testing@gmail.com
as an email. Then when we try to apply the autolink
, it will not work because the <a>
tags are already inserted by the autoEmail
rule.
The ordering of autoEmail
and autolink
are like that for a reason, to avoid the auto link rule to apply to a part of the email address.
Let's consider this email google.com@gmail.com
, if the autolink
rule is before the autoEmail
rule, the google.com
will become an auto link and it will not be parsed as an email.
We need to do 3 things:
Put the auto link rule before the autoEmail rule, this can be done simply by reodering this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 and this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L114
Address this issue.
The ordering of
autoEmail
andautolink
are like that for a reason, to avoid the auto link rule to apply to a part of the email address.
For this, we can modify the URL_WEBSITE_REGEX
here https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 such that it will not match if the domain is followed by an email signature (a.k.a @domain.tld
). This can be done by updating this rule https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3
to
const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX})(?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))`;
(note the lookahead (?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)
to match the @domain.tld
signature, it's the same regex we're using to match that part of an autoEmail
)
Edited: Looks like this has been recently fixed in https://github.com/Expensify/expensify-common/pull/534
There's a hidden bug in here that makes the code abort all instances of matching links if the first match has @
. This is because if abort
is set to true once, it will persist on the next iteration of the matching links.
In current staging app, we can also easily reproduce it by sending the below
@expensify.com
https://www.expensify.com
The link https://www.expensify.com
will not be detected.
To fix this we need to move the let abort = false;
inside the while
loop here https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L372 so it can be reset to false
when processing the next match.
2
if there're special edge cases that need to be addressed (special characters, ...), for example by adding
[a-zA-Z0-9.!$%&'*+=^_`{|}~-]
to match the email-compatible part between the matched domain and the @domain.tld
signature, but the main point is the same. If an accidental URL that belongs to an email address is not parsed as an auto link, it will be safe to reorder the rules.
FYI the approach to avoid replacing the autoEmail if the matched string is an URL will not work if the matched email string is not the full URL but only a part of the URL (For example an URL with the &
character)
if we parse link with email param, it parased as email not link.
the REGEXP of autoEmail
allow to parse email in link.
edit REGEXP of autoEmail
to be match optional ((?:\\S*[=/])?)
and use it in replacement case like this.
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)((?:\\S*[=/])?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim',
),
replacement: (match, g1, g2) => {
// this condition will return true if g1 is link/auto link which mean this email is part of link/auto link.
// we can also use URL_REGEX and MARKDOWN_URL_REGEX like this
// if(g1 match URL_REGEX or g1 match MARKDOWN_URL_REGEX) return match;
if (Str.isValidURL(Str.sanitizeURL(g1))) {
// if email is part of link or auto link skip parsing.
return match;
}
return g1 + `<a href="mailto:${g2}">${g2}</a>`;
},
},
((?:\\S*[=/])?)
and ${CONST.REG_EXP.MARKDOWN_EMAIL}
.Str.isValidURL
or URL_REGEX
to confirm the group1 is a link.
2.3 in case group1 is auto link, we will use Str.sanitizeURL
to convert it to link, then check if it is a link or not Str.isValidURL(Str.sanitizeURL(g1))
.return match;
return g1 + `<a href="mailto:${g2}">${g2}</a>`;
all test passed in expensify-common in this comment https://github.com/Expensify/App/issues/16762#issuecomment-1598810396
N/A
App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.
1 - When the user tries to send a deeplink with email as param, the parser rule that we have defined in the library rule autoEmail should automatically links the email. However our current rule can only work with HTML tags not the raw URL string.
2 - The placement of autoemail rule is also contributing to the problem, as it's been placed before autolink.
Step 1 - Update the regex rule to prevent match emails if string contains http or https
Step 2 - Reorder autoEmail rule to check if the matching string is a URL, we will use URL_REGEX and don't need to format to an email link
none
📣 @anaskhraza! 📣
Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
Format:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
Contributor details Your Expensify account email: m.khurshid1991@outlook.com Upwork Profile Link: https://www.upwork.com/freelancers/~013888b99767abfe1e
✅ Contributor details stored successfully. Thank you for contributing to Expensify!
@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
Hey @0xmiroslav @johnmlee101 how do the proposals look so far?
@johnmlee101 @sakluger @0xmiroslav this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
@0xmiroslav Do you mind reviewing the proposals please? 🙇
Proposals are still in review. Testing various cases since the proposed solutions might break already working auto-email, auto-links easily, which means regression.
@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
still in review
@johnmlee101 @sakluger @0xmiroslav this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!
Expect an update today
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
@0xmiroslav Whats the latest here?
@0xmiroslav can you please give us an update on the progress of your review? If there are issues with any of the proposals, it would help to know so that we could get updated or new proposals.
I will post my reviews per each proposal today.
Thanks for proposals everyone. Sorry for delay but hard time finding & testing all possible cases.
Here's my feedback so far:
autoLink over autoEmail approach doesn't work this case: (@akinwale's proposal and @tienifr's proposal)
test@expensify.com https://www.expensify.com
autoEmail ignoring approach doesn't work this case: (@hoangzinh's proposal, @ahmedGaber93's proposal)
https://www.expensify.com?name=test&email=test@expensify.com
@johnmlee101 @sakluger No satisfactory proposals yet
@akinwale I still feel it's workaround, case-by-case fix. Your updated solution will fail this case:
[test ]](test@expensify.com)
Email should be parsed.
@0xmiroslav
autoEmail ignoring works fine for this url
https://www.expensify.com?name=test&email=test@expensify.com
try it here
@akinwale I still feel it's workaround, case-by-case fix. Your updated solution will fail this case:
[test ]](test@expensify.com)
Email should be parsed.
I forgot to add word boundaries in my update, which I actually had in my implementation. I have updated my proposal which should handle this case properly. The regex should now cover all scenarios properly.
Is there a list of standard test cases that we can test against?
@ahmedGaber93 your regex causes various regressions so still 👎. Please test this:
test@expensify.com https://www.expensify.com
https://www.expensify.com?name=test&email=test@expensify.com
@0xmiroslav If I understand you well, I see it works fine for two cases. It only matches the email out link.
@ahmedGaber93 did you try to send message with your solution?
Please thoroughly test all possible cases before posting updated proposal.
@0xmiroslav
Here's the updated regex.
`(?![^<]*>|[^<>]*<\\/)(^| |\\b)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`
Test scenarios. All testable as individual lines or as a single comment.
test@expensify.com https://www.expensify.com
bruh test@expensify.com https://www.expensify.com
https://staging.new.expensify.com/details?login= testing@gmail.com bacon antoher@gmail.com
multiple spaces https://www.expensify.com test@expensify.com
[test ]](test@expensify.com)
[testing](test@expensify.com)
another scenario [test@expensify.com]
checking boundaries \test@expensify.com
https://staging.new.expensify.com/details?name=test&email= testing@gmail.com
final@expensify.com last check
@akinwale [test ]](test@expensify.com)
still doesn't work for me
Proposal doesn't need to be instant. If you're not confident, please test enough for a few days and then submit.
@akinwale
[test ]](test@expensify.com)
still doesn't work for meProposal doesn't need to be instant. If you're not confident, please test enough for a few days and then submit.
That's odd. My regex replacement does not remove the email match at all. Please note that you have to make use of the replacement
method as shown in my proposal, in addition to the new regex.
So you should have the following before testing:
{
name: 'autoEmail',
regex: new RegExp(
`(?![^<]*>|[^<>]*<\\/)(^| |\\b)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
'gim',
),
replacement: (match, g1, g2) => {
let rep = `<a href="mailto:${g2}">${g2}</a>`;
if (match.startsWith(' ')) {
rep = ' ' + rep;
}
return rep;
}
}
I have recorded a video showing a demo.
@akinwale after applying your solution:
This already happens on main but should be fixed here.
@akinwale after applying your solution:
Screen.Recording.2023-04-26.at.1.02.19.PM.mov This already happens on main but should be fixed here.
For this case, I'd have to change the regex to also ignore http:// or https:// if there's no space between the text and the actual email address. I'll post an update after I'm done testing.
autoLink over autoEmail approach doesn't work this case: (@akinwale's https://github.com/Expensify/App/issues/16762#issuecomment-1500744733 and @tienifr's https://github.com/Expensify/App/issues/16762#issuecomment-1500807823)
See the 3rd point in What changes do you think we should make in order to solve the problem?
@0xmiroslav this is an existing (hidden) bug on staging with a different root cause. Do we want to fix it here or report it as another bug?
In current staging app, we can also easily reproduce it by sending the below
@expensify.com
https://www.expensify.com
This is not a regression from the change suggested, the change just helps uncover the bug 😁. If we send an email followed by a link, currently it will not happen because the email will be processed already by the auto email rule. But we can reproduce it by the above case.
@0xmiroslav Updated proposal with the final rule and test cases.
Please note that in addition to the regex
, the replacement
method has also changed, handling 3 capture groups instead of 2.
@johnmlee101 @sakluger @0xmiroslav this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:
Thanks!
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Action Performed:
Expected Result:
App should parse the deep link detail with email param.
Actual Result:
App can't parse the deep link detail with email param. The URL prefix (
https://
) is not part of the link, and we instead link incorrectly.Workaround:
Can the user still use Expensify without this being fixed? Have you informed them of the workaround?
Platforms:
Which of our officially supported platforms is this issue occurring on?
Version Number: 1.2.92-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation
https://user-images.githubusercontent.com/43996225/228922658-ef1627e6-9f3e-4f91-a52a-8cb26a5ea47f.mov
Expensify/Expensify Issue URL: Issue reported by: @hungvu193 Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1680191331185739
View all open jobs on GitHub
Upwork Automation - Do Not Edit