[HOLD for Payment 2023-09-20][$2000] Can't parse deep link with email param.

kavimuru commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!

Action Performed:

Try to open user detail from deep link on browser (for example: https://staging.new.expensify.com/details?login=testing@gmail.com).
Now go to any report, paste that link and click send.
Notice that it can't parse the whole link, and when user try to click on it, it opens the email app with the wrong email filled.

Expected Result:

App should parse the deep link detail with email param.

Actual Result:

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

Workaround:

Can the user still use Expensify without this being fixed? Have you informed them of the workaround?

Platforms:

Which of our officially supported platforms is this issue occurring on?

[ ] Android / native
[ ] Android / Chrome
[ ] iOS / native
[ ] iOS / Safari
[x] MacOS / Chrome / Safari
[ ] MacOS / Desktop

Version Number: 1.2.92-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation

https://user-images.githubusercontent.com/43996225/228922658-ef1627e6-9f3e-4f91-a52a-8cb26a5ea47f.mov

Expensify/Expensify Issue URL: Issue reported by: @hungvu193 Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1680191331185739

View all open jobs on GitHub

Upwork Automation - Do Not Edit

Upwork Job URL: https://www.upwork.com/jobs/~013659e025d6ce0e4f
Upwork Job ID: 1644482548992663552
Last Price Increase: 2023-06-26
Automatic offers:

0xmiroslav | Reviewer | 26524537
Antasel | Contributor | 26524539
hungvu193 | Reporter | 26524541

MelvinBot commented 1 year ago

Triggered auto assignment to @sakluger (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot commented 1 year ago

Bug0 Triage Checklist (Main S/O)

[x] This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
[x] This bug is not a duplicate report (check E/App issues and #expensify-bugs)
- If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
[x] This bug is reproducible using the reproduction steps in the OP. S/O
- If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
- If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
[x] This issue is filled out as thoroughly and clearly as possible
- Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
[x] I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

MelvinBot commented 1 year ago

@sakluger Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot commented 1 year ago

@sakluger Huh... This is 4 days overdue. Who can take care of this?

MelvinBot commented 1 year ago

@sakluger Still overdue 6 days?! Let's take care of this!

sakluger commented 1 year ago

I was able to reproduce this issue, feels like a bug worth solving!

MelvinBot commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~013659e025d6ce0e4f

MelvinBot commented 1 year ago

Current assignee @sakluger is eligible for the External assigner, not assigning anyone new.

MelvinBot commented 1 year ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @0xmiroslav (External)

MelvinBot commented 1 year ago

Triggered auto assignment to @johnmlee101 (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

alitoshmatov commented 1 year ago

I couldn't reproduce this issue. I am not sure how exactly you are copying the url. Can you provide more details, or elaborate on copying step a little bit more

akinwale commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Deep links which contain an email address are incorrectly parsed thereby resulting in broken links in the chat history.

What is the root cause of that problem?

The ExpensiMark parser is processing the autoEmail rule before the autoLink rule, which results in a broken link, because the email part of the deep link is parsed first, hence the full link will never be recognised. The currently defined order is like so: https://github.com/Expensify/expensify-common/blob/main/lib/ExpensiMark.js#L95-L130

What changes do you think we should make in order to solve the problem?

Update the autoEmail rule like so:

{
    name: 'autoEmail',
    regex: new RegExp(
        `(?![^<]*>|[^<>]*<\\/)(^| |(?<!http(s?):\\S*))${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
        'gim',
    ),
    replacement: (match, g1, g2, g3) => {
        let repl = `<a href="mailto:${g3}">${g3}</a>`;
        if (match.startsWith(' ')) {
            repl = ' ' + repl;
        }
        return repl;
    }
}

This will make sure that it only matches an email that starts at the beginning of the line, or if there's a space preceding the email, thereby excluding emails that are already in links. The second capture group contains the matching email, so we use that instead of match, which may include a space. The replacement string is conditionally updated to be prefixed with a space based on the match.

This has been tested and works well with the following scenarios:

test@expensify.com https://www.expensify.com
bruh test@expensify.com https://www.expensify.com
https://www.expensify.com test@expensify.com
https://staging.new.expensify.com/details?login= testing@gmail.com bacon antoher@gmail.com
multiple spaces https://www.expensify.com    test@expensify.com
[test ]](test@expensify.com)
another scenario [test@expensify.com]
checking boundaries \test@expensify.com
https://staging.new.expensify.com/details?name=test&email= testing@gmail.com
final@expensify.com last check
testing@gmail.com google.com@gmail.com tests fine
https://www.expensify.com?name=test&email=testing@gmail.com
random@gmail.com https://www.expensify.com?name=test&email=testing@gmail.com random2@gmail.com

Please note that the faulty parsed HTML result (with the broken links) occurs at the composer end, before the data is sent to the API, so previous messages that were already malformed will still remain the same way even after updating the ExpensiMark class implementation. A user would have to edit these messages after the changes have been applied in order to fix the broken links.

What alternative solutions did you explore? (Optional)

None.

hoangzinh commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and the rest is in mailto protocol => we link incorrectly.

What is the root cause of that problem?

Context: We're using an internal library to parse a message from text to html in this line

The issue described in GH issue happens only when users send a plain-text url that contain email as a param. For more details, with a plain-text url string, when it's passed into our parser, it will reach this parser's rule autoEmail. As described in description, this rule tries to "automatically links emails that are not in a link". But it only works with HTML tag but not raw url string.

i.e Given input is https://staging.new.expensify.com/details?login=testing@gmail.com => matching text of this rule will be //staging.new.expensify.com/details?login=testing@gmail.com => output after this rule is <a href="mailto://staging.new.expensify.com/details?login=testing@gmail.com">//staging.new.expensify.com/details?login=testing@gmail.com</a> => it leads to the out of whole parser is https:<a href="mailto://staging.new.expensify.com/details?login=testing@gmail.com">//staging.new.expensify.com/details?login=testing@gmail.com</a>

What changes do you think we should make in order to solve the problem?

To fix this issue, we need to:

Update the regex rule of autoEmail so that it includes optional substring regex ((?:\\S*=)?)
Update replacement of autoEmail rule so that, if the matching string is an URL, we don't need to format to an email link.

Code example

Replace those [LOC](https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L100-L107) by ```js { name: 'autoEmail', regex: new RegExp( `(?![^<]*>|[^<>]*<\\/)((?:\\S*=)?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`, 'gim' ), replacement: (match, g1, g2) => { const urlRegex = new RegExp(`^${URL_REGEX}$`, 'gi'); if (g1 && urlRegex.test(match)) { return match; } return `${g1 || ''}${g2}`; } } ```

tienifr commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

What is the root cause of that problem?

The root cause is here https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 where we're parsing the autoEmail before the autoLink.

So when we're trying to parse https://staging.new.expensify.com/details?login=testing@gmail.com, it will parse //staging.new.expensify.com/details?login=testing@gmail.com as an email. Then when we try to apply the autolink, it will not work because the <a> tags are already inserted by the autoEmail rule.

The ordering of autoEmail and autolink are like that for a reason, to avoid the auto link rule to apply to a part of the email address.

Let's consider this email google.com@gmail.com, if the autolink rule is before the autoEmail rule, the google.com will become an auto link and it will not be parsed as an email.

What changes do you think we should make in order to solve the problem?

We need to do 3 things:

Put the auto link rule before the autoEmail rule, this can be done simply by reodering this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L101 and this rule https://github.com/Expensify/expensify-common/blob/3590b7371fe8fa73288a6625daad985e7fcfb057/lib/ExpensiMark.js#L114
Address this issue.

The ordering of autoEmail and autolink are like that for a reason, to avoid the auto link rule to apply to a part of the email address.

For this, we can modify the URL_WEBSITE_REGEX here https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 such that it will not match if the domain is followed by an email signature (a.k.a @domain.tld). This can be done by updating this rule https://github.com/Expensify/expensify-common/blob/f37b3bc00d1275f232c7ddb8bbcb5b6f69310918/lib/Url.js#L3 to

const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX})(?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))`;

(note the lookahead (?!@[a-zA-Z0-9-]+?(\.[a-zA-Z]+)+) to match the @domain.tld signature, it's the same regex we're using to match that part of an autoEmail)

Edited: Looks like this has been recently fixed in https://github.com/Expensify/expensify-common/pull/534 There's a hidden bug in here that makes the code abort all instances of matching links if the first match has @. This is because if abort is set to true once, it will persist on the next iteration of the matching links.

In current staging app, we can also easily reproduce it by sending the below

⁦@expensify.com 
https://www.expensify.com

The link https://www.expensify.com will not be detected.

To fix this we need to move the let abort = false; inside the while loop here https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/ExpensiMark.js#L372 so it can be reset to false when processing the next match.

What alternative solutions did you explore? (Optional)

We can improve the regex in 2 if there're special edge cases that need to be addressed (special characters, ...), for example by adding
```
[a-zA-Z0-9.!$%&'*+=^_`{|}~-]
```

to match the email-compatible part between the matched domain and the @domain.tld signature, but the main point is the same. If an accidental URL that belongs to an email address is not parsed as an auto link, it will be safe to reorder the rules.

FYI the approach to avoid replacing the autoEmail if the matched string is an URL will not work if the matched email string is not the full URL but only a part of the URL (For example an URL with the & character)

ahmedGaber93 commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

if we parse link with email param, it parased as email not link.

What is the root cause of that problem?

the REGEXP of autoEmail allow to parse email in link.

What changes do you think we should make in order to solve the problem?

edit REGEXP of autoEmail to be match optional ((?:\\S*[=/])?) and use it in replacement case like this.

{
    name: 'autoEmail',
    regex: new RegExp(
        `(?![^<]*>|[^<>]*<\\/)((?:\\S*[=/])?)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
        'gim',
    ),
    replacement: (match, g1, g2) => {
        // this condition will return true if g1 is link/auto link which mean this email is part of link/auto link.
        // we can also use URL_REGEX and MARKDOWN_URL_REGEX like this
        // if(g1 match URL_REGEX or g1 match MARKDOWN_URL_REGEX) return match;
        if (Str.isValidURL(Str.sanitizeURL(g1))) {
            // if email is part of link or auto link skip parsing.
            return match;
        }
        return g1 + `<a href="mailto:${g2}">${g2}</a>`;
    },
},

Explanation

RegExp will match two groups ((?:\\S*[=/])?) and ${CONST.REG_EXP.MARKDOWN_EMAIL}.
we will use group1 to determined if the matched email (group2) is part of link or not. 2.1 if group1 is found, we need to confirm it is a link and not a normal text like chars "-" or ":" 2.2 we will use Str.isValidURL or URL_REGEX to confirm the group1 is a link. 2.3 in case group1 is auto link, we will use Str.sanitizeURL to convert it to link, then check if it is a link or not Str.isValidURL(Str.sanitizeURL(g1)).
if we confirm group1 is a link or auto link, skip parse auto email return match;
if group1 is not a link, return group1 + parsed email return g1 + `<a href="mailto:${g2}">${g2}</a>`;

all test passed in expensify-common in this comment https://github.com/Expensify/App/issues/16762#issuecomment-1598810396

What alternative solutions did you explore? (Optional)

N/A

anaskhraza commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

App can't parse the deep link detail with email param. The URL prefix (https://) is not part of the link, and we instead link incorrectly.

What is the root cause of that problem?

1 - When the user tries to send a deeplink with email as param, the parser rule that we have defined in the library rule autoEmail should automatically links the email. However our current rule can only work with HTML tags not the raw URL string.

2 - The placement of autoemail rule is also contributing to the problem, as it's been placed before autolink.

What changes do you think we should make in order to solve the problem?

Step 1 - Update the regex rule to prevent match emails if string contains http or https

Step 2 - Reorder autoEmail rule to check if the matching string is a URL, we will use URL_REGEX and don't need to format to an email link

What alternative solutions did you explore? (Optional)

none

MelvinBot commented 1 year ago

📣 @anaskhraza! 📣

Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:

Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details.

Screen Shot 2022-11-16 at 4 42 54 PM

Format:

Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>

anaskhraza commented 1 year ago

Contributor details Your Expensify account email: m.khurshid1991@outlook.com Upwork Profile Link: https://www.upwork.com/freelancers/~013888b99767abfe1e

MelvinBot commented 1 year ago

✅ Contributor details stored successfully. Thank you for contributing to Expensify!

MelvinBot commented 1 year ago

@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

sakluger commented 1 year ago

Hey @0xmiroslav @johnmlee101 how do the proposals look so far?

MelvinBot commented 1 year ago

@johnmlee101 @sakluger @0xmiroslav this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

MelvinBot commented 1 year ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

johnmlee101 commented 1 year ago

@0xmiroslav Do you mind reviewing the proposals please? 🙇

0xmiroslav commented 1 year ago

Proposals are still in review. Testing various cases since the proposed solutions might break already working auto-email, auto-links easily, which means regression.

MelvinBot commented 1 year ago

@johnmlee101, @sakluger, @0xmiroslav Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

0xmiroslav commented 1 year ago

still in review

MelvinBot commented 1 year ago

@johnmlee101 @sakluger @0xmiroslav this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

0xmiroslav commented 1 year ago

Expect an update today

MelvinBot commented 1 year ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

johnmlee101 commented 1 year ago

@0xmiroslav Whats the latest here?

sakluger commented 1 year ago

@0xmiroslav can you please give us an update on the progress of your review? If there are issues with any of the proposals, it would help to know so that we could get updated or new proposals.

0xmiroslav commented 1 year ago

I will post my reviews per each proposal today.

0xmiroslav commented 1 year ago

Thanks for proposals everyone. Sorry for delay but hard time finding & testing all possible cases.

Here's my feedback so far:

autoLink over autoEmail approach doesn't work this case: (@akinwale's proposal and @tienifr's proposal)
```
test@expensify.com https://www.expensify.com
```
autoEmail ignoring approach doesn't work this case: (@hoangzinh's proposal, @ahmedGaber93's proposal)
```
https://www.expensify.com?name=test&email=test@expensify.com
```

0xmiroslav commented 1 year ago

@johnmlee101 @sakluger No satisfactory proposals yet

akinwale commented 1 year ago

Proposal

Updated

@0xmiroslav Please have a look at my updated proposal. Thanks.

0xmiroslav commented 1 year ago

@akinwale I still feel it's workaround, case-by-case fix. Your updated solution will fail this case:

[test ]](test@expensify.com)

Email should be parsed.

ahmedGaber93 commented 1 year ago

@0xmiroslav autoEmail ignoring works fine for this url https://www.expensify.com?name=test&email=test@expensify.com try it here

akinwale commented 1 year ago

@akinwale I still feel it's workaround, case-by-case fix. Your updated solution will fail this case:
[test ]](test@expensify.com)
Email should be parsed.

I forgot to add word boundaries in my update, which I actually had in my implementation. I have updated my proposal which should handle this case properly. The regex should now cover all scenarios properly.

Is there a list of standard test cases that we can test against?

0xmiroslav commented 1 year ago

@ahmedGaber93 your regex causes various regressions so still 👎. Please test this:

test@expensify.com https://www.expensify.com
https://www.expensify.com?name=test&email=test@expensify.com

ahmedGaber93 commented 1 year ago

@0xmiroslav If I understand you well, I see it works fine for two cases. It only matches the email out link.

Screenshot 2023-04-26 at 12 04 03 PM

0xmiroslav commented 1 year ago

@ahmedGaber93 did you try to send message with your solution?

Please thoroughly test all possible cases before posting updated proposal.

akinwale commented 1 year ago

@0xmiroslav

Here's the updated regex.

`(?![^<]*>|[^<>]*<\\/)(^| |\\b)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`

Test scenarios. All testable as individual lines or as a single comment.

test@expensify.com https://www.expensify.com
bruh test@expensify.com https://www.expensify.com
https://staging.new.expensify.com/details?login= testing@gmail.com bacon antoher@gmail.com
multiple spaces https://www.expensify.com    test@expensify.com
[test ]](test@expensify.com)
[testing](test@expensify.com)
another scenario [test@expensify.com]
checking boundaries \test@expensify.com
https://staging.new.expensify.com/details?name=test&email= testing@gmail.com
final@expensify.com last check

0xmiroslav commented 1 year ago

@akinwale [test ]](test@expensify.com) still doesn't work for me

Proposal doesn't need to be instant. If you're not confident, please test enough for a few days and then submit.

akinwale commented 1 year ago

@akinwale [test ]](test@expensify.com) still doesn't work for me

Proposal doesn't need to be instant. If you're not confident, please test enough for a few days and then submit.

That's odd. My regex replacement does not remove the email match at all. Please note that you have to make use of the replacement method as shown in my proposal, in addition to the new regex.

So you should have the following before testing:

{
    name: 'autoEmail',
    regex: new RegExp(
        `(?![^<]*>|[^<>]*<\\/)(^| |\\b)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`,
        'gim',
    ),
    replacement: (match, g1, g2) => {
        let rep = `<a href="mailto:${g2}">${g2}</a>`;
        if (match.startsWith(' ')) {
            rep = ' ' + rep;
        }
        return rep;
    }
}

I have recorded a video showing a demo.

16762-demo.webm

0xmiroslav commented 1 year ago

@akinwale after applying your solution:

https://user-images.githubusercontent.com/97473779/234556319-f46d51ca-bc9c-4723-b53b-27fe4a0b968f.mov

This already happens on main but should be fixed here.

akinwale commented 1 year ago

@akinwale after applying your solution:

Screen.Recording.2023-04-26.at.1.02.19.PM.mov This already happens on main but should be fixed here.

For this case, I'd have to change the regex to also ignore http:// or https:// if there's no space between the text and the actual email address. I'll post an update after I'm done testing.

tienifr commented 1 year ago

autoLink over autoEmail approach doesn't work this case: (@akinwale's https://github.com/Expensify/App/issues/16762#issuecomment-1500744733 and @tienifr's https://github.com/Expensify/App/issues/16762#issuecomment-1500807823)

Proposal updated

See the 3rd point in What changes do you think we should make in order to solve the problem?

@0xmiroslav this is an existing (hidden) bug on staging with a different root cause. Do we want to fix it here or report it as another bug?

In current staging app, we can also easily reproduce it by sending the below

⁦@expensify.com 
https://www.expensify.com

This is not a regression from the change suggested, the change just helps uncover the bug 😁. If we send an email followed by a link, currently it will not happen because the email will be processed already by the auto email rule. But we can reproduce it by the above case.

akinwale commented 1 year ago

Proposal

Updated

@0xmiroslav Updated proposal with the final rule and test cases.

Please note that in addition to the regex, the replacement method has also changed, handling 3 capture groups instead of 2.

MelvinBot commented 1 year ago

@johnmlee101 @sakluger @0xmiroslav this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:

Decide whether any proposals currently meet our guidelines and can be approved as-is today
If no proposals meet that standard, please take this issue internal and treat it as one of your highest priorities
If you have any questions, don't hesitate to start a discussion in #expensify-open-source

Thanks!

Expensify / App