Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
2.99k stars 2.5k forks source link

[HOLD for payment 2023-09-07] [$4000] Inconsistent front-end email validation across the product #17387

Closed kavimuru closed 8 months ago

kavimuru commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

(Chat)

  1. Create a new chat
  2. Enter an invalid email (e.g. jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd
  3. Start the chat

(Login/SignUp)

  1. Try to login with an invalid email like jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd

(Add Contact Method)

  1. Navigate to Settings > Profile > Contact Method > New Contact Method
  2. Try to add an invalid email like jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd

Expected result:

Front-end validation should prevent these emails from being added and creating unnecessary network requests. Additionally, consistent validation should be used in all areas where we validate emails in the product (e.g. pasting an invalid email into an existing chat still formats it like a valid email)

Backend email validation uses PHP's email validation filter FILTER_VALIDATE_EMAIL as well as the following regex (for emails and phone numbers) [\w.'#%+-]+@[a-zA-Z\d.-]+.[a-zA-Z]{2,}|^+?[1-9]\d{1,14}$)

Actual results:

Emails are successfully submitted, only to be subsequently rejected by the backend

Workaround:

N/A

Platforms:

Which of our officially supported platforms is this issue occurring on?

Version Number: v1.3.0-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation

https://user-images.githubusercontent.com/43996225/231782900-50ff85fb-d63d-481a-b754-6265979f3f68.mp4

Untitled

Expensify/Expensify Issue URL: Issue reported by: @chiragxarora Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1681358100970919

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~01878e05ad15a14027
  • Upwork Job ID: 1648410114624286720
  • Last Price Increase: 2023-08-10
  • Automatic offers:
    • aimane-chnaif | Reviewer | 26133961
    • jjcoffee | Contributor | 26133964
MelvinBot commented 1 year ago

Triggered auto assignment to @adelekennedy (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot commented 1 year ago

Bug0 Triage Checklist (Main S/O)

MelvinBot commented 1 year ago

@adelekennedy Whoops! This issue is 2 days overdue. Let's get this updated quick!

adelekennedy commented 1 year ago

I can reproduce! Seems very similar to #17388

MelvinBot commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~01878e05ad15a14027

MelvinBot commented 1 year ago

Current assignee @adelekennedy is eligible for the External assigner, not assigning anyone new.

MelvinBot commented 1 year ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @aimane-chnaif (External)

MelvinBot commented 1 year ago

Triggered auto assignment to @NikkiWines (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

chiragxarora commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Invalid email validation in chats, leading to the wrong strings markdowns being transformed to email anchor tags by using mailto: protocol.

After the transformation, when we click the link rendered in chats, it also opens that invalid email in external email app.

What is the root cause of that problem?

Root cause of the problem is wrong regex applied in expensify's helper library called Expensify-common. Here the regex expressions are defined in the file lib/CONST.jsx/ as

https://github.com/Expensify/expensify-common/blob/777177937a77a0edfea5bdba2252eff35da9f9db/lib/CONST.jsx#L333

Since a valid email can have at most 254 characters, this regex failed to put a limit on length of total string which is causing the problem

What changes do you think we should make in order to solve the problem?

We should add a clause to the regex above which can limit the total length of valid email. We can do this by adding this additional condition (?=.{1,254}$)

After adding the length condition:

image

Before that(in production right now):

image

Thankyou!

EDIT: grammar PS: I'm sorry if something's off in the detailing of proposal, this is my first proposal to Expensify

jjcoffee commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

We want email validation to be consistent across the app and between FE and BE.

What is the root cause of that problem?

We are using two different sets of regex patterns on the FE, neither of which fully matches the behaviour of the BE. The one that is used when validating emails when starting a new chat can be traced back to EMAIL_BASE_REGEX:

https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/CONST.jsx#L3

and then there's the markdown one here, which eventually is used to converts emails to links in-chat:

https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/CONST.jsx#L333

This duplicating of regexes appears to be a result of this PR.

What changes do you think we should make in order to solve the problem?

I think it makes sense to have one master rule which fully matches BE behaviour, which would be used both for starting chats and for figuring out when a string in a message needs to be converted to an email link. This just keeps things simple going forwards, but I can see there could be an argument for looser constraints for the chat markdown conversion (as it's less critical if it gets it wrong).

The BE uses a combination of FILTER_VALIDATE_EMAIL and additional regex ([\w\.'#%+-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,}) that is more restrictive than the PHP filter in terms of which characters it accepts.

If we look at the regex of FILTER_VALIDATE_EMAIL extracted from PHP source, we can see that it does a mixture of character validation and various lengths. To summarise it will validate:

We don't need to care about the details of the filter's character validation as the additional regex the BE uses is more restrictive (e.g. BE won't accept "{" as a valid character, whereas the PHP filter would), so we just want to apply the BE's additional regex when it comes to which characters we accept.

In order to match all the above behaviour, I think we should update EMAIL_BASE_REGEX to this:

New regex (?=((?=[\\w'#%+-]+(?:\\.[\\w'#%+-]+)*@)[\\w\\.'#%+-]{1,64}@(?:(?=[a-z\\d]+(?:-+[a-z\\d]+)*\\.)(?:[a-zA-Z\\d-]{1,63}\\.)+[a-zA-Z]{2,63})(?= |_|\\b))(?<end>.*))\\S{3,254}(?=\\k<end>$)

This combines the length limits in FILTER_VALIDATE_EMAIL and the character limits in the extra BE regex (as well as the TLD being >2 length), so this will fully match the BE's behaviour as described above.

The MARKDOWN_EMAIL constant should be updated to refer to the base regex: MARKDOWN_EMAIL: EMAIL_BASE_REGEX (or it can be removed and references updated to EMAIL_BASE_REGEX).

The autoEmail rule needs to be updated to this:

(?![^<]*>|[^<>]*<\\/)(?:(?<= |[^\\w'#%+-])|^)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))

This ensures that the length count starts at either whitespace, start of line or non-email character. Without this you'd get the part of the email within the length limit converted and the rest left as plain text.

The BE considers _example@test.com a valid email which conflicts a bit with markdown italic underscores, so dealing with that requires an extra processing step. See below for the fix.

Show fix for emails with leading underscores It's easy enough to handle simple cases like `_example@test.com_` (just match the email within the underscores by consuming any matching start/end underscores), but it gets more complex for multi-line mixed examples, e.g. ``` _email@test.com is my main email, you can also try _email2@test.com or: _email3@test.com_ ``` where it becomes less clear which parts should be italicised and we can't simply handle this in one regex lookup. The easiest solution is just not to accept leading underscores in markdown since it's really rare to actually see an email starting with an underscore. Slack gets around this by just ignoring leading underscores, for example: ![image](https://github.com/Expensify/App/assets/27287420/19b33a04-7de5-4049-ba12-f47debbe1b08) However if we want to fix it, we need to add a `process` callback to the email rule, like this: ``` process: (textToProcess, replacement) => { const emailRegex = new RegExp( `(?![^<]*>|[^<>]*<\\/)(?:\\b)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))`, 'gim' ); return this.modifyTextForEmails(emailRegex, textToProcess, replacement); }, replacement: '$1', ``` This callback will run the italic markdown rule to find any italic markdown underscores, and then run the email replace rule *within* the italics, before running the email replace on the rest of the text. We can move the existing markdown italic regex into a constant at the top of `ExpensiMark.js`: `const MARKDOWN_ITALICS_REGEX = new RegExp(/(\b_+|\b)(?!_blank")_((?![\s_])[\s\S]*?[^\s_])_(?![^\W_])(?![^<]*(<\/pre>|<\/code>|<\/a>|<\/mention-user>|_blank))/g);` Then the callback function is: ``` modifyTextForEmails(emailRegex, textToCheck, replacement) { // Replace emails within italic markdown underscores first const italicRegex = MARKDOWN_ITALICS_REGEX; let match = italicRegex.exec(textToCheck); let replacedText = ''; let startIndex = 0; while (match !== null) { // `match[1]` contains additionally captured leading underscores which are valid email chars // `match[2]` contains the text within the underscores (markdown italics). const italicText = `${match[1]}${match[2]}`; // We only run the email replace if there are emails within the markdown italics. if (emailRegex.test(italicText)) { // MARKDOWN_ITALICS_REGEX captures any extra leading underscores, so we compare match group lengths to // determine the index of the first underscore, taking into account the closing underscore. const offset = match.index + (match[0].length - italicText.length - 1); replacedText = replacedText.concat(textToCheck.substr(startIndex, (offset - startIndex))); const emailText = italicText.replace(emailRegex, replacement); replacedText = replacedText.concat(emailText); startIndex = match.index + (italicText.length + 1); } else { replacedText = replacedText.concat(textToCheck.substr(startIndex, ((match.index + match[0].length) - startIndex))); startIndex = match.index + (match[0].length); } // Now we move to the next match that the js regex found in the text match = italicRegex.exec(textToCheck); } if (startIndex < textToCheck.length) { replacedText = replacedText.concat(textToCheck.substr(startIndex)); } // Now we run the normal email regex on the rest replacedText = replacedText.replace(emailRegex, replacement); return replacedText; } ```

Tests

All the following test suite emails can be validated against the PHP filter + BE regex in this quick PHP script I threw together.

Fixes to existing test suites

There are several tests that need to be removed as they test for emails that the BE considers invalid. The following email patterns need to be removed from this test:

There are also several tests that include the following invalid emails in the same file:

We would also need to modify the result of this test as _abc@gmail.com would be considered a valid email by BE whereas def@gmail.com_ is not so the correct result is:

result = '<a href="mailto:_abc@gmail.com">text</a><br />'
        + '[text](<a href="mailto:def@gmail.com">def@gmail.com</a>_)';

This test also needs to be updated toBeFalsy as $test@gmail.com is an invalid email according to the BE.

New test cases

Several new tests need to be added to Str-test.js. These can replace Str.isValidEmailMarkdown tests as these two functions run exactly the same regex now. Proposed tests are below. They can be validated against PHP + the BE's regex in this PHP script.

Show Str-test.js new tests ``` describe('Str.isValidEmail', () => { it('Correctly identifies valid emails', () => { expect(Str.isValidEmail('abc@gmail.com')).toBeTruthy(); // Domain length (63 chars in each label) expect(Str.isValidEmail('test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com')).toBeTruthy(); expect(Str.isValidEmail('abc@abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.km')).toBeTruthy(); expect(Str.isValidEmail('abc@co.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.km')).toBeTruthy(); // Address length (64 chars) expect(Str.isValidEmail('sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@test.com')).toBeTruthy(); // Overall length (254 chars) expect(Str.isValidEmail('averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com')).toBeTruthy(); // Domain with lots of dashes expect(Str.isValidEmail('sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asj-j-s-sjdjdjdjd-jdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdke.com.ab.net.aa.bb.cc.dd.ee')).toBeTruthy(); expect(Str.isValidEmail('abc@g---m--ai-l.com')).toBeTruthy(); // Domain with repeated labels of 63 chars expect(Str.isValidEmail('test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.com')).toBeTruthy(); // TLD >=2 chars expect(Str.isValidEmail('abc@gmail.co')).toBeTruthy(); expect(Str.isValidEmail('a@a.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijk')).toBeTruthy(); // Very short address expect(Str.isValidEmail('a@example.com')).toBeTruthy(); // xn-- style domain name expect(Str.isValidEmail('test@xn--diseolatinoamericano-76b.com')).toBeTruthy(); // Unusual but valid prefixes expect(Str.isValidEmail('-test@example.com')).toBeTruthy(); expect(Str.isValidEmail('_test@example.com')).toBeTruthy(); expect(Str.isValidEmail('#test@example.com')).toBeTruthy(); expect(Str.isValidEmail('test.+123@example.com')).toBeTruthy(); expect(Str.isValidEmail('-test-@example.com')).toBeTruthy(); // Invalid chars expect(Str.isValidEmail('$test@gmail.com')).toBeFalsy(); expect(Str.isValidEmail('!test@gmail.com')).toBeFalsy(); expect(Str.isValidEmail('"test"@gmail.com')).toBeFalsy(); expect(Str.isValidEmail('~abc@gmail.com~')).toBeFalsy(); expect(Str.isValidEmail('abc@gmail.com~')).toBeFalsy(); expect(Str.isValidEmail('test@example_123site.com')).toBeFalsy(); expect(Str.isValidEmail('test{@example.com')).toBeFalsy(); expect(Str.isValidEmail('test..new@example.com')).toBeFalsy(); expect(Str.isValidEmail('test@example-.a.com')).toBeFalsy(); expect(Str.isValidEmail('test@example......a.com')).toBeFalsy(); // Invalid period location expect(Str.isValidEmail('.test@example.com')).toBeFalsy(); expect(Str.isValidEmail('.test.new@example.com')).toBeFalsy(); expect(Str.isValidEmail('test.@example.com')).toBeFalsy(); // Domain too long (>63 chars in each label) expect(Str.isValidEmail('test@averylongdomainpartoftheemailthatwillgooverthelimitasitismorethan63chars.com')).toBeFalsy(); expect(Str.isValidEmail('abc@abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890a.km')).toBeFalsy(); expect(Str.isValidEmail('abc@co.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890a.km')).toBeFalsy(); // Address too long (>64 chars) expect(Str.isValidEmail('averylongaddresspartoftheemailthatwillgovoerthelimitasitismorethan64chars@example.com')).toBeFalsy(); // Overall length too long expect(Str.isValidEmail('sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfffa')).toBeFalsy(); // Incorrect domains start/end expect(Str.isValidEmail('test@example-.com')).toBeFalsy(); expect(Str.isValidEmail('test@-example-.com')).toBeFalsy(); // TLD too short expect(Str.isValidEmail('test@example.a')).toBeFalsy(); // TLD too long expect(Str.isValidEmail('a@a.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijkl')).toBeFalsy(); }); }); ```

We can also add similar tests to ExpensiMark-HTML-test.js, as well as a new test for the more complex italic markdown underscore case mentioned above:

Show ExpensiMark-HTML-test.js new tests ``` // Check emails within other markdown test('Test emails within other markdown', () => { const testString = '> test@example.com\n' + '```test@example.com```\n' + '`test@example.com`\n' + '_test@example.com_ ' + '__test@example.com_'; const result = '
test@example.com
' + '
test@example.com
' + 'test@example.com
' + 'test@example.com ' + '_test@example.com'; expect(parser.replace(testString)).toBe(result); }); // Check email regex's validity at various limits test('Test markdown replacement for valid emails', () => { const testString = 'A simple email: abc@gmail.com, ' + 'or a very short one a@example.com ' + 'hitting the maximum domain length (63 chars) test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com ' + 'or the maximum address length (64 chars) sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@test.com ' + 'overall length of 254 averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com ' + 'domain with many dashes sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asj-j-s-sjdjdjdjd-jdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdke.com.ab.net.aa.bb.cc.dd.ee ' + ' how about a domain with repeated labels of 63 chars test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.com ' + 'max length email with italics ' + '_averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com_ ' + ' xn-- style domain test@xn--diseolatinoamericano-76b.com ' + 'or a more complex case where we need to determine where to apply italics markdown ' + '_email@test.com\n_email2@test.com\n\nemail3@test.com_ ' + 'some unusual, but valid prefixes -test@example.com ' + ' and _test@example.com ' + 'a max length email enclosed in brackets ' + '(averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com) ' + 'max length email with ellipsis ending ' + 'averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com... ' + 'try a markdown link with a valid max-length email ' + '[text](sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfff)'; const result = 'A simple email: abc@gmail.com, ' + 'or a very short one a@example.com ' + 'hitting the maximum domain length (63 chars) test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com ' + 'or the maximum address length (64 chars) sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@test.com ' + 'overall length of 254 averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com ' + 'domain with many dashes sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asj-j-s-sjdjdjdjd-jdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdke.com.ab.net.aa.bb.cc.dd.ee ' + ' how about a domain with repeated labels of 63 chars test@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekasgasgasgasgashfnfn.com ' + 'max length email with italics ' + 'averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com ' + ' xn-- style domain test@xn--diseolatinoamericano-76b.com ' + 'or a more complex case where we need to determine where to apply italics markdown ' + 'email@test.com
' + '_email2@test.com

' + 'email3@test.com
' + 'some unusual, but valid prefixes -test@example.com ' + ' and _test@example.com ' + 'a max length email enclosed in brackets ' + '(averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com) ' + 'max length email with ellipsis ending ' + 'averylongaddresspartthatalmostwillreachthelimitofcharsperaddress@nowwejustneedaverylongdomainpartthatwill.reachthetotallengthlimitforthewholeemailaddress.whichis254charsaccordingtothePHPvalidate-email-filter.extendingthetestlongeruntilwereachtheright.com... ' + 'try a markdown link with a valid max-length email ' + 'text'; expect(parser.replace(testString)).toBe(result); }); test('Test markdown replacement for invalid emails', () => { const testString = 'Replace the valid email part within a string ' + '.test@example.com ' + '$test@gmail.com ' + 'test..new@example.com ' + 'Some chars that are not allowed within emails ' + 'test{@example.com ' + 'domain length over limit test@averylongdomainpartoftheemailthatwillgooverthelimitasitismorethan63chars.com ' + 'address over limit averylongaddresspartoftheemailthatwillgovoerthelimitasitismorethan64chars@example.com ' + 'overall length too long ' + 'sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfffa ' + 'invalid domain start/end ' + 'test@example-.com ' + 'test@-example-.com ' + 'test@example.a ' + '[text](sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfffa)'; const result = 'Replace the valid email part within a string ' + '.test@example.com ' + '$test@gmail.com ' + 'test..new@example.com ' + 'Some chars that are not allowed within emails ' + 'test{@example.com ' + 'domain length over limit test@averylongdomainpartoftheemailthatwillgooverthelimitasitismorethan63chars.com ' + 'address over limit averylongaddresspartoftheemailthatwillgovoerthelimitasitismorethan64chars@example.com ' + 'overall length too long ' + 'sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfffa ' + 'invalid domain start/end ' + 'test@example-.com ' + 'test@-example-.com ' + 'test@example.a ' + '[text](sjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjab@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.com.a.aa.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcj.asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjasfffa)'; expect(parser.replace(testString)).toBe(result); }); ```
chiragxarora commented 1 year ago

Hi @jjcoffee , You're right about the duplicate regex rules and those two can be combined into one which is being used in the backend (if it is being used). I believe probability of a user coming in and typing a very long email like string is not much but having a limit on length makes our app more consistent with any modern day chat application in the market and also RFC compliant at the same time. Again possibility of someone entering a very long character email like string is low but if we render it as a link which opens in an email app, that would reflect as overall bad aesthetics and validation. Since we cannot accurately validate the TLDs without the DNS lookup, the least we can do is validation on length part on frontend.

aimane-chnaif commented 1 year ago

I agree that it's inconsistency issue to validate emails differently in various parts (login, chat, add contact, search, etc) with different regexes. I think we should also match validations between frontend and backend. @NikkiWines can you please share all email validation regexes used in backend along with where they're used?

abdulrahuman5196 commented 1 year ago

I am coming from the PR which introduced MARKDOWN_EMAIL regex.

I am not sure if understand the problem statement correctly or what is trying to be fixed here.

1) The problem statement expects "should not parse the email address as a valid one for abc@abc.abc". 2) And the proposal and slack conversation is talking about limiting email length to RFC standard. 3) And we are also checking to make common regex for emails(this was also talked in the original PR which introduced MARKDOWN_EMAIL).

I am confused here, Are we trying to solve all 3 above or something specifically? Could you kindly provide this information so I can provide better information with regards to the original MARKDOWN_EMAIL regex. @aimane-chnaif

aimane-chnaif commented 1 year ago

@abdulrahuman5196 abc@abc.abc is valid email. What the issue mentioned is very long email:

long email

@chiragxarora as you're a reporter, can you share example invalid email here so that other can copy&paste and reproduce easily?

chiragxarora commented 1 year ago

Sure @aimane-chnaif

Jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd

aimane-chnaif commented 1 year ago

Jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd

ok, so I tried with this email on various parts - login, search, new contact all are valid in frontend but all are invalid in backend. So it makes sense to fix this inconsistency here. cc: @NikkiWines (waiting for result on https://github.com/Expensify/App/issues/17387#issuecomment-1517894210)

NikkiWines commented 1 year ago

Sorry for the delay here! Couple things looking through this issue.

Enter any invalid email having atleast 3 parts abc@abc.abc(can extend from here)

.abc (and .om from the testing screenshots) are both valid TLDs, so some of the emails in the OP that we're saying are invalid are actually valid. .ddddd and .cdjd don't seem to be valid though.

FWIW I don't know how useful it is to impose an email length limit here as we're not really in the business of fully validating the email (for example we still accept incorrect TLDs, like .example), we just want to turn any reasonably email-looking string into a link. If someone wants to deliberately type an incredibly long email, they're free to do it - as long as we're not expecting it to validate as a real email on the BE. (The same is true of the https://github.com/Expensify/App/issues/17388 that was closed).

Yep, I agree with all of this and can confirm it's not successfully passing on the BE

@NikkiWines can you please share all email validation regexes used in backend along with where they're used?

From some local testing I can tell you that it's failing PHP's email validation filter FILTER_VALIDATE_EMAIL. Also I don't think you need the usage, as the parsing for new Expensify should be handled in Expensimark.js.

chiragxarora commented 1 year ago

Hi @NikkiWines correct me if I'm wrong but I think validation on domain names including TLD's is not really possible since any organization can register for their custom domains and they get added constantly. What we were trying to do was adding a validation check on length of the email address because that's what possible on frontend. To accurately validate a domain name I don't think we have any option other than dns lookup..

NikkiWines commented 1 year ago

correct me if I'm wrong but I think validation on domain names including TLD's is not really possible since any organization can register for their custom domains and they get added constantly.

No, you're correct, and I'm not suggesting we move to validating emails based on TLDs - just mentioning that the emails used in the OP aren't actually invalid (at the very least, the abc@abc.abc shouldn't be)

NikkiWines commented 1 year ago

In this case, it doesn't seem that there's a unified standard for handle these extra long emails, I think our current formatting (which seems to be the same as Githubs right now) is fine. @adelekennedy, @aimane-chnaif thoughts?

Here are some other examples:

jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd

abc@abc.abc

jajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejd@gmail.com

adelekennedy commented 1 year ago

@NikkiWines that seems fine to me - there doesn't really seem to be a standard we're conforming to, and as the user I don't really feel it as a pain point tbh

aimane-chnaif commented 1 year ago

I agree that there's no unified standard on email validation in chats. But there should be stand validation when login or add new contact. Right now it's inconsistent between frontend and backend. Those kinds of long emails are acceptable in frontend but not in backend. This will break flow when add contact in offline mode.

MelvinBot commented 1 year ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

jjcoffee commented 1 year ago

There's also inconsistency between different parts of the frontend, e.g. searching for a user when starting a new chat uses a different validation from the markdown one. I think it makes sense to maximise consistency where possible.

NikkiWines commented 1 year ago

Right now it's inconsistent between frontend and backend. Those kinds of long emails are acceptable in frontend but not in backend. This will break flow when add contact in offline mode.

searching for a user when starting a new chat uses a different validation from the markdown one. I think it makes sense to maximise consistency where possible.

Yeah, good points @aimane-chnaif and @jjcoffee, these are both compelling reasons to modify our current front-end validation logic. I agree it would be good to be consistent.

Aside from the PHP validation mentioned earlier, we also use the following regex to validate emails and phone numbers [\w\.'#%+-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,}|^\+?[1-9]\d{1,14}$)

MelvinBot commented 1 year ago

@NikkiWines @adelekennedy @aimane-chnaif this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

MelvinBot commented 1 year ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

aimane-chnaif commented 1 year ago

@NikkiWines how do we proceed here? If we landed on making all email validations consistent on this issue, can you please update issue title and Expected Result? https://github.com/Expensify/App/issues/17839 was closed in favor of this issue.

NikkiWines commented 1 year ago

I actually think we should probably make a new GitHub issue for this since the current issue has changed significantly from the OP. It will also make it easier for us to track who should receive what payment and account for the timeline of the bug and proposal reviewing process.

Thoughts @adelekennedy? If we're agreed, this should be raised in the #expensify-bugs slack channel as a new BUG: post with proper reproduction steps

chiragxarora commented 1 year ago

Shouldn't the original bug reporter be the same for the bug report since everything revolves around the main invalid email validation issue?

alexrasla commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

In this problem, we are able to add and initiate a chat with email ids that are invalid.

What is the root cause of that problem?

The root cause of the problem is the fact that the EMAIL_BASE_REGEX regular expression constant in node_modules/expensify-common/lib/CONST.jsx does not check for character limits for email ids and domains. The constant only checks that the compared string has valid email characters and an @ sign through regular expressions.

What changes do you think we should make in order to solve the problem?

The change I would make to solve this is simply changing the EMAIL_BASE_REGEX to check if the email id has the correct type and number of characters in the regular expression. This would produce the Invalid email error on any page as shown below so the user would never be able to add the email address.

Screenshot 2023-05-04 at 9 24 39 AM

What alternative solutions did you explore? (Optional)

Reminder: Please use plain English, be brief and avoid jargon. Feel free to use images, charts or pseudo-code if necessary. Do not post large multi-line diffs or write walls of text. Do not create PRs unless you have been hired for this job.

MelvinBot commented 1 year ago

@NikkiWines @adelekennedy @aimane-chnaif this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

NikkiWines commented 1 year ago

Chatted with some other internal employees and decided to just update this issue + double the price to try and get things moving. cc: @aimane-chnaif

melvin-bot[bot] commented 1 year ago

Upwork job price has been updated to $2000

jjcoffee commented 1 year ago

@aimane-chnaif I've updated my proposal with a more fully fledged regex to match the BE.

I think it might be good if we could agree on a set of test emails to run through (I have some on my proposal, but open to adding more!).

melvin-bot[bot] commented 1 year ago

@NikkiWines, @adelekennedy, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!

NikkiWines commented 1 year ago

@aimane-chnaif, please review the updated proposals above, thank you! 🙇

melvin-bot[bot] commented 1 year ago

@NikkiWines @adelekennedy @aimane-chnaif this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:

Thanks!

melvin-bot[bot] commented 1 year ago

Current assignee @aimane-chnaif is eligible for the Internal assigner, not assigning anyone new.

melvin-bot[bot] commented 1 year ago

@NikkiWines, @adelekennedy, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!

aimane-chnaif commented 1 year ago

@jjcoffee can you make sure that your solution matches PHP email validation used in backend?

aimane-chnaif commented 1 year ago

Backend email validation uses PHP's email validation filter FILTER_VALIDATE_EMAIL as well as the following regex (for emails and phone numbers) [\w.'#%+-]+@[a-zA-Z\d.-]+.[a-zA-Z]{2,}|^+?[1-9]\d{1,14}$)

@NikkiWines one more thing to confirm

I see there are still 2 different validations in backend. Can you please specify use cases for each one so that frontend does the same?

kushu7 commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Inconsistent front-end email validation across the product

What is the root cause of that problem?

We don't have matching validation as per backend.

What changes do you think we should make in order to solve the problem?

We can use this regex in front-end. it is compatible with backend php FILTER_VALIDATE_EMAIL.

/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-zA-Z0-9]+(?:-+[a-zA-Z0-9]+)*\.){1,126}){1,}(?:(?:[a-zA-Z][a-zA-Z0-9]*)|(?:(?:xn--)[a-zA-Z0-9]+))(?:-+[a-zA-Z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){7})|(?:(?!(?:.*[a-fA-F0-9][:\]]){7,})(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,5})?::(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){5}:)|(?:(?!(?:.*[a-fA-F0-9]:){5,})(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,3})?::(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/gm

Result:

https://github.com/Expensify/App/assets/31707153/b076bdad-dbe2-4e6b-9a3a-9715f661adb1

What alternative solutions did you explore? (Optional)

None

jjcoffee commented 1 year ago

@aimane-chnaif I've made sure that the validation matches the BE, see the list of test cases in my proposal, which have tested successfully against the existing BE validation.

The additional BE regex (from here) is more restrictive on which characters it accepts than the PHP FILTER_VALIDATE_EMAIL filter (e.g. it won't accept email{@domain.com, whereas PHP would), so we only really need the length limits from the PHP filter and then we can just use the BE regex for character validation. I have confirmed from testing that the additional BE regex is just applied on top of the PHP filter, so it should be perfectly aligned now.

jjcoffee commented 1 year ago

@aimane-chnaif I've popped a test script up here if you want to double-confirm the functioning of the additional regex on top of the PHP filter.

NikkiWines commented 1 year ago

I see there are still 2 different validations in backend. Can you please specify use cases for each one so that frontend does the same?

We use both validations in the back end for processing any/all emails/phone numbers, there aren't specific use cases for one vs the other. I think the best approach would be to try and make a combination of both validations for the front-end validation.

adelekennedy commented 1 year ago

catching up here, are we looking for an amended proposal? Should this issue be external or remain internal?

aimane-chnaif commented 1 year ago

@adelekennedy external yet. Still validating proposals

adelekennedy commented 1 year ago

should we update this to add the external label @aimane-chnaif or do you want to review the existing proposals and then add the external label if needed?

aimane-chnaif commented 1 year ago

should we update this to add the external label?

yes please, we might get better solutions.