Closed kavimuru closed 8 months ago
Triggered auto assignment to @adelekennedy (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
Platforms
in OP are ✅)@adelekennedy Whoops! This issue is 2 days overdue. Let's get this updated quick!
I can reproduce! Seems very similar to #17388
Job added to Upwork: https://www.upwork.com/jobs/~01878e05ad15a14027
Current assignee @adelekennedy is eligible for the External assigner, not assigning anyone new.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @aimane-chnaif (External
)
Triggered auto assignment to @NikkiWines (External
), see https://stackoverflow.com/c/expensify/questions/7972 for more details.
Invalid email validation in chats, leading to the wrong strings markdowns being transformed to email anchor tags by using mailto:
protocol.
After the transformation, when we click the link rendered in chats, it also opens that invalid email in external email app.
Root cause of the problem is wrong regex applied in expensify's helper library called Expensify-common. Here the regex expressions are defined in the file lib/CONST.jsx/
as
Since a valid email can have at most 254 characters, this regex failed to put a limit on length of total string which is causing the problem
We should add a clause to the regex above which can limit the total length of valid email.
We can do this by adding this additional condition (?=.{1,254}$)
After adding the length condition:
Before that(in production right now):
Thankyou!
EDIT: grammar PS: I'm sorry if something's off in the detailing of proposal, this is my first proposal to Expensify
We want email validation to be consistent across the app and between FE and BE.
We are using two different sets of regex patterns on the FE, neither of which fully matches the behaviour of the BE. The one that is used when validating emails when starting a new chat can be traced back to EMAIL_BASE_REGEX
:
and then there's the markdown one here, which eventually is used to converts emails to links in-chat:
This duplicating of regexes appears to be a result of this PR.
I think it makes sense to have one master rule which fully matches BE behaviour, which would be used both for starting chats and for figuring out when a string in a message needs to be converted to an email link. This just keeps things simple going forwards, but I can see there could be an argument for looser constraints for the chat markdown conversion (as it's less critical if it gets it wrong).
The BE uses a combination of FILTER_VALIDATE_EMAIL
and additional regex ([\w\.'#%+-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,}
) that is more restrictive than the PHP filter in terms of which characters it accepts.
If we look at the regex of FILTER_VALIDATE_EMAIL
extracted from PHP source, we can see that it does a mixture of character validation and various lengths. To summarise it will validate:
.
)@example-.com
or @-example.com
)We don't need to care about the details of the filter's character validation as the additional regex the BE uses is more restrictive (e.g. BE won't accept "{" as a valid character, whereas the PHP filter would), so we just want to apply the BE's additional regex when it comes to which characters we accept.
In order to match all the above behaviour, I think we should update EMAIL_BASE_REGEX
to this:
New regex
(?=((?=[\\w'#%+-]+(?:\\.[\\w'#%+-]+)*@)[\\w\\.'#%+-]{1,64}@(?:(?=[a-z\\d]+(?:-+[a-z\\d]+)*\\.)(?:[a-zA-Z\\d-]{1,63}\\.)+[a-zA-Z]{2,63})(?= |_|\\b))(?<end>.*))\\S{3,254}(?=\\k<end>$)
This combines the length limits in FILTER_VALIDATE_EMAIL
and the character limits in the extra BE regex (as well as the TLD being >2 length), so this will fully match the BE's behaviour as described above.
The MARKDOWN_EMAIL
constant should be updated to refer to the base regex: MARKDOWN_EMAIL: EMAIL_BASE_REGEX
(or it can be removed and references updated to EMAIL_BASE_REGEX
).
The autoEmail
rule needs to be updated to this:
(?![^<]*>|[^<>]*<\\/)(?:(?<= |[^\\w'#%+-])|^)${CONST.REG_EXP.MARKDOWN_EMAIL}(?![^<]*(<\\/pre>|<\\/code>|<\\/a>))
This ensures that the length count starts at either whitespace, start of line or non-email character. Without this you'd get the part of the email within the length limit converted and the rest left as plain text.
The BE considers _example@test.com
a valid email which conflicts a bit with markdown italic underscores, so dealing with that requires an extra processing step. See below for the fix.
All the following test suite emails can be validated against the PHP filter + BE regex in this quick PHP script I threw together.
There are several tests that need to be removed as they test for emails that the BE considers invalid. The following email patterns need to be removed from this test:
There are also several tests that include the following invalid emails in the same file:
We would also need to modify the result of this test as _abc@gmail.com
would be considered a valid email by BE whereas def@gmail.com_
is not so the correct result is:
result = '<a href="mailto:_abc@gmail.com">text</a><br />'
+ '[text](<a href="mailto:def@gmail.com">def@gmail.com</a>_)';
This test also needs to be updated toBeFalsy as $test@gmail.com
is an invalid email according to the BE.
Several new tests need to be added to Str-test.js. These can replace Str.isValidEmailMarkdown
tests as these two functions run exactly the same regex now. Proposed tests are below. They can be validated against PHP + the BE's regex in this PHP script.
We can also add similar tests to ExpensiMark-HTML-test.js, as well as a new test for the more complex italic markdown underscore case mentioned above:
test@example.com' + '
test@example.com' + '
test@example.com
Hi @jjcoffee , You're right about the duplicate regex rules and those two can be combined into one which is being used in the backend (if it is being used). I believe probability of a user coming in and typing a very long email like string is not much but having a limit on length makes our app more consistent with any modern day chat application in the market and also RFC compliant at the same time. Again possibility of someone entering a very long character email like string is low but if we render it as a link which opens in an email app, that would reflect as overall bad aesthetics and validation. Since we cannot accurately validate the TLDs without the DNS lookup, the least we can do is validation on length part on frontend.
I agree that it's inconsistency issue to validate emails differently in various parts (login, chat, add contact, search, etc) with different regexes. I think we should also match validations between frontend and backend. @NikkiWines can you please share all email validation regexes used in backend along with where they're used?
I am coming from the PR which introduced MARKDOWN_EMAIL regex.
I am not sure if understand the problem statement correctly or what is trying to be fixed here.
1) The problem statement expects "should not parse the email address as a valid one for abc@abc.abc". 2) And the proposal and slack conversation is talking about limiting email length to RFC standard. 3) And we are also checking to make common regex for emails(this was also talked in the original PR which introduced MARKDOWN_EMAIL).
I am confused here, Are we trying to solve all 3 above or something specifically? Could you kindly provide this information so I can provide better information with regards to the original MARKDOWN_EMAIL regex. @aimane-chnaif
@abdulrahuman5196 abc@abc.abc is valid email. What the issue mentioned is very long email:
@chiragxarora as you're a reporter, can you share example invalid email here so that other can copy&paste and reproduce easily?
Sure @aimane-chnaif
Jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd
Jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjeJajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd
ok, so I tried with this email on various parts - login, search, new contact all are valid in frontend but all are invalid in backend. So it makes sense to fix this inconsistency here. cc: @NikkiWines (waiting for result on https://github.com/Expensify/App/issues/17387#issuecomment-1517894210)
Sorry for the delay here! Couple things looking through this issue.
Enter any invalid email having atleast 3 parts abc@abc.abc(can extend from here)
.abc
(and .om
from the testing screenshots) are both valid TLDs, so some of the emails in the OP that we're saying are invalid are actually valid. .ddddd
and .cdjd
don't seem to be valid though.
FWIW I don't know how useful it is to impose an email length limit here as we're not really in the business of fully validating the email (for example we still accept incorrect TLDs, like .example), we just want to turn any reasonably email-looking string into a link. If someone wants to deliberately type an incredibly long email, they're free to do it - as long as we're not expecting it to validate as a real email on the BE. (The same is true of the https://github.com/Expensify/App/issues/17388 that was closed).
Yep, I agree with all of this and can confirm it's not successfully passing on the BE
@NikkiWines can you please share all email validation regexes used in backend along with where they're used?
From some local testing I can tell you that it's failing PHP's email validation filter FILTER_VALIDATE_EMAIL
. Also I don't think you need the usage, as the parsing for new Expensify should be handled in Expensimark.js.
Hi @NikkiWines correct me if I'm wrong but I think validation on domain names including TLD's is not really possible since any organization can register for their custom domains and they get added constantly. What we were trying to do was adding a validation check on length of the email address because that's what possible on frontend. To accurately validate a domain name I don't think we have any option other than dns lookup..
correct me if I'm wrong but I think validation on domain names including TLD's is not really possible since any organization can register for their custom domains and they get added constantly.
No, you're correct, and I'm not suggesting we move to validating emails based on TLDs - just mentioning that the emails used in the OP aren't actually invalid (at the very least, the abc@abc.abc
shouldn't be)
In this case, it doesn't seem that there's a unified standard for handle these extra long emails, I think our current formatting (which seems to be the same as Githubs right now) is fine. @adelekennedy, @aimane-chnaif thoughts?
Here are some other examples:
Gmail
Slack
Github
jaj@asjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdi.cdjd
abc@abc.abc
jajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfisjksksjsjssskssjskskssksksksksskdkddkddkdksskskdkdkdksskskskdkdkdkdkekeekdkddenejeodxkdndekkdjddkeemdjxkdenendkdjddekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejdicdjejajasjjssjdjdjdjdjdjjeiwiwiwowkdjdjdieikdjfidekjcjdkekejdcjdkeekcjcdidjjcdkekdiccjdkejdjcjxisdjjdkedncicdjejejcckdsijcjdsodjcicdkejd@gmail.com
@NikkiWines that seems fine to me - there doesn't really seem to be a standard we're conforming to, and as the user I don't really feel it as a pain point tbh
I agree that there's no unified standard on email validation in chats. But there should be stand validation when login or add new contact. Right now it's inconsistent between frontend and backend. Those kinds of long emails are acceptable in frontend but not in backend. This will break flow when add contact in offline mode.
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
There's also inconsistency between different parts of the frontend, e.g. searching for a user when starting a new chat uses a different validation from the markdown one. I think it makes sense to maximise consistency where possible.
Right now it's inconsistent between frontend and backend. Those kinds of long emails are acceptable in frontend but not in backend. This will break flow when add contact in offline mode.
searching for a user when starting a new chat uses a different validation from the markdown one. I think it makes sense to maximise consistency where possible.
Yeah, good points @aimane-chnaif and @jjcoffee, these are both compelling reasons to modify our current front-end validation logic. I agree it would be good to be consistent.
Aside from the PHP validation mentioned earlier, we also use the following regex to validate emails and phone numbers [\w\.'#%+-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,}|^\+?[1-9]\d{1,14}$)
@NikkiWines @adelekennedy @aimane-chnaif this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
@NikkiWines how do we proceed here? If we landed on making all email validations consistent on this issue, can you please update issue title and Expected Result? https://github.com/Expensify/App/issues/17839 was closed in favor of this issue.
I actually think we should probably make a new GitHub issue for this since the current issue has changed significantly from the OP. It will also make it easier for us to track who should receive what payment and account for the timeline of the bug and proposal reviewing process.
Thoughts @adelekennedy? If we're agreed, this should be raised in the #expensify-bugs slack channel as a new BUG:
post with proper reproduction steps
Shouldn't the original bug reporter be the same for the bug report since everything revolves around the main invalid email validation issue?
In this problem, we are able to add and initiate a chat with email ids that are invalid.
The root cause of the problem is the fact that the EMAIL_BASE_REGEX
regular expression constant in node_modules/expensify-common/lib/CONST.jsx
does not check for character limits for email ids and domains. The constant only checks that the compared string has valid email characters and an @ sign through regular expressions.
The change I would make to solve this is simply changing the EMAIL_BASE_REGEX
to check if the email id has the correct type and number of characters in the regular expression. This would produce the Invalid email
error on any page as shown below so the user would never be able to add the email address.
Reminder: Please use plain English, be brief and avoid jargon. Feel free to use images, charts or pseudo-code if necessary. Do not post large multi-line diffs or write walls of text. Do not create PRs unless you have been hired for this job.
@NikkiWines @adelekennedy @aimane-chnaif this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!
Chatted with some other internal employees and decided to just update this issue + double the price to try and get things moving. cc: @aimane-chnaif
Upwork job price has been updated to $2000
@aimane-chnaif I've updated my proposal with a more fully fledged regex to match the BE.
I think it might be good if we could agree on a set of test emails to run through (I have some on my proposal, but open to adding more!).
@NikkiWines, @adelekennedy, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!
@aimane-chnaif, please review the updated proposals above, thank you! 🙇
@NikkiWines @adelekennedy @aimane-chnaif this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:
Thanks!
Current assignee @aimane-chnaif is eligible for the Internal assigner, not assigning anyone new.
@NikkiWines, @adelekennedy, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!
@jjcoffee can you make sure that your solution matches PHP email validation used in backend?
Backend email validation uses PHP's email validation filter FILTER_VALIDATE_EMAIL as well as the following regex (for emails and phone numbers) [\w.'#%+-]+@[a-zA-Z\d.-]+.[a-zA-Z]{2,}|^+?[1-9]\d{1,14}$)
@NikkiWines one more thing to confirm
I see there are still 2 different validations in backend. Can you please specify use cases for each one so that frontend does the same?
Inconsistent front-end email validation across the product
We don't have matching validation as per backend.
We can use this regex in front-end. it is compatible with backend php FILTER_VALIDATE_EMAIL
.
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-zA-Z0-9]+(?:-+[a-zA-Z0-9]+)*\.){1,126}){1,}(?:(?:[a-zA-Z][a-zA-Z0-9]*)|(?:(?:xn--)[a-zA-Z0-9]+))(?:-+[a-zA-Z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){7})|(?:(?!(?:.*[a-fA-F0-9][:\]]){7,})(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,5})?::(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){5}:)|(?:(?!(?:.*[a-fA-F0-9]:){5,})(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,3})?::(?:[a-fA-F0-9]{1,4}(?::[a-fA-F0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/gm
Result:
https://github.com/Expensify/App/assets/31707153/b076bdad-dbe2-4e6b-9a3a-9715f661adb1
None
@aimane-chnaif I've made sure that the validation matches the BE, see the list of test cases in my proposal, which have tested successfully against the existing BE validation.
The additional BE regex (from here) is more restrictive on which characters it accepts than the PHP FILTER_VALIDATE_EMAIL
filter (e.g. it won't accept email{@domain.com
, whereas PHP would), so we only really need the length limits from the PHP filter and then we can just use the BE regex for character validation. I have confirmed from testing that the additional BE regex is just applied on top of the PHP filter, so it should be perfectly aligned now.
@aimane-chnaif I've popped a test script up here if you want to double-confirm the functioning of the additional regex on top of the PHP filter.
I see there are still 2 different validations in backend. Can you please specify use cases for each one so that frontend does the same?
We use both validations in the back end for processing any/all emails/phone numbers, there aren't specific use cases for one vs the other. I think the best approach would be to try and make a combination of both validations for the front-end validation.
catching up here, are we looking for an amended proposal? Should this issue be external or remain internal?
@adelekennedy external yet. Still validating proposals
should we update this to add the external label @aimane-chnaif or do you want to review the existing proposals and then add the external label if needed?
should we update this to add the external label?
yes please, we might get better solutions.
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Action Performed:
(Chat)
(Login/SignUp)
(Add Contact Method)
Expected result:
Front-end validation should prevent these emails from being added and creating unnecessary network requests. Additionally, consistent validation should be used in all areas where we validate emails in the product (e.g. pasting an invalid email into an existing chat still formats it like a valid email)
Backend email validation uses PHP's email validation filter FILTER_VALIDATE_EMAIL as well as the following regex (for emails and phone numbers) [\w.'#%+-]+@[a-zA-Z\d.-]+.[a-zA-Z]{2,}|^+?[1-9]\d{1,14}$)
Actual results:
Emails are successfully submitted, only to be subsequently rejected by the backend
Workaround:
N/A
Platforms:
Which of our officially supported platforms is this issue occurring on?
Version Number: v1.3.0-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation
https://user-images.githubusercontent.com/43996225/231782900-50ff85fb-d63d-481a-b754-6265979f3f68.mp4
Expensify/Expensify Issue URL: Issue reported by: @chiragxarora Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1681358100970919
View all open jobs on GitHub
Upwork Automation - Do Not Edit