Closed lanitochka17 closed 3 months ago
Job added to Upwork: https://www.upwork.com/jobs/~01c622c8bf192a870b
Triggered auto assignment to Contributor-plus team member for initial proposal review - @aimane-chnaif (External
)
Triggered auto assignment to @mallenexpensify (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
We think that this bug might be related to #vip-vsp CC @quinthar
When user paste the text There is change method getting invoked further which has series of actions internally and finally lead to no class found exception. Let me know if this is okay to go ahead CC: @mallenexpensify @lanitochka17
📣 @nrajyaguru! 📣 Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
Hey @lanitochka17, Is this issue specifically related to expensifail user? If so how to create one? I've tested locally with normal user and I'm unable to reproduce.
📣 @nayabatir1! 📣 Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
Unable to reproduce on iOS, @aimane-chnaif can you test on android? Thx
reproduced on android
On 5th repeat, it took 10s to be pasted into composer. Seems related to live markdown
After copy pasting email several times, composer is super slow when try to edit message and almost unusable. Huge performance lag
Checking on in the markdown room https://expensify.slack.com/archives/C06BDSWLDPB/p1707866403183029 Thanks Aimane.
Thanks for reporting this issue, I will investigate it, please assign me.
@thienlnam Could we also add this issue in #36071?
📣 @aimane-chnaif 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!
@tomekzaw, @mallenexpensify, @thienlnam, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!
Working on it
@tomekzaw Do you have an ETA for this specific bug?
@thienlnam Here's what I got so far – parsing Markdown for the provided input takes ~180 ms (way too much):
I've managed to reproduce this issue in Live Markdown example app on Android. However, I suspect that the root cause might be located in ExpensiMark. At this point I'm not entirely sure which phase exactly takes so long but I will do my best to narrow it down. Will keep you updated
edit: Found input a@a.aaaaaaaaaaaaaaaaa.a
which takes a few seconds to process, will investigate this further
Thanks for the update, it's definitely possible our email regex is pretty inefficient at parsing emails
it's definitely possible our email regex is pretty inefficient at parsing emails
@thienlnam I'm afraid you're right, ExpensiMark is generally slow but it turns out that it's extremely slow for certain emails like a@a.aaaaaaaaaaaaaa.a
(up to 600 ms 🤯 ).
This only happens on Hermes though; on JSC (which we use for Live Markdown Preview on iOS) it runs without any lags. Also, first I tried to measure execution time of ExpensiMark on Node.js and it also worked quite fast (10-20 ms).
So the root cause of the issue is that the regular expressions which we use for email rule in ExpensiMark are just slow on Hermes. I'm afraid we can't do much in terms of Hermes performance so the viable alternative would be to somehow tweak the regular expressions if possible. I believe this resource will be helpful: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
Also, this might be the reason why it takes some time to open the screen with chat history – we need to parse Markdown for all messages in the viewport (up to 10 ms per one message) which introduces the delay:
https://github.com/Expensify/App/assets/20516055/7a7e264b-44a5-4743-9fce-4fb6df4c9692
As for measuring execution time of ExpensiMark for certain inputs, here's how I did it – I added the following code snippet in src/components/Composer/index.native.tsx
:
useEffect(() => {
for (let i = 0; i < 10; i++) {
const input = 'a@a.aaaaaaaaaaaaaa.a';
const start = performance.now();
const parser = new ExpensiMark();
const output = parser.replace(input);
const end = performance.now();
console.log({
input,
output,
time: `${end - start} ms`,
});
}
});
Input: Hello world
Time: a few ms
LOG {"input": "Hello world", "output": "Hello world", "time": "9.289958000183105 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "5.467625007033348 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "6.666542008519173 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "9.126292005181313 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "8.8507080078125 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "6.105708003044128 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "8.753625005483627 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "7.322457998991013 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "8.580208986997604 ms"}
LOG {"input": "Hello world", "output": "Hello world", "time": "9.936958000063896 ms"}
Input: Hello *world*
Time: a few ms
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "10.74737499654293 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "7.896125003695488 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "5.966250002384186 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "6.308209002017975 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "5.45533299446106 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "6.105708003044128 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "5.429584011435509 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "4.145833998918533 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "5.39316600561142 ms"}
LOG {"input": "Hello *world*", "output": "Hello <strong>world</strong>", "time": "4.395540997385979 ms"}
Input: someone@example.com
Time: a few ms
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "12.564208999276161 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "8.412416994571686 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "6.9930839985609055 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "9.297625005245209 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "6.857374995946884 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "6.5077079981565475 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "6.665208011865616 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "5.774875000119209 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "8.12295900285244 ms"}
LOG {"input": "someone@example.com", "output": "<a href=\"mailto:someone@example.com\">someone@example.com</a>", "time": "6.6164159923791885 ms"}
Input: a@a.aaaaaaaaaaaaaa.a
Time: 400-600 ms ❗
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "488.0825420022011 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "504.0287500023842 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "624.1307080090046 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "571.2800410091877 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "602.4013760089874 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "666.4960010051727 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "599.9842499941587 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "706.1396670043468 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "421.40104199945927 ms"}
LOG {"input": "a@a.aaaaaaaaaaaaaa.a", "output": "<a href=\"mailto:a@a.aaaaaaaaaaaaaa\">a@a.aaaaaaaaaaaaaa</a>.a", "time": "357.1014170050621 ms"}
In order for Live Markdown Preview to work smoothly when typing, theoretically the parsing time cannot exceed 16 ms (or 8 ms for 120 fps), assuming there are no other tasks for the UI thread, but realistically this would need to be < 1 ms.
Also cc-ing @roryabraham
Thanks for the investigation - that is unfortunate. So it seems we have a few options
Update the regex for email parsing so it's not "evil regex"
Go with a simple regex so that we only do simple validation on the FE, and then more complex validation on the BE where this won't matter
Help update Hermes to be more performance for regex (probably the hardest heh)
Sounds really good. As for option #1, I've just found this online tool for checking if RegExp is vulnerable to ReDoS: https://devina.io/redos-checker
Here's the RegExp for email: https://github.com/Expensify/expensify-common/blob/e4c35b74c800edbe6eba8b7b0679e9403605a7ac/lib/CONST.jsx#L3
Once we refactor the RegExp, we can use this tool to ensure that it's no longer vulnerable.
I found this regex on yup, it is also ReDos safe.
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
@tomekzaw @mallenexpensify @thienlnam @aimane-chnaif this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
We're making good progress - we're going to be updating the regex for validating emails in a way that is not susceptible to reDos. We'll probably also want to validate that it still has the same amount of test coverage for common / invalid emails
@thienlnam Since we already have a clear plan on how to tackle this performance problem, can we assign Kuba from SWM to take care of this issue? This way we can parallelize work on this so I can focus on next issues.
Yeah we can split this one out so you can focus on different issues - we can give this to Kuba or just make it an external issue
Pasting emails with certain formats like a@a.aaaaaaaaaaaaa.a takes much longer than expected. Doing it multiple times leads to "App not responding" state.
Regex used for emails in ExpensiMark is slow on Hermes, as investigated by @tomekzaw.
We will need to change the below in expensify-common: https://github.com/Expensify/expensify-common/blob/6fa01a684220346eb75452cc12b23865feefd606/lib/CONST.d.ts#L250-L253
/**
* Regex matching an text containing an email
*/
readonly EMAIL_PART: "([\\w\\-\\+\\'#]+(?:\\.[\\w\\-\\'\\+]+)*@(?:[\\w\\-]+\\.)+[a-z]{2,})";
@tomekzaw @thienlnam Can you have a look at my proposal?
We want to update the regex so that it's ReDos safe - but don't want to compromise too much on our existing validation.
To test how this compares, I'm going to add a few examples so we can test it's performance - in your proposals please note how it compares to the existing validation
VALID
simple@example.com
very.common@example.com
disposable.style.email.with+symbol@example.com
other.email-with-dash@example.com
fully-qualified-domain@example.com
user.name+tag+sorting@example.com
x@example.com
example-indeed@strange-example.com
test.email+alex@leetcode.com
example@s.solutions
" "@example.org
"john..doe"@example.org
mailhost!username@example.org
user%example.com@example.org
user-@example.org
International
用户@例子.广告
उपयोगकर्ता@उदाहरण.कॉम
чебурашка@ящик-с-апельсинами.рф
θσερ@εχαμπλε.ψομ
Dörte@Sörensen.example.com
Invalid
Abc.example.com
A@b@c@example.com
a"b(c)d,e:f;g<h>i[j\k]l@example.com
just"not"right@example.com
this is"not\allowed@example.com
i_like_underscore@but_its_not_allowed_in_this_part.example.com
QA[icon]CHOCOLATE[icon]@test.com
john..doe@example.com
.john.doe@example.com
john.doe.@example.com
john..doe@example.com
@thienlnam Can we assign @SzymczakJ? I've asked him to help me out with performance-related efforts in Live Markdown starting from this issue
Sure, @SzymczakJ could you please comment on this issue so I can assign you?
Hey! I’m Jakub Szymczak from Software Mansion, an expert agency, and I’d like to work on this issue! Sorry, already started to work on this but forgot to comment 😅
@tomekzaw @mallenexpensify @thienlnam @SzymczakJ @aimane-chnaif this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!
Working at this issue. Update for now: I was trying to come up with/find regex that's ReDOS safe but failed to do so. Every ReDOS safe regex that I find, even this proposed here in this issue is not matching our use case (not able to match both emails in "example@gmail.com some random text example2@gmail.com" string). When I change it to match our use case then the regex becomes ReDOS vulnerable. It's probable that such ReDOS safe regex is not possible in our case, then maybe we should look for performance improvements in other places?
Still working at it. Current status: trying to profile Hermes runtime to find out which part of ExpensiMark parser is working too slow. With that knowledge maybe we will be able to make it faster.
Current update: investigated performance of ExpensiMark on Hermes. The input for profiling ExpensiMark was "applausetester++0411km@applause.expensifail.comapplausetester++0411km@applause.expensifail.comapplausetester++0411km@applause.expensifail.comapplausetester++0411km@applause.expensifail.com"
which is basically and email from this issue reproduction multiplied 4 times.
As we can see execution of regExpPrototypeExec
takes more than 2 seconds, so we can be sure that the RegExp implementation for Hermes is the problem here. Digging deeper into Hermes internals to further investigate the problem.
@SzymczakJ is the bug reproducible with other text or data being pasted repeatedly in the compose box? I want to add this to a VIP/wave project and I'm thinking #vip-vsb. Wanting to know if small business users will be affected by this or if it's an edge case that only happens with pasted emails.
@mallenexpensify I didn't manage to reproduce it with anything other than emails.
I've also measured time of Expensimark parsing which is a bottleneck in this issue, these are result for pretty small inputs:
something.someone@somedomain.com
- 4ms
something.someone@somedomain.comsomething.someone@somedomain.com
- 5595ms!
testEmail@testDomain.com
- 1ms
testEmail@testDomain.comtestEmail@testDomain.com
- 7ms
testEmail.testEmail@testDomain.com
- 4ms
testEmail.testEmail@testDomain.comtestEmail.testEmail@testDomain.com
-5405ms!
Keep in mind that we want to keep parse time at the maximum of 3 ms for the experience to be smooth, so even parsing single mail like something.someone@somedomain.com
takes too long.
Thanks @SzymczakJ , appreciate the context, I had no idea that it took longer to parse emails with more characters.
@thienlnam , lil help here. with this only affecting users who paste a bunch of emails in at once, do you think it's worth fixing right now or should we close then maybe revisit later?
One reason TO potentially fix now would be if email parsing updates also benefited or sped up other places in the app. (I have no clue)
@mallenexpensify after spending some time on this issue these are my thoughts:
Maybe we should think of some more general and long-term solution for this "live-markdown working slow with some inputs" issue. Do you have other ideas? cc @roryabraham @thienlnam
@tomekzaw @mallenexpensify @thienlnam @SzymczakJ @aimane-chnaif this issue is now 4 weeks old and preventing us from maintaining WAQ. This should now be your highest priority. Please post below what your plan is to get a PR in review ASAP. Thanks!
Current assignee @aimane-chnaif is eligible for the Internal assigner, not assigning anyone new.
Yeah I don't think this is an urgent bug we need to fix - it's definitely a malicious edge case so we can add it to the list but can prioritize other issues before this
I added this to the main tracking issue https://github.com/Expensify/App/issues/36071, and adjusted priority
Added to #vip-vsb because the bug doesn't have to do with money. I wonder, since there's also the tracking issue, if there's another/better way to manage this ¯_(ツ)_/¯
It looks like Jack and gang are going through issues, from high > low and this one is medium.
Closing this! We've worked through all the critical issues, I think every issue from here on out can just be addressed independently. We're still working through https://github.com/Expensify/App/issues/39518 but there's no longer a need for this main tracking issue
So.. now... do we want to do anything here? Fix it? add retest-weekly
label? Start a discussion in the #react-native-live-markdown?
From @SzymczakJ above
after spending some time on this issue these are my thoughts:
- it's not worth to spend so much time on such edge case that doesn't happen when using app without trying to purposefully break it
- even if we fix this edge case there will be other 'attack strings' that also cause problems for our regexes and more such problems may arise
Maybe we should think of some more general and long-term solution for this "live-markdown working slow with some inputs" issue. Do you have other ideas?
I think we can close for now and re-open if this comes up again. Realistically this doesn't seem like a common use-case and the time tradeoff does not seem worth it
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Version Number: 1.4.90 Reproducible in staging?: Y Reproducible in production?: Y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Expensify/Expensify Issue URL: Issue reported by: Applause - Internal Team Slack conversation:
Action Performed:
Expected Result:
From header copy email and paste in compose repeatedly, app isn't responding message must note be displayed
Actual Result:
From header copy email and paste in compose repeatedly, app isn't responding message is displayed
Workaround:
Unknown
Platforms:
Which of our officially supported platforms is this issue occurring on?
Screenshots/Videos
Add any screenshot/video evidence
https://github.com/Expensify/App/assets/78819774/aaaa0165-78c3-4a54-88f3-f5bc7c462b33
logs (2).txt
View all open jobs on GitHub
Upwork Automation - Do Not Edit