Closed kavimuru closed 7 months ago
Job added to Upwork: https://www.upwork.com/jobs/~013b44c7d5e9e29617
Triggered auto assignment to @garrettmknight (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details.
Triggered auto assignment to Contributor-plus team member for initial proposal review - @ntdiary (External
)
Waiting on proposals
The app lags when code is entered in description textbox
From what I have investigated so far, I think default value in the component is parsing the text entered into the textbox. When we enter the code above, I think the parser takes longer time to render which causes the lag. https://github.com/Expensify/App/blob/8e2fb675389bb84deee4b8e001a536252883e72f/src/pages/tasks/NewTaskDetailsPage.js#L116
I think we should remove parser from default value attribute in text input and move it to where "task description" state default value is set.
NA
Adding code block in task description makes the page laggy.
PR #29144 modified the task description input logic to support mark down. Task description is parsed to mark down before saving and vice versa when setting the default value to input.
However the parsing logic which uses the method parser.replace()
is computationally intensive.
CPU Profiling above shows modifyTextForUrlLinks
, which gets invoked by parser.replace
taking around 6 seconds to execute when transitioning from New Task to Confirm task page.
There are 3 locations in Task creation where parser.replace
is used.
Note that defaultValue
prop is directly assigned a function call instead of a memoized value. This causes the computationally intensive method to be invoked whenever the page state changes. This in turn causes the JS bridge to get blocked and animations to lag.
Also the useEffect here is dependent on props.task
.
When navigating from NewTaskDetailPage to NewTaskPage(ConfirmTask) screen , the first screen is not unmounted by ReactNavigation. It keeps the page in memory for navigating back. This causes the useEffect to get executed when task object is updated in Onyx. Which also invokes the parser.replace method.
So overall a computationally intensive method gets unnecessarily executed multiple times causing the app to lag.
Update URL_WEBSITE_REGEX
in expensify-commons with below regex
const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})(?:\\:${ALLOWED_PORTS}|\\b|(?=_))(?!@(?:[a-z\\d-]+\\.)+[a-z]{2,})`;
This would prevent catastrophic backtracking issue with TLD regex.
Then memoize the defaultValue prop using React.useMemo, so that parser.replace gets executed only when description changes.
const defaultValue = useMemo(() => parser.htmlToMarkdown(parser.replace(taskDescription)), [taskDescription]);
Also the useEffect should get executed only when the screen is focused. Update the logic to prevent unwanted execution offscreen.
const isFocused = useIsFocused();
useEffect(() => {
// Update state only if screen is focused
if (isFocused) {
setTaskTitle(props.task.title);
setTaskDescription(parser.htmlToMarkdown(parser.replace(props.task.description || '')));
}
}, [props.task.title, props.task.description, isFocused]);
Result
Here is the CPU profile after applying the fix. Overall time spent in parser.replace reduced to 2s from 6.3s.
The other area to focus on is actually optimising the parser.replace logic in ExpensiMark library. However it is in Expensify-Common repository and possibly shared by multiple apps.
@ntdiary looks like we've got some proposals for this one. Can you take a look when you get a chance?
Ah, sorry for the delay, I missed this issue, will review soon.
Thank you all for the proposals. Although the related regular expressions are very complex, I still believe that optimizing them is the fundamental solution. Even with the use of useMemo
, I feel that a 2s delay is not ideal for the user. :)
π£ It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? πΈ
Inline code is a valid input, and we support markdown syntax not only on this page, but also on other pages. So we should optimize the execution efficiency of the relevant regular expressions.
Hey @ntdiary, I was of the assumption that we only show inline code on page load, as I did not see other components parsing the updated description text. Please let me know if that's not the correct behavior.
Not only the description text, but also every case that calls parser.replace
may experience this delay. :)
I agree then, we must think of some way to optimize this regex. :)
@ntdiary The real culprit is the huge tlds regex that's being used in MARKDOWN_URL_REGEX
. To test the theory I replaced the TLD_REGEX
const with a much simpler regex which simply validates if the string is a valid tlds
instead of explicitly matching specific TLDs.
const TLD_REGEX='(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59})';
And here are the results for a single execution of parser.replace(). With original TLD_REGEX execution time is approximately 1s where as for simpler regex it is 10ms. Also it is important to note that the execution time is directly proportional to the input length. If I copy over the same test input 4 times, execution time with original regex becomes 4 seconds. That's 1s for around 350 characters. Which is quite huge!
Before | After |
---|---|
So it boils down to the decision whether we can trim down the specific TLD names with in the TLD_REGEX.
@ntdiary any thoughts on the suggestions?
Thank you all for the discussion, but I haven't found a completely satisfactory solution yet.
It seems that the 'early return' approach doesn't address the performance issues with URL_WEBSITE_REGEX
.
As for redefining TLD_REGEX
as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59})
, it fixes the specific case in the OP, but it has clear side effect, identifying many characters as url:
In fact, the character sequence causing performance issues in the OP looks like this: event.clipboardData.files
,
and the corresponding pattern is this part of URL_WEBSITE_REGEX
: (?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX}
The related PR is https://github.com/Expensify/expensify-common/pull/523.
A simple fix would be to split the hyphen check into a subsequent conditional validation, but I need to ask on Slack to see if this is appropriate.
@ntdiary I was not proposing to redefine TLD_REGEX as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59})
. Just pointing out that usage of a smaller regex significantly reduces execution time. As it stands I don't have a solid proposal at the moment without considerably changing the auto linking patterns.
However I do like to point out that we do have a very liberal definition for what a valid url is. Any set of words separated by dots and ending with a valid TLD is considered as a valid url as per MARKDOWN_LINK_REGEX. That's why we need to rely on explicit TLD names to parse links in markdown. Because most set of words pass through the initial set of conditionals only to fail ultimately at the specific TLD check. That's why the simplified regex matched any two words separated by dots as links.
Also the issue seems to impact only Android & iOS platforms. mWeb seems to be largely unaffected. So I suspect Hermes to be the culprit.
@ntdiary I was not proposing to redefine TLD_REGEX as
(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59})
. Just pointing out that usage of a smaller regex significantly reduces execution time. As it stands I don't have a solid proposal at the moment without considerably changing the auto linking patterns.However I do like to point out that we do have a very liberal definition for what a valid url is. Any set of words separated by dots and ending with a valid TLD is considered as a valid url as per MARKDOWN_LINK_REGEX. That's why we need to rely on explicit TLD names to parse links in markdown. Because most set of words pass through the initial set of conditionals only to fail ultimately at the specific TLD check. That's why the simplified regex matched any two words separated by dots as links.
Also the issue seems to impact only Android & iOS platforms. mWeb seems to be largely unaffected. So I suspect Hermes to be the culprit.
@aswin-s, Hermes regex engine is one part of the root cause, for the sequence in the OP (i.e., event.clipboardData.files
), its execution efficiency is lower compared to other platforms. Also, this is actually a catastrophic backtracking problem, other platforms may also encounter similar issues, this is something we need to pay attention to when writing regex in the future.
For this issue, another approach is preventing backtracking, such as:
((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
Some reference links: https://expensify.slack.com/archives/C01GTK53T8Q/p1689024254320539 https://javascript.info/regexp-catastrophic-backtracking
A simple fix would be to split the hyphen check into a subsequent conditional validation, but I need to ask on Slack to see if this is appropriate.
@ntdiary The below regex adds conditional validation to URL_WEBSITE_REGEX for whether to use the double hypen TLD_REGEX or single hypen regex URL_HYPEN_REGEX
(?(?=[-]{2})(?:${TLD_REGEX})|(?:${URL_HYPEN_REGEX})+)
The ?= regex checks for double hypens i.e XN--NQV7FS00EMA|XN--MGBC0A9AZCG and if so run the TLD_REGEX otherwise run the single hypen regex URL_HYPEN_REGEX.
The single hypen regex can be extracted from URL_WEBSITE_REGEX into const,
const URL_HYPEN_REGEX = '[-a-z0-9]*[a-z0-9])?\\.';
The URL_WEBSITE_REGEX with conditional validation
const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?(?=[-]{2})(?:${TLD_REGEX})|(?:${URL_HYPEN_REGEX})+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))(?!@(?:[a-z\\d-]+\\.)+[a-z]{2,})`;
But need to confirm if condition validation would be acceptable approach.
The hyphen check I mentioned refers to the changes introduced in PR https://github.com/Expensify/expensify-common/pull/523.
The simple fix I mentioned is to revert their PR, and then in subsequent functions (such as modifyTextForUrlLinks
), validate that if the matched url has an invalid hyphen format (e.g., -abc.com
, abc-.com
), then do not perform the replacement.
π£ It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? πΈ
@ntdiary I have revised the proposal and included the approach to revert the identified PR and also included POC.
I need some time to understand your hyphen check to ensure there won't be any regressions. :)
@garrettmknight, @ntdiary Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
@ntdiary when do you think you'll get a chance to review more deeply?
Investigating. :)
So far, I think the regex I proposed in this comment would be better, i.e., ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
. Also, since @aswin-s was the first to point out the performance issue with parser.replace
, so I think we can assign they to raise a PR in expensify-common
and apply this pattern.
As for this repository, we can upgrade the expensify-common
version after the PR is merged. BTW, the execution time of the new regex is approximately 10ms. If necessary, we can still apply the useMemo
part of @aswin-s's proposal
π π π C+ reviewed
Triggered auto assignment to @AndrewGable, see https://stackoverflow.com/c/expensify/questions/7972 for more details.
Reverting the PR would be step 1 and then the URL_WEBSITE_REGEX can be replaced with
((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
Nope, If we adopt this pattern (i.e. ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
), there's no need to revert that PR https://github.com/Expensify/expensify-common/pull/523. :)
@garrettmknight @AndrewGable @ntdiary this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!
@ntdiary the
((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
could be used along with early return in case of match. I suggest splitting the bounty and get more reviewers as the regex requires more thorough testing.
I don't think early return is necessary. The crux of this issue lies in the catastrophic backtracking problem when regexp.exec
is executed, and the suggested regex has already addressed this problem. BTW, @aswin-s, do you have any thoughts on the new pattern?
@garrettmknight, @AndrewGable, @ntdiary Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
Waiting for @aswin-s 's thoughts.
Sorry missed this. I'll get back to this tomorrow.
@ntdiary I can confirm that the new pattern fixes the issue and also passes all test cases on expensify-commons
.
((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})
@garrettmknight @AndrewGable @ntdiary this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:
Thanks!
Current assignee @ntdiary is eligible for the Internal assigner, not assigning anyone new.
@aswin-s do you want to take a swing at updating your proposal before we take this one internal?
Current assignee @ntdiary is eligible for the External assigner, not assigning anyone new.
@garrettmknight I've updated my proposal with the regex modification in expensify-commons.
Awesome, thanks! @ntdiary mind giving that another review?
Awesome, thanks! @ntdiary mind giving that another review?
@garrettmknight, sure, my opinion is same as the previous approval comment, and then let's start the first PR in expensify-common. :)
cc @AndrewGable
Ah, my mistake - @AndrewGable looks like you're up.
Andrew is out for a few days - I don't think we need to rush fixing this one so I think we can wait on him returning for review.
π£ @ntdiary π An offer has been automatically sent to your Upwork account for the Reviewer role π Thanks for contributing to the Expensify app!
π£ @aswin-s π An offer has been automatically sent to your Upwork account for the Contributor role π Thanks for contributing to the Expensify app!
Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review π§βπ» Keep in mind: Code of Conduct | Contributing π
I'm OOO this week, will be back 2/21 to pick it back up anything we need from BZ.
Raised first PR in expensify-commons repo : https://github.com/Expensify/expensify-common/pull/651.
@garrettmknight, @AndrewGable, @aswin-s, @ntdiary Whoops! This issue is 2 days overdue. Let's get this updated quick!
Merged Expensify common PR
Raised followup PR in Expensify/App repo: https://github.com/Expensify/App/pull/36819
If you havenβt already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Version Number: 1.4.24-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Expensify/Expensify Issue URL: Issue reported by: Applause internal team Slack conversation:
Action Performed:
Expected Result:
Able to create a task without significant delay
Actual Result:
The app lags after pasting the code block in the description field
Workaround:
unknown
Platforms:
Which of our officially supported platforms is this issue occurring on?
Screenshots/Videos
Add any screenshot/video evidence
https://github.com/Expensify/App/assets/43996225/24998581-9572-4d15-b18f-d5cacaa98ff7
View all open jobs on GitHub
Upwork Automation - Do Not Edit