[HOLD for payment 2024-04-01] [$250] Android - App lags after adding code block in task description

kavimuru commented 10 months ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!

Version Number: 1.4.24-0 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Expensify/Expensify Issue URL: Issue reported by: Applause internal team Slack conversation:

Action Performed:

Open the app and log in
Tap the FAB > Assign Task
Enter title
Add the code block in the task description: if (files.length > 0) { // Prevent the default so we do not post the file name into the text box event.preventDefault(); this.props.onPasteFile(event.clipboardData.files[0]); return; if (files.length > 0) { // Prevent the default so we do not post the file name into the text box event.preventDefault() this.props.onPasteFile(event.clipboardData.files[0]); return;
Proceed to the next screen and finish creating the task
Expected Result:

Able to create a task without significant delay

Actual Result:

The app lags after pasting the code block in the description field

Workaround:

unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

[x] Android: Native
[ ] Android: mWeb Chrome
[ ] iOS: Native
[ ] iOS: mWeb Safari
[ ] MacOS: Chrome / Safari
[ ] MacOS: Desktop

Screenshots/Videos

Add any screenshot/video evidence

https://github.com/Expensify/App/assets/43996225/24998581-9572-4d15-b18f-d5cacaa98ff7

View all open jobs on GitHub

Upwork Automation - Do Not Edit

Upwork Job URL: https://www.upwork.com/jobs/~013b44c7d5e9e29617
Upwork Job ID: 1745374242762850304
Last Price Increase: 2024-03-15
Automatic offers:

ntdiary | Reviewer | 0
aswin-s | Contributor | 0

melvin-bot[bot] commented 10 months ago

Job added to Upwork: https://www.upwork.com/jobs/~013b44c7d5e9e29617

melvin-bot[bot] commented 10 months ago

Triggered auto assignment to @garrettmknight (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

melvin-bot[bot] commented 10 months ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @ntdiary (External)

garrettmknight commented 10 months ago

Waiting on proposals

sfurqan2 commented 10 months ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

The app lags when code is entered in description textbox

What is the root cause of that problem?

From what I have investigated so far, I think default value in the component is parsing the text entered into the textbox. When we enter the code above, I think the parser takes longer time to render which causes the lag. https://github.com/Expensify/App/blob/8e2fb675389bb84deee4b8e001a536252883e72f/src/pages/tasks/NewTaskDetailsPage.js#L116

What changes do you think we should make in order to solve the problem?

I think we should remove parser from default value attribute in text input and move it to where "task description" state default value is set.

What alternative solutions did you explore? (Optional)

NA

aswin-s commented 10 months ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Adding code block in task description makes the page laggy.

What is the root cause of that problem?

PR #29144 modified the task description input logic to support mark down. Task description is parsed to mark down before saving and vice versa when setting the default value to input.

However the parsing logic which uses the method parser.replace() is computationally intensive.

CPU Profiling above shows modifyTextForUrlLinks, which gets invoked by parser.replace taking around 6 seconds to execute when transitioning from New Task to Confirm task page.

There are 3 locations in Task creation where parser.replace is used.

https://github.com/Expensify/App/blob/8ba2782c8c35a394a3e1f8d8406085d7e1b49215/src/pages/tasks/NewTaskDetailsPage.js#L116

https://github.com/Expensify/App/blob/8ba2782c8c35a394a3e1f8d8406085d7e1b49215/src/pages/tasks/NewTaskDescriptionPage.js#L72

https://github.com/Expensify/App/blob/8ba2782c8c35a394a3e1f8d8406085d7e1b49215/src/pages/tasks/TaskDescriptionPage.js#L120

Note that defaultValue prop is directly assigned a function call instead of a memoized value. This causes the computationally intensive method to be invoked whenever the page state changes. This in turn causes the JS bridge to get blocked and animations to lag.

Also the useEffect here is dependent on props.task.

https://github.com/Expensify/App/blob/d4c28a4b95eb0a744a26fc2756fc2a42bb744f12/src/pages/tasks/NewTaskDetailsPage.js#L46-L49

When navigating from NewTaskDetailPage to NewTaskPage(ConfirmTask) screen , the first screen is not unmounted by ReactNavigation. It keeps the page in memory for navigating back. This causes the useEffect to get executed when task object is updated in Onyx. Which also invokes the parser.replace method.

So overall a computationally intensive method gets unnecessarily executed multiple times causing the app to lag.

What changes do you think we should make in order to solve the problem?

Update URL_WEBSITE_REGEX in expensify-commons with below regex

https://github.com/Expensify/expensify-common/blob/6ae333c38ca5909245c6310db1dc66728d62d0ed/lib/Url.js#L5

const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})(?:\\:${ALLOWED_PORTS}|\\b|(?=_))(?!@(?:[a-z\\d-]+\\.)+[a-z]{2,})`;

This would prevent catastrophic backtracking issue with TLD regex.

Then memoize the defaultValue prop using React.useMemo, so that parser.replace gets executed only when description changes.


const defaultValue = useMemo(() => parser.htmlToMarkdown(parser.replace(taskDescription)), [taskDescription]);

Also the useEffect should get executed only when the screen is focused. Update the logic to prevent unwanted execution offscreen.


const isFocused = useIsFocused();

useEffect(() => {
  // Update state only if screen is focused
  if (isFocused) {        
    setTaskTitle(props.task.title);
    setTaskDescription(parser.htmlToMarkdown(parser.replace(props.task.description || '')));
  }
}, [props.task.title, props.task.description, isFocused]);

Result

Here is the CPU profile after applying the fix. Overall time spent in parser.replace reduced to 2s from 6.3s.

What alternative solutions did you explore? (Optional)

The other area to focus on is actually optimising the parser.replace logic in ExpensiMark library. However it is in Expensify-Common repository and possibly shared by multiple apps.

garrettmknight commented 10 months ago

@ntdiary looks like we've got some proposals for this one. Can you take a look when you get a chance?

ntdiary commented 10 months ago

Ah, sorry for the delay, I missed this issue, will review soon.

ntdiary commented 10 months ago

Thank you all for the proposals. Although the related regular expressions are very complex, I still believe that optimizing them is the fundamental solution. Even with the use of useMemo, I feel that a 2s delay is not ideal for the user. :)

https://github.com/Expensify/expensify-common/blob/b682bb4078caaffa2f54ab8c35dd76178e322e11/lib/ExpensiMark.js#L183

melvin-bot[bot] commented 10 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

ntdiary commented 10 months ago

Inline code is a valid input, and we support markdown syntax not only on this page, but also on other pages. So we should optimize the execution efficiency of the relevant regular expressions.

sfurqan2 commented 10 months ago

Hey @ntdiary, I was of the assumption that we only show inline code on page load, as I did not see other components parsing the updated description text. Please let me know if that's not the correct behavior.

ntdiary commented 10 months ago

Not only the description text, but also every case that calls parser.replace may experience this delay. :)

sfurqan2 commented 10 months ago

I agree then, we must think of some way to optimize this regex. :)

aswin-s commented 10 months ago

@ntdiary The real culprit is the huge tlds regex that's being used in MARKDOWN_URL_REGEX. To test the theory I replaced the TLD_REGEX const with a much simpler regex which simply validates if the string is a valid tlds instead of explicitly matching specific TLDs.

const TLD_REGEX='(?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59})';

And here are the results for a single execution of parser.replace(). With original TLD_REGEX execution time is approximately 1s where as for simpler regex it is 10ms. Also it is important to note that the execution time is directly proportional to the input length. If I copy over the same test input 4 times, execution time with original regex becomes 4 seconds. That's 1s for around 350 characters. Which is quite huge!

Before	After

So it boils down to the decision whether we can trim down the specific TLD names with in the TLD_REGEX.

garrettmknight commented 9 months ago

@ntdiary any thoughts on the suggestions?

ntdiary commented 9 months ago

Thank you all for the discussion, but I haven't found a completely satisfactory solution yet.

It seems that the 'early return' approach doesn't address the performance issues with URL_WEBSITE_REGEX.

As for redefining TLD_REGEX as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59}), it fixes the specific case in the OP, but it has clear side effect, identifying many characters as url:

In fact, the character sequence causing performance issues in the OP looks like this: event.clipboardData.files, and the corresponding pattern is this part of URL_WEBSITE_REGEX: (?:[-a-z0-9]*[a-z0-9])?\\.)+(?:${TLD_REGEX}

A simple fix would be to split the hyphen check into a subsequent conditional validation, but I need to ask on Slack to see if this is appropriate.

aswin-s commented 9 months ago

@ntdiary I was not proposing to redefine TLD_REGEX as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59}). Just pointing out that usage of a smaller regex significantly reduces execution time. As it stands I don't have a solid proposal at the moment without considerably changing the auto linking patterns.

However I do like to point out that we do have a very liberal definition for what a valid url is. Any set of words separated by dots and ending with a valid TLD is considered as a valid url as per MARKDOWN_LINK_REGEX. That's why we need to rely on explicit TLD names to parse links in markdown. Because most set of words pass through the initial set of conditionals only to fail ultimately at the specific TLD check. That's why the simplified regex matched any two words separated by dots as links.

Also the issue seems to impact only Android & iOS platforms. mWeb seems to be largely unaffected. So I suspect Hermes to be the culprit.

ntdiary commented 9 months ago

@ntdiary I was not proposing to redefine TLD_REGEX as (?:[a-zA-Z]{2,63}|xn--[a-zA-Z0-9-]{4,59}). Just pointing out that usage of a smaller regex significantly reduces execution time. As it stands I don't have a solid proposal at the moment without considerably changing the auto linking patterns.

However I do like to point out that we do have a very liberal definition for what a valid url is. Any set of words separated by dots and ending with a valid TLD is considered as a valid url as per MARKDOWN_LINK_REGEX. That's why we need to rely on explicit TLD names to parse links in markdown. Because most set of words pass through the initial set of conditionals only to fail ultimately at the specific TLD check. That's why the simplified regex matched any two words separated by dots as links.

Also the issue seems to impact only Android & iOS platforms. mWeb seems to be largely unaffected. So I suspect Hermes to be the culprit.

@aswin-s, Hermes regex engine is one part of the root cause, for the sequence in the OP (i.e., event.clipboardData.files), its execution efficiency is lower compared to other platforms. Also, this is actually a catastrophic backtracking problem, other platforms may also encounter similar issues, this is something we need to pay attention to when writing regex in the future.

For this issue, another approach is preventing backtracking, such as: ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})

Some reference links: https://expensify.slack.com/archives/C01GTK53T8Q/p1689024254320539 https://javascript.info/regexp-catastrophic-backtracking

ntdiary commented 9 months ago

A simple fix would be to split the hyphen check into a subsequent conditional validation, but I need to ask on Slack to see if this is appropriate.

@ntdiary The below regex adds conditional validation to URL_WEBSITE_REGEX for whether to use the double hypen TLD_REGEX or single hypen regex URL_HYPEN_REGEX
(?(?=[-]{2})(?:${TLD_REGEX})|(?:${URL_HYPEN_REGEX})+)
The ?= regex checks for double hypens i.e XN--NQV7FS00EMA|XN--MGBC0A9AZCG and if so run the TLD_REGEX otherwise run the single hypen regex URL_HYPEN_REGEX.

The single hypen regex can be extracted from URL_WEBSITE_REGEX into const,
const URL_HYPEN_REGEX = '[-a-z0-9]*[a-z0-9])?\\.';
The URL_WEBSITE_REGEX with conditional validation
const URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}?((?:www\\.)?[a-z0-9](?(?=[-]{2})(?:${TLD_REGEX})|(?:${URL_HYPEN_REGEX})+)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))(?!@(?:[a-z\\d-]+\\.)+[a-z]{2,})`;
But need to confirm if condition validation would be acceptable approach.

The hyphen check I mentioned refers to the changes introduced in PR https://github.com/Expensify/expensify-common/pull/523. The simple fix I mentioned is to revert their PR, and then in subsequent functions (such as modifyTextForUrlLinks), validate that if the matched url has an invalid hyphen format (e.g., -abc.com, abc-.com), then do not perform the replacement.

melvin-bot[bot] commented 9 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

ntdiary commented 9 months ago

@ntdiary I have revised the proposal and included the approach to revert the identified PR and also included POC.

I need some time to understand your hyphen check to ensure there won't be any regressions. :)

melvin-bot[bot] commented 9 months ago

@garrettmknight, @ntdiary Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

garrettmknight commented 9 months ago

@ntdiary when do you think you'll get a chance to review more deeply?

ntdiary commented 9 months ago

Investigating. :)

ntdiary commented 9 months ago

So far, I think the regex I proposed in this comment would be better, i.e., ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX}). Also, since @aswin-s was the first to point out the performance issue with parser.replace, so I think we can assign they to raise a PR in expensify-common and apply this pattern.

As for this repository, we can upgrade the expensify-common version after the PR is merged. BTW, the execution time of the new regex is approximately 10ms. If necessary, we can still apply the useMemo part of @aswin-s's proposal

🎀 👀 🎀 C+ reviewed

melvin-bot[bot] commented 9 months ago

Triggered auto assignment to @AndrewGable, see https://stackoverflow.com/c/expensify/questions/7972 for more details.

ntdiary commented 9 months ago

Reverting the PR would be step 1 and then the URL_WEBSITE_REGEX can be replaced with ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})

Nope, If we adopt this pattern (i.e. ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})), there's no need to revert that PR https://github.com/Expensify/expensify-common/pull/523. :)

melvin-bot[bot] commented 9 months ago

@garrettmknight @AndrewGable @ntdiary this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

ntdiary commented 9 months ago

@ntdiary the ((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX}) could be used along with early return in case of match. I suggest splitting the bounty and get more reviewers as the regex requires more thorough testing.

I don't think early return is necessary. The crux of this issue lies in the catastrophic backtracking problem when regexp.exec is executed, and the suggested regex has already addressed this problem. BTW, @aswin-s, do you have any thoughts on the new pattern?

melvin-bot[bot] commented 9 months ago

@garrettmknight, @AndrewGable, @ntdiary Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

garrettmknight commented 9 months ago

Waiting for @aswin-s 's thoughts.

aswin-s commented 9 months ago

Sorry missed this. I'll get back to this tomorrow.

aswin-s commented 9 months ago

@ntdiary I can confirm that the new pattern fixes the issue and also passes all test cases on expensify-commons.

((?:www\\.)?[a-z0-9](?=(?<label>[-a-z0-9]*[a-z0-9]))\\k<label>?\\.)+(?:${TLD_REGEX})

melvin-bot[bot] commented 9 months ago

@garrettmknight @AndrewGable @ntdiary this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:

Decide whether any proposals currently meet our guidelines and can be approved as-is today
If no proposals meet that standard, please take this issue internal and treat it as one of your highest priorities
If you have any questions, don't hesitate to start a discussion in #expensify-open-source

Thanks!

melvin-bot[bot] commented 9 months ago

Current assignee @ntdiary is eligible for the Internal assigner, not assigning anyone new.

garrettmknight commented 9 months ago

@aswin-s do you want to take a swing at updating your proposal before we take this one internal?

melvin-bot[bot] commented 9 months ago

Current assignee @ntdiary is eligible for the External assigner, not assigning anyone new.

aswin-s commented 9 months ago

@garrettmknight I've updated my proposal with the regex modification in expensify-commons.

garrettmknight commented 9 months ago

Awesome, thanks! @ntdiary mind giving that another review?

ntdiary commented 9 months ago

Awesome, thanks! @ntdiary mind giving that another review?

@garrettmknight, sure, my opinion is same as the previous approval comment, and then let's start the first PR in expensify-common. :)

cc @AndrewGable

garrettmknight commented 9 months ago

Ah, my mistake - @AndrewGable looks like you're up.

garrettmknight commented 9 months ago

Andrew is out for a few days - I don't think we need to rush fixing this one so I think we can wait on him returning for review.

melvin-bot[bot] commented 9 months ago

📣 @ntdiary 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job

melvin-bot[bot] commented 9 months ago

📣 @aswin-s 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑‍💻 Keep in mind: Code of Conduct | Contributing 📖

garrettmknight commented 9 months ago

I'm OOO this week, will be back 2/21 to pick it back up anything we need from BZ.

aswin-s commented 9 months ago

Raised first PR in expensify-commons repo : https://github.com/Expensify/expensify-common/pull/651.

melvin-bot[bot] commented 9 months ago

@garrettmknight, @AndrewGable, @aswin-s, @ntdiary Whoops! This issue is 2 days overdue. Let's get this updated quick!

AndrewGable commented 9 months ago

Merged Expensify common PR

aswin-s commented 9 months ago

Raised followup PR in Expensify/App repo: https://github.com/Expensify/App/pull/36819

Expensify / App