Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.58k stars 2.92k forks source link

[$500] ExpensiMark: large bundle size #31183

Closed tomekzaw closed 9 months ago

tomekzaw commented 1 year ago

Problem A simple script that uses ExpensiMark library to parse Markdown into HTML has a bundle size of 336 kB (or 184 kB minified) which is quite big.

Details The file is large because it contains mappings of all HTML entities (from _.escape) and countries, states, currencies etc. from CONST.jsx.

Solution Investigate if it's possible to minimize the bundle size.

Steps to reproduce

index.js

import ExpensiMark from "expensify-common/lib/ExpensiMark";

const input = "Hello, *world*!";
const parser = new ExpensiMark();
const output = parser.replace(input);

console.log(JSON.stringify(input));
console.log(JSON.stringify(output));
esbuild index.js --bundle --minify --outfile=out.js

out.js: https://pastebin.swmansion.com/?9798f33833cc33b6#DRgiwxs8ujteamHaGDioSzZVZ83nuWUim3QabJ5PUYed

Issue OwnerCurrent Issue Owner: @Skalakid
Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~0155a63860a646579c
  • Upwork Job ID: 1740457868647858176
  • Last Price Increase: 2024-01-04
  • Automatic offers:
    • fedirjh | Reviewer | 28090032
    • aswin-s | Contributor | 28099676
melvin-bot[bot] commented 1 year ago

Triggered auto assignment to @abekkala (NewFeature), see https://stackoverflowteams.com/c/expensify/questions/14418#:~:text=BugZero%20process%20steps%20for%20feature%20requests for more details.

roryabraham commented 1 year ago

Co-assigning myself to supervise, but this will be worked on by Michał Skałka from SWM, who will be invoicing us separately

roryabraham commented 1 year ago

Maybe the first and easiest steps here are to:

  1. Port Expensimark to typescript and stop using underscore – that would likely help reduce the file size
  2. Split up the big CONST export and separately export/import only the pieces that we need in Expensimark
Skalakid commented 1 year ago

Hi I’m Michał from Software Mansion, an expert agency, and I’d like to work on this issue

roryabraham commented 1 year ago

No update from me, but @Skalakid keep me posted if you're starting to work on this

roryabraham commented 1 year ago

not urgent, going to make @Skalakid the issue owner here

melvin-bot[bot] commented 11 months ago

This issue has not been updated in over 15 days. @Skalakid, @roryabraham eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

roryabraham commented 11 months ago

Moving this back to weekly. We are hoping to launch this feature in January, so we should start making moves here too. @Skalakid is going to be focused on inline video, so we should find someone else to tag in on this.

tomekzaw commented 11 months ago

Would be great to have this for V1 but currently JS bundle of Live Markdown parser is 188 kB so it's not a blocker I guess.

roryabraham commented 11 months ago

I'm going to make this external and accept proposals for open-source contributors to work on this.

melvin-bot[bot] commented 11 months ago

Job added to Upwork: https://www.upwork.com/jobs/~0155a63860a646579c

melvin-bot[bot] commented 11 months ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @fedirjh (External)

roryabraham commented 11 months ago

Hi contributors 👋🏼

I've left some suggestions for how to improve this, but the approach can be open-ended. If multiple contributors have different ideas too, we could potentially hire more than one (in that case, it would probably be simplest to create separate issues to track the work separately).

Also note that the expensify-common codebase must remain compatible with vanilla (non-typescript) js code so that it can continue to be used in our legacy javascript codebases.

roryabraham commented 11 months ago

cc @kidroca you're someone I had in mind for this issue

akovacs commented 11 months ago

There appears to be significant redundancy in https://github.com/Expensify/expensify-common/blob/main/lib/CONST.jsx

A few ideas:

  1. Would it be possible to generate the STATES object dynamically, eg:

const STATES_DATA = {AK: 'Alaska', AL: 'Alabama', ...};

const STATES = Object.fromEntries(Object.entries(STATES_DATA).map(([k, v]) => [k, {stateISO: k, stateName: v}]));

https://github.com/Expensify/expensify-common/blob/6df214a20d60664b3f00013ddbbdea99f72240d2/lib/CONST.jsx#L62

  1. A similar approach could be used for generating the_INTEGRATIONS objects.

https://github.com/Expensify/expensify-common/blob/6df214a20d60664b3f00013ddbbdea99f72240d2/lib/CONST.jsx#L564

  1. The hyperlink validation logic is very complex, is this necessary?

https://github.com/Expensify/expensify-common/blob/6df214a20d60664b3f00013ddbbdea99f72240d2/lib/CONST.jsx#L340

  1. Would it be possible to consolidate the email-related regexes? eg: EXTRACT_EMAIL, EMAIL_SEARCH do the same thing.

https://github.com/Expensify/expensify-common/blob/6df214a20d60664b3f00013ddbbdea99f72240d2/lib/CONST.jsx#L283

What do you think of these approaches? Thank you!

melvin-bot[bot] commented 11 months ago

📣 @akovacs! 📣 Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:

  1. Make sure you've read and understood the contributing guidelines.
  2. Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
  3. Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
  4. Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details. Screen Shot 2022-11-16 at 4 42 54 PM Format:
    Contributor details
    Your Expensify account email: <REPLACE EMAIL HERE>
    Upwork Profile Link: <REPLACE LINK HERE>
roryabraham commented 11 months ago

The hyperlink validation logic is very complex, is this necessary?

It evolved as the result of fixing many different bugs people found over the years, so my hunch is it would be very high-risk for a low reward to try and update that.

roryabraham commented 11 months ago

I think eliminating redundancies in CONST makes sense, but I doubt it will have a very big impact.

For proposals going forward, please include an educated estimate of the impact the change will have on the ExpensiMark bundle size

roryabraham commented 11 months ago

Maybe another avenue for investigation would be to make sure that we're using a minified bundle.

akovacs commented 11 months ago

Thank you for your feedback!

A few strategies that I could think of and estimated savings:

  1. Generate URLs like {g_cloudFrontImg}icons/export-icons/zenefit programmatically:

8 direct integrations 3 URLs 43 url length = 1032 bytes

13 URLs in indirect integrations * 43 url length = 559 bytes

  1. Generate STATE object programmatically: 17 savings per state * 50 states = 850 bytes

  2. Generate EXPENSIFY_EMAILS by concatenating new entries to INVALID_APPROVER_AND_SHAREE_EMAILS: ~ 20 entries * 20 savings per entry = 400 bytes

Hardcode image: ${g_cloudFrontImg}icons/accounting-other--blue.svg for DEFAULT_IS_TEMPLATES: 60 * 8 = 480 chars

reduce redundancy of "https://d2k5nsl2zxldvw.cloudfront.net" 40 chars * 4 instances = 160 bytes

= 3481 byte savings in plaintext. I think these savings will carry over to post-minification as well since as far as I understand, that only mangles variable names and not values. But I will only be able to test this in about 11 hours.

There are also ways to perform deadcode elimination such as https://www.npmjs.com/package/babel-plugin-minify-dead-code-elimination but that is significantly more risky, therefore I am hesitant to attempt that.

roryabraham commented 11 months ago

thanks for talking that through a bit better @akovacs. ~3.5 kB / ~188 kB is a reduction of about only about 2%, so unfortunately I don't think that those changes are going to be very impactful.

aswin-s commented 11 months ago

Based on bundle analysis 50% of the bundle is html-entities library. named-references to be exact. CONST.jsx contributes just 12kb to the whole bundle. So shouldn't we focus more on how to optimise the usage of html-entities?

image

However a polyfill of String.prototype.replaceAll is embedded in Str file. I guess this was for Node 14 compatibility and no longer needed. Removing this polyfill itself reduces 30kb from entire bundle resulting in a 16% savings.

roryabraham commented 11 months ago

👍🏼 to removing the Str.prototype.replaceAll polyfill, that seems like an easy win as it's definitely supported by all modern browsers / JS engines that we care to support.

aswin-s commented 11 months ago

Found another opportunity to reduce bundle size. ExpensifyMark.js and str.js uses below methods from underscore.

union 
isEmpty
filter
contains
map
difference
isFunction
first
last   

These could be replaced with pure ES6 methods. That shaves off another 20kb, bringing down the minified bundle size to approximately 129kb.

roryabraham commented 11 months ago

@aswin-s that sounds good, but using underscore is part of our JS style guidelines. However, not using underscore is part of our TypeScript style guidelines. That's why I suggested above re-implementing certain files in TS and dropping underscore/lodash

melvin-bot[bot] commented 11 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

kidroca commented 11 months ago

I agree with the insights shared thus far. Here are my thoughts on the key issues:

The Predominant Impact of html-entities (Approximately 50%)

Given that ExpensiMark extensively utilizes the html-entities library through Str for encoding and decoding, the room for optimization might be limited. We're already on the latest version of html-entities. The alternatives, such as replacing or redeveloping it, pose significant risks and complexities, which may not justify the potential gains in bundle size reduction.

Potential Extraction from CONST (Roughly 7%)

Extracting elements out of CONST seems relatively straightforward. However, it's worth noting that the impact here is minimal – we're looking at a size reduction of about 5-7%. While it's a step in the right direction, it won't be a game-changer for the overall bundle size.

Addressing underscore Usage (Approximately 12%)

The suggestion to eliminate underscore usage from the file is viable, considering its limited use. However, an alternative approach could be more beneficial:

If the projects importing expensify-common use the same or a compatible version of underscore, there would be no additional burden on the bundle size. Rather than removing underscore, ensuring that dependent projects are aligned with the latest version could be a more effective strategy. This approach maintains functionality while optimizing for size.

Minification Considerations

Currently, the expensify-common project is distributed as pure source code, not as a pre-bundled library. This approach delegates the responsibilities of bundling, tree shaking, and minification to the consumer project. A potential area for exploration and improvement could be to update expensify-common to be distributed as a bundled library. This change could lead to more uniform consumption and potentially more efficient optimization processes across different projects.

roryabraham commented 10 months ago

@kidroca I like the idea to just align the underscore version used in expensify-common and E/App. That seems like a very easy way to get a 12% bundle size reduction without much effort.

melvin-bot[bot] commented 10 months ago

📣 @fedirjh 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job

kidroca commented 10 months ago

@kidroca I like the idea to just align the underscore version used in expensify-common and E/App. That seems like a very easy way to get a 12% bundle size reduction without much effort.

It appears that E/App is already aligned with expensify-common in terms of the underscore version used:

Given this setup, for broader compatibility with other projects, we could consider moving underscore to peerDependencies in expensify-common. This move would allow us to declare a wider range of acceptable versions, effectively bridging any version gaps among different projects.

Alternatively, if placing underscore under peerDependencies poses challenges, we could simply adjust the version specification in expensify-common to use the caret (^) prefix. This change would ensure that any patch within the current minor version (1.13.x) is compatible, thus maintaining flexibility for patch updates without necessitating major version changes. Such a strategy would align with semver principles and could facilitate easier maintenance and compatibility across projects.

aswin-s commented 10 months ago

@roryabraham Also not sure why we overlooked the removal of String.prototype.replaceAll. That amounts to 30KB (16% savings). Seems like that's more relevant than underscore since it is being shared with other packages.

https://bundlephobia.com/package/string.prototype.replaceall@1.0.8

kidroca commented 10 months ago

I agree with @aswin-s's suggestion regarding the removal of the String.prototype.replaceAll polyfill. This change is straightforward and indeed offers a significant reduction in the bundle size

Removing the polyfill impacts the backward compatibility and potential technical debt for legacy systems. It might be beneficial to conduct a quick survey or analysis to determine how many such legacy projects are dependent on expensify-common and their current Node.js versions.

I am generally in favor of removing the polyfill, as it primarily affects projects running on very old versions of Node.js (older than 15), which are likely to be a minority if any

roryabraham commented 10 months ago

@aswin-s sorry about that, I do agree with removing the replaceAll polyfill.

@aswin-s I'll co-assign this issue to you to open a PR to remove that polyfill, thanks.

melvin-bot[bot] commented 10 months ago

📣 @aswin-s 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑‍💻 Keep in mind: Code of Conduct | Contributing 📖

aswin-s commented 10 months ago

@roryabraham Not sure if this is the right place to ask, but since it also impacts the bundle size of ExpensiMark, I'll ask anyway.

We have a really large regex to match valid TLDs. Although it doesn’t significantly impact browser or desktop environments, it takes a considerable amount of time to execute on mobile devices. On Android devices, the TLD matching, which is part of markdown parsing, takes around 3 seconds to execute, thereby blocking the UI. The execution time increases exponentially with size of the matching text. I was wondering if there is a specific reason for having such stringent TLD matching, or if it's just been retained for historical reasons. Simplifying this to a shorter regex pattern that checks the validity of the TLD pattern, rather than checking explicit domains, would reduce the bundle size by around 10KB and significantly increase regex execution speed. I would be interested in hearing your thoughts on this.

Came across this while I was investigating performance issues in parsing markdown content in this issue.

roryabraham commented 10 months ago

@abekkala pulling you back in to help manage payments here. @kidroca invoices us separately but I think we may need to create and upwork job for @aswin-s

Edit: already done: https://github.com/Expensify/App/issues/31183#issuecomment-1890850472

aswin-s commented 10 months ago

PR was merged on Jan 15th. This one is ready for payment.

abekkala commented 10 months ago

@roryabraham I want to make sure I'm looking at the Correct PR. The one above has not merged to production yet, correct? The payment countdown starts after it's merged to production and there are no regressions.

@aswin-s The PR above https://github.com/Expensify/expensify-common/pull/632 is not showing merged to prod yet.

once that happens the payment title for this GH will auto-update with the payment date

aswin-s commented 10 months ago

@abekkala PR is merged to master in expensify-common repository. Melvin seems to be tracking PRs from Expensify/App repository only.

@roryabraham Could you please confirm?

roryabraham commented 10 months ago

Ah, sorry for the confusion here. @aswin-s can you please open an E/App PR to upgrade the expensify-common version we're using (otherwise the PR won't take effect). Then once that hit production we can issue payment.

fedirjh commented 10 months ago

This PR got superseded by https://github.com/Expensify/App/pull/34801. Expensify-commons has got upgraded to a more recent commit than the one mentioned in this PR.

It seems that expensify-common was upgraded in https://github.com/Expensify/App/pull/34801.

aswin-s commented 9 months ago

Ah, sorry for the confusion here. @aswin-s can you please open an E/App PR to upgrade the expensify-common version we're using (otherwise the PR won't take effect). Then once that hit production we can issue payment.

@abekkala @roryabraham Above PR has already hit PROD last week.

roryabraham commented 9 months ago

@abekkala can we please issue payment of $500 to @aswin-s for https://github.com/Expensify/expensify-common/pull/632 ?

abekkala commented 9 months ago

@roryabraham yes, I can send an upwork offer to @aswin-s Was there someone who did the PR review (that needs payment as well)? or was that done internally?

roryabraham commented 9 months ago

PRs for this issue were reviewed internally

abekkala commented 9 months ago

@aswin-s payment sent and contract ended - thank you!