Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.55k stars 2.9k forks source link

[Hold for payment 2021-08-17] [Performance] Optimize AsyncStorage on Android #2667

Closed arielgreen closed 3 years ago

arielgreen commented 3 years ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Problem

Though not carefully benchmarked, it's pretty obvious that the Android app is slow -- to start, to use, etc. And all things being equal, it would be better if it were faster. We don't know exactly what fraction of that time is spent reading/writing data from/to disk, but it's nonzero, and if it were faster, it would be a rising tide that floats all boats.

Solution

Do the following:

  1. Create a standalone RN app that benchmarks AsyncStorage. I think it should have three buttons, each of which outputs timing information to a log: a. Reset - Deletes the database on disk b. Write Sequential - Inserts a random number into keys 0 to 1M sequentially c. Read Sequential - (only active if there is a database) Read rows 0 to 1M sequentially
  2. Do some basic tests on both Android and iOS, and record timing from these scenarios in order: a. Cold write sequential - Reset the database, reboot the phone, then start the app and tap "Write Sequential". This will measure the ability to write to a "cold" disk that has nothing cached. b. Warm write - After doing "Cold write", tap "Write Sequential" again and see how overwriting the same database (presumably that is already cached) differs in performance. c. Warm read - After "Warm write", tap "Read Sequential" and see how reading a warmed-up file performs. d. Cold read - Reboot the phone, open the app, then tap "Read Sequential" again, to see how read performance differs when the file isn't cached
  3. Refactor AsyncStorage on Android to implement each of the suggested optimizations described here, and re-run the benchmark after each to see which helps and by how much, namely: a. Stored procedures (to avoid recompiling SQL every time) b. PRAGMA synchronous=OFF; c. PRAGMA mmap_size=(1010241024); // 10MB memory map d. Update the "Sequential Read" test to read keys in batches of 100, and re-test with the stock implementation (which just calls AsyncStorage.get() multiple times). Then implement a native version of AsyncStorage.multiGet() that makes a single query to SQLite, and re-test.
  4. Once we've found the winning combo (if any), submit to the React Native mainline to see if we can get them to merge it. However, you'll get paid whether or not they do.

Also, test a native version of AsyncStorage.mutliGet() that does it all in one query.

Upwork job post

View all open jobs on Upwork

MelvinBot commented 3 years ago

Triggered auto assignment to @tgolen (Exported), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

mateusbra commented 3 years ago

Proposal

I will make the step by step witth a standalone RN APP as mentioned and develop what was described here. In addition I will compare the benchmarks, analyze and obtain the best config to improve the performance result at maximum.

Finally I will implement the AsyncStorage.multiGet().

arielgreen commented 3 years ago

@mateusbra Please see our Contributing Guidelines for an explanation of what we're looking for in a proposal.

arielgreen commented 3 years ago

@kidroca is this something you'll be working on as part of #2762?

kidroca commented 3 years ago

@arielgreen It's planned to happen after #2762 and it's not part of that scope: https://expensify.slack.com/archives/C01GTK53T8Q/p1620172402387600?thread_ts=1620065578.311900&channel=C01GTK53T8Q&message_ts=1620172402.387600

Additionally, after this is done, @kidroca would you be willing to pick up that AsyncStorage optimization project? After all, we still need "cold" reads to be fast, and your timing shows that Android significantly slower than iOS, for (probably) easily solvable reasons.

arielgreen commented 3 years ago

@kidroca gotcha — thanks.

arielgreen commented 3 years ago

Job posting for this has expired. I'll repost to Upwork when we're ready to move forward with this one (I.e., when #2762 is complete).

arielgreen commented 3 years ago

@kidroca Are you ready to move forward with this? If so, I’ll recreate the Upwork posting and send over an invite.

quinthar commented 3 years ago

I don't think we should wait for @kidroca -- let's go ahead and get someone working on it asap! It would be great if it could be @kidroca, but it sounds like @mateusbra might also be interested (and probably a whole bunch more).

kidroca commented 3 years ago

@arielgreen

@kidroca Are you ready to move forward with this? If so, I’ll recreate the Upwork posting and send over an invite.

Yes, I am

Proposal

I've already worked on benchmarking and performance capturing logic. It can be reused even if we're not using Onyx for the timings that should be captured here My plan is to provide aggregated timing data the way I did here: https://github.com/Expensify/Expensify.cash/issues/2762#issuecomment-849945470

I will use a physical Android device - Samsung Galaxy S7 Edge Android 8.0 that tends to have a slow read speed at the moment

I'll work and submit a PR against E.cash fork of react-native: https://github.com/Expensify/react-native

One question, where should I push the "sample" app from step 1) Create a standalone RN app that benchmarks AsyncStorage

tgolen commented 3 years ago

Create a standalone RN app that benchmarks AsyncStorage

For this, I was thinking that it would live in a completely separate repo that you own (a free repo is fine). Would that work?

kidroca commented 3 years ago

For this, I was thinking that it would live in a completely separate repo that you own (a free repo is fine). Would that work?

Sure

arielgreen commented 3 years ago

Upwork posting here.

kidroca commented 3 years ago

Here's some benchmark information so far:

Setup

Before AsyncStorage updates

10, 000 Operations

image

50, 000 Operations

image

Questions

Do you still want me to do 1M tests, for an average of 16ms it would take about 4.5hrs to write 1M entries?

I think the fast speeds we see here are due to the fact that operations run in sequence and wait for each other, I've tried saving larger texts (1kb each) and the time is about the same (16.9ms per write, 8.3ms per read) The problem with E.cash being slow on Android seems to be that AsyncStorage does not handle parallel requests very well and E.cash/Onyx blast it with as many calls as they need. Maybe we should think something in this direction - optimize (or add) batching in AsyncStorage so that many parallel request does not degrade its performance. This could be alleviated as well by using multiGet in Onyx when that's possible, point 2 here: https://github.com/Expensify/react-native-onyx/issues/78

Do you want to make a different benchmark that tries to write using as many parallel requests as possible in an attempt to review the issue I've described above?

cc @quinthar @tgolen

kidroca commented 3 years ago

The relatively fast times got me to double check what versions of AsyncStorage are used, and I've discovered that E.cash is using a version of AsyncStorage that was abandoned by the community

E.cash and Onyx use: "@react-native-community/async-storage": "^1.11.0" which resolves to v1.12.1 the latest version you'll ever get from that repository https://github.com/Expensify/Expensify.cash/blob/854c9bf0420d4dd3036dc7610adf0c9cf0778521/package.json#L41

AsyncStorage was moved to "@react-native-async-storage/async-storage": "^1.15.5": https://github.com/react-native-async-storage/async-storage/discussions/507#discussioncomment-222363 - that's the version I've used for benchmarking as well

After a bit more investigation I've found this: AsyncStorage is moving away from the SQLiteOpenHelper and switches to Room - https://react-native-async-storage.github.io/async-storage/docs/advanced/next

The Room persistence library provides an abstraction layer over SQLite https://developer.android.com/training/data-storage/room

The latest version of AsyncStorage already has support for Room and it can be enabled following these steps: https://react-native-async-storage.github.io/async-storage/docs/advanced/next#enabling

Some additional point we should check with these discoveries:

I'll investigate the new source of AsyncStorage and try to answer some of the points above cc @quinthar @tgolen

Edit: Clarification the benchmarks above are not using Room. It has to be enabled explicitly

quinthar commented 3 years ago

Do you still want to contribute to AsyncStorage when it's not a part of @react-native-community

Wow, amazing find. Yes, let's not invest in abandoned code. Let's definitely redo the benchmark with the latest and greatest.

tgolen commented 3 years ago

Yeah, I agree. That's a good find, I had no idea it was abandoned. Let's first see if we can upgrade to the latest version, ensure there are no regressions, and then start benchmarking on that.

On Mon, Jul 5, 2021 at 1:13 PM David Barrett @.***> wrote:

Do you still want to contribute to AsyncStorage when it's not a part of @react-native-community https://github.com/react-native-community

Wow, amazing find. Yes, let's not invest in abandoned code. Let's definitely redo the benchmark with the latest and greatest.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-874289761, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMAB3ROM73CCDNEHZ2KK3TWH765ANCNFSM44BATT7Q .

kidroca commented 3 years ago

I've updated my fork of Onyx with the latest AsyncStorage and did the same with E.cash I've runned with enabling next (Room) and without it, but I can't see any noticeable improvements

I'll post some before/after benchmark data of E.cash init

In the meantime do you want me to open a PR to update Onyx and E.cash with the new AsyncStorage repo? We can decide if we want next enabled or not after some more tests


I might have misled you with "abandoned", I think AsyncStorage is still a project maintained by the react-native community but it was moved out for the reasons they've discussed (on what to stay in/out)

tgolen commented 3 years ago

Yeah, go ahead and do a PR to update that lib. Even if it's not totally abandoned, I think we want to keep whichever is the most active and the most generally accepted as "official".

On Tue, Jul 6, 2021 at 11:19 AM Peter Velkov @.***> wrote:

I've updated my fork of Onyx with the latest AsyncStorage and did the same with E.cash I've runned with enabling next (Room) and without it, but I can't see any noticeable improvements

  • btw (next/Room) would affect Android only as it's just for that platform

I'll post some before/after benchmark data of E.cash init

In the meantime do you want me to open a PR to update Onyx and E.cash with the new AsyncStorage repo? We can decide if we want next enabled or not after some more tests

I might have misled you with "abandoned", I think AsyncStorage is still a project maintained by the react-native community but it was moved out for the reasons they've discussed (on what to stay in/out)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-874942062, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMAB2H5QTEE35YXXR27XTTWM3JLANCNFSM44BATT7Q .

kidroca commented 3 years ago

I've posted a comparison of Onyx.get timings per each AsyncStorage version: https://github.com/Expensify/react-native-onyx/pull/85#issuecomment-875604980

kidroca commented 3 years ago

I've made more benchmarks on the AsyncStorage benchmark app, but this time I copied some of my report actions to use as values I've used a large text item 149kb (250+ chat messages) and a smaller item of 9kb But again this didn't slow it down much when reading/writing sequentially

But reading the same items takes between 500 to 1000ms in E.cash Which points me in the direction that the many parallel calls happening are the real issue So I've setup new benchmark functions to test that

Write 50 items of 9kb in parallel (repeated 40-50 times)

Read 50 items of 9kb in parallel (repeated 40-50 times)

This is in line with what happens in E.cash where many requests happen at the same time https://docs.google.com/spreadsheets/d/1W9IoTdfGlqwK080HBv2T-kAs9LUadO_CB0PdgFG96OM/edit?usp=sharing

kidroca commented 3 years ago

Room is part of androidx: https://developer.android.com/jetpack/androidx

As such I think it's a solid choice and a nice improvement for AsyncStorage, furthermore even without JSI the benchmarks I've made prove that AsyncStorage can be fast, the bottleneck seems to be parallel handling and not database accessing

Here's how Room is addressing the proposed changes for this task: https://github.com/Expensify/react-native-onyx/issues/65#issuecomment-822046360

Upon opening the database, call SQLiteDatabase.compileStatement() to create a prepared statement with something like "SELECT value FROM catalyst WHERE key=?;" and save it for repeated use

This is covered in Room

react-native-async-storage/async-storage: StorageSupplier.kt

    @Transaction
    @Query("SELECT * FROM $TABLE_NAME WHERE `$COLUMN_KEY` IN (:keys)")
    suspend fun getValues(keys: List<String>): List<Entry>

Change AsyncStorage.getItemImpl() to call SQLiteProgram.bindString(0, key) on the prepared statement, and then SQLiteStatement.simpleQueryForString()

I think this is covered by the @Query annotation as well as per the docs: https://developer.android.com/reference/androidx/room/Query

Room will automatically bind the parameters of the method into the bind arguments. The arguments of the method will be bound to the bind arguments in the SQL statement. See SQLite's binding documentation for details of bind arguments in SQLite.

Do the same thing for setImpl(), which is currently using SQLiteDatabase.insertWithOnConflict(), which similarly appears to not use prepared statements and thus is unnecessarily compiling a separate SQL statement for each insert.

This item is covered by annotations as well

react-native-async-storage/async-storage: StorageSupplier.kt

    @Transaction
    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun setValues(entries: List<Entry>)

Do a native implementation of AsyncStorage.multiGet() to select multiple keys in one query.

Also covered by the Room implementation see IN (:keys) in getValues above This adds a list of keys to be retrieved from the database getValues is used to retrieve multiple and single items as well


As per point 3) in the task description

a. Stored procedures (to avoid recompiling SQL every time) b. PRAGMA synchronous=OFF; c. PRAGMA mmap_size=(1010241024); // 10MB memory map

I think a) is covered by Room, I couldn't find a way to provide b) c) to the Room database


@quinthar @tgolen How should I proceed with this task?

tgolen commented 3 years ago

How should I proceed with this task?

Hm, I'm finding it a little hard to tell where you're at with everything. Maybe you could convert the original description into a checklist and then check off the ones that are done or can't be completed?

If I had to guess, it sounds like you're done with point 3, but haven't done point 4, yet. Is that right? And maybe the reason that point 4 isn't done is that we haven't found any winning combination? If we haven't found anything to improve android performance yet, then what is the next step for tracking it down? It's great that we're scratching things off the list that might be a source of the problem, so let's just keep throwing out ideas and narrowing them down.

kidroca commented 3 years ago
  1. Create a standalone RN app that benchmarks AsyncStorage. I've pushed the app here: https://github.com/kidroca/asyncStorageBench

  2. Do some basic tests on both Android and iOS, and record timing from these scenarios I've done that here: https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-874160913

  3. Refactor AsyncStorage on Android to implement each of the suggested optimizations After reviewing the code and the documentation I've described how most of the items are covered by Room https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-875887678

  4. Once we've found the winning combo (if any), submit to the React Native mainline to see if we can get them to merge it. I haven't made any changes to AsyncStorage that we can submit upstream, but I've run many different scenarios and benchmarks and I've put my observations here and in the related Onyx tickets

As a summary IMO the bottleneck seems to be parallel handling and not database accessing

quinthar commented 3 years ago

Wow, fantastic analysis! It's not what I expected, but that's why we test. Can you remind us what specifically you are proposing as a result of this analysis? I think it is:

  1. Continue upgrading AsyncStorage because it's what the community is moving towards, and is no slower

  2. Do ???something??? to fix the "parallel handling" problem. What specifically? Batch requests using the 2ms delay? Rewrite using JSI?

Thank you!

kidroca commented 3 years ago

Yes 1) we should use the new AsyncStorage with Room. This is only the first iteration and it might impove - the least at some point they'll remove the old code and deps

2) I've proposed changes in Onyx here https://github.com/Expensify/react-native-onyx/issues/78 that will use multiGet instead of foreach-ing which results in parallel requests. This might alleviate work enough so that the rest of the parallel calls are not an issue

We have benchmarking code in Onyx - it can be used to easily prove ideas like the above

quinthar commented 3 years ago

Fantastic, let's doo ittt!!

On Wed, Jul 7, 2021 at 3:23 PM Peter Velkov @.***> wrote:

Yes 1) we should use the new AsyncStorage with Room. This is only the first iteration and it might impove - the least at some point they'll remove the old code and deps

  1. I've proposed changes in Onyx here Expensify/react-native-onyx#78 https://github.com/Expensify/react-native-onyx/issues/78 that will use multiGet instead of foreach-ing which results in parallel requests. This might alleviate work enough so that the rest of the parallel calls are not an issue

We have benchmarking code in Onyx - it can be used to easily prove ideas like the above

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-875974040, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEMNUVQHUZV6F6Y2CZLOB3TWTHUZANCNFSM44BATT7Q .

marcaaron commented 3 years ago

What's the current status of this issue? I'm having trouble telling which step we are on in the checklist above.

kidroca commented 3 years ago

I've listed what have been done here: https://github.com/Expensify/Expensify.cash/issues/2667#issuecomment-875936482

And now we're at

I've proposed changes in Onyx here More improvements for Onyx.connect (After cache) react-native-onyx#78 that will use multiGet instead of foreach-ing which results in parallel requests. This might alleviate work enough so that the rest of the parallel calls are not an issue

I've started working on that task, but discovered that though batching improves timings they are still far from 9-16ms for reads. Since then I've studied Onyx more and found some other places where it can improve and opened this PR #88

Before continuing with further work I would like to complete my contract for this assignment You've closed my PR so that I can open a few separate ones where we can test each change and merge stuff gradually to Onyx, I'll be happy to do that under a new milestone

arielgreen commented 3 years ago

@kidroca All set; approved and paid the original milestone and added an additional milestone. Please proceed.

kidroca commented 3 years ago

@arielgreen Thanks

I'm continuing with opening the first PR of the series I'll also open issues that are not yet on E.cash board

marcaaron commented 3 years ago

PRs related to capturing metrics have been merged

arielgreen commented 3 years ago

All good @MelvinBot

marcaaron commented 3 years ago

Just giving a brief update of where we are at with this. 2 more PRs have been merged:

  1. PR to improve metrics output
  2. Another PR to defer some Onyx.connect() calls while we wait for defaultKeysStates to be set

@kidroca I will create a PR to get that change into Expensify/App if you want to start working on the next PR which is

Prevent unnecessary writes when data has not changed by adding _.isEqual()

Sound good?

kidroca commented 3 years ago

@marcaaron Thanks, the 3rd PR was merged as well These 3 are ready:

  1. PR to improve metrics output
  2. Another PR to defer some Onyx.connect() calls while we wait for defaultKeysStates to be set
  3. Prevent unnecessary writes when data has not changed by adding _.isEqual()

Next is "LRU Cache update", I'll try to get the withLocalize change ready as well so we can see the full improvements: #4253

marcaaron commented 3 years ago

Nice, so, I've updated the open PR to include the latest changes and did another set of user timing tests on release builds.

The 2nd PR did not yield anything significant in terms of user time. But I did have to fix a broken test. No issues observed besides that.

The 3rd improved chat switch times by ~250ms on Android 🎉

MelvinBot commented 3 years ago

@tgolen, @kidroca, @arielgreen it looks like no one is assigned to work on this job. Please double the price or add a comment stating why the job isn't being doubled.

kidroca commented 3 years ago

I'm working on this. Will submit my last PR tomorrow.

arielgreen commented 3 years ago

Thanks @kidroca — ignore MelvinBot's comment above, we were testing something.

kidroca commented 3 years ago

The 4th of the planned PRs was merged: https://github.com/Expensify/react-native-onyx/pull/93 I'll open a PR to update Expensify/App on Monday

@arielgreen This would complete my current Upwork milestone. I think I can work on any approved future tasks under my hourly contract, we can keep this ticket open if necessary, though everything listed would have it's own issue

We've identified some additional work related to storage performance that's worth pursuing

  1. Onyx multi get and batching - https://github.com/Expensify/react-native-onyx/issues/84
  2. Onyx hydration - https://github.com/Expensify/App/issues/4433
  3. Try a JSI storage implementation - https://expensify.slack.com/archives/C01GTK53T8Q/p1628181643026200?thread_ts=1628106063.148700&cid=C01GTK53T8Q (Github ticket to be created)
  4. Aggregated OnyxContext to make withOnyx rendering faster - https://github.com/Expensify/App/issues/4101#issuecomment-889869578

@marcaaron 1 and 2 might become unneeded if the experiment on 3 is successful - JSI can be called directly - we can assign component state in the constructor. Same is true for browser's local storage, though we'll have to fill some of the functionality (like multi merge) that AsyncStorage provided

marcaaron commented 3 years ago

@kidroca I agree we should start investigating the JSI storage implementation. Probably the easiest way to prove the value is to patch Onyx to use MMKV and see if things improve then go from there.

kidroca commented 3 years ago

The 4th PR was merged here: https://github.com/Expensify/App/pull/4509

We've started work/discussions on the items proposed above https://github.com/Expensify/App/issues/2667#issuecomment-894533165

arielgreen commented 3 years ago

Excellent, thanks @kidroca. The 7-day period ends tomorrow for that one, so I'll pay out the milestone and close out that contract tomorrow.

@marcaaron since this work is continuing, is it helpful to leave this GH issue open as a master issue to track all these improvements going forward?

marcaaron commented 3 years ago

is it helpful to leave this GH issue open as a master issue to track all these improvements going forward

I think we can just close this one out.

We are looking into some improvements to the storage layer here and a few other issues tagged with [Performance] in the title.

arielgreen commented 3 years ago

Paid.