Improvements for Onyx connect

Expensify / react-native-onyx

Persistent, offline-first state management solution for React Native. Easy to use with minimal config and boilerplate.

MIT License

154 stars 71 forks source link

Improvements for Onyx connect #63

Closed kidroca closed 3 years ago

kidroca commented 3 years ago

Right now Onyx works fine on iOS On Android though not so much, we can blame it on AsyncStorage, but there are ways to improve the usage of AsyncStorage

Querying `AsyncStorage`

withOnyx receives a key -> val mapping
each key is connected separately to Onyx - this.connectMappingToOnyx(mapping, propertyName) -> new connection Onyx.connect
each key mapping can result in one or more matchingKeys
each key in matchingKeys makes a separate call to AsyncStorage.getItem get(key) -> AsyncStorage.getItem(key)
- BTW any exception in AsyncStorage.getItem is swallowed and end users of the Onyx library won't be able to intercept and handle errors themselves.

Now all of the above happens for a single withOnyx usage e.g. One component wrapped with Onyx. In the simplest scenario each prop requested from Onyx would make a separate call to AsyncStorage.getItem

So let's say 10 components use data from Onyx and each reads 2 props from Onyx. That's a total of 20 getItem calls. In practice the calls would be a lot more though and this happens during startup

What happens under the hood

every getItem call would make a new SQL query

The rest is just proving that statement - you can skip to Improvements if you trust me 😉

Async Storage for Android uses SQLite internally

Tracing this down you can see that every getItem call would make a new SQL query

Improving

Using `multiGet`

The easiest improvement would be to use the optimization provider from AsyncStorage - AsyncStorage.multiGet

At least for the code here: https://github.com/Expensify/react-native-onyx/blob/121c521fe72f37c2b1fb79c39755814558eca27c/lib/Onyx.js#L320-L324

instead of creating separate promises fetch all keys using multiGet

Batch connect calls keys

The next best thing: Don't call AsyncStorage as soon as Onyx.connect is called. Wait for a brief time or debounce the execution of AsyncStorage to run after all the calls to Onyx.connect. Make a batch and each time Onyx.connect is called add to that batch, after the debounce period expires (lets say 5ms) fetch all the keys using multiGet and respond to the listeners.

When all components are initially mounting (App Startup) this should batch all the necessary keys and make just one query to AsyncStorage. It would also remove duplicate queries when multiple components use the same Onyx KEY

In Memory Proxy

The next next best thing: Don't always read from file storage, make an "in memory proxy" and provide recently accessed keys Let's say there are 5 components subscribing to CURRENTLY_VIEWED_REPORTID each of them would retrieve the value from file storage. Instead only the initial call should read from file and return the value to the rest

Note on Onyx.set

Onyx.set does not resolve optimistically, it would first wait to perist to file so Onyx set would not imeddiately update listeners, but only after write, if it's successful

AsyncStorage.setItem -> keyChanged(key, val)

You could make the change immediately. If it happens to fail you can retry in the background or handle it some other way E.g. the value of the CURRENTLY_VIEWED_REPORTID updates let listeners know that right away no matter what

marcaaron commented 3 years ago

Awesome summary and issue! Thanks for this!

Heads up, I think we are trying to avoid a memory proxy for as long as we can. There have been concerns in the past about difficulty in keeping things in sync. But I have thought about this before in the past.

I'd be curious to see some basic benchmarks for before and after applying a multiGet() to connect(). The other solutions are interesting, but it's just hard to say which would have the largest impact without any benchmarks. Which are the most valuable to do?

kidroca commented 3 years ago

I'd bet on Onyx.set (Onyx.merge) when I thought of this a bit more

Let's imagine for a moment that all of the Onyx data is in memory and can be accessed instantly They are still persisted to file storage for obvious reasons

But due to the fact that Onyx.set would only inorm listeners after the change is written to file there would be an additional delay that could be skipped

@marcaaron we both had the chance to see the report ID coming much faster when it was provided from react navigation. waiting from the prop from onyx would only come in one of the later component updates:

Let simplify what was going on before the updates to the RecordsScreen:

We are in a chat
We open the sidebar and select a different chat by pressing a button
React navigation is immediately informed and in turn informs Onyx of the change
But here's the Onyx update path: updateCurrentlyViewedReportID -> Onyx.merge -> get -> set -> AsyncStorage.setItem -> then(() => keyChanged(key, val))
The Sidebar closes but we still see the previous report
But if we check the route.params we can see that the new route is there
On my physical S7 and Emulated Pixel 2 this would take between 0.5s to 1s (with the dev build though)

So initially I thought that set would have been called directly, but now I see that there's also a get -> AsyncStorage.getItem that needs to be waited too...

kidroca commented 3 years ago

Heads up, I think we are trying to avoid a memory proxy for as long as we can. There have been concerns in the past about difficulty in keeping things in sync. But I have thought about this before in the past.

You are keeping things in sync anyway. Data from the online backend needs to be synced and rendered on the device. Whether you keep it in a local file/database or in memory (that happens to be saved to disk as well) is prone to the same out of sync issues

E.g. Remote Data -> App State -> Rendered content

⏬ -------------------- ⏫ Local Data Layer -> App State -> Rendered content ⏫---------------------⏬ <--- Remote Data <---

It doesn't matter whether Local Data Layer is in memory or not it still needs to be in sync. If it works correctly you can always persist it using different strategies

brute - dump everything under a single key on every action
fine grained - patches between states can review which keys did change and only save those - this should be relatively easy for non cyclic json data

kidroca commented 3 years ago

Regarding benchmarks I could do the following:

I can open a branch (onyx fork) and apply some metric loggings/collecting without changing anything Then another branch that would just override the connect method to capture a batch of keys and use multiGet. It would be a dirty patchwork just to try to prove the point. Also I'll swap the set method to skip on waiting

Then we can use the current expensify.cash app to benchmark things by pointing it to the forked package of Onyx

tgolen commented 3 years ago

Hi! Coming in here to weigh in on this issue. My biggest concern is that there is no defined problem. The only problem that looks to be defined here is:

On Android though not so much

This is very ambiguous and means there are lots of solutions and ideas being discussed which have no clear value and is completely unknown if they solve this problem. I think all the improvements make sense theoretically, but we should only make changes when there is a clear problem.

this would take between 0.5s to 1s

Maybe this is the real problem? Let's find out if it is. I like the idea of doing some benchmarking on different platforms to see what the problem really is first before skipping to discussing possible solutions.

kidroca commented 3 years ago

Ok I think I see what's going on here. For some of you this comes out of the blue and I could have put more info in the ticket. I just assumed it was a problem already known to some extent

I've pinpointed this 0.5 - 1s. delay when I was working on https://github.com/Expensify/Expensify.cash/pull/2221#issuecomment-816791877

See what a huge delay there was when you switch to a different conversation

I'm in a chat (Concierge)
I go back to LHN
I tap a different chat. This internally would tell Onyx to update the key of the report ID I've pressed
The drawer closes
The old chat is still visible
Some time passes. The component receives new props from Onyx. We see the report ID changed and show a loader. You can see this happens way after the interaction

https://user-images.githubusercontent.com/12156624/114209874-ccb49500-9967-11eb-9c4a-06c1bc20b47a.mp4

Luckily we can get the report ID from the navigation and this happens instantly between steps (3-4) above. And this is how the issue was solved.

At that time I've added a timestamp on each prop change - the change coming from onyx would come about 0.5 - 1s after the change coming from navigation route props. This was way out of scope so I just posted a comment about it. I've also brought it up in other discussions too: https://expensify.slack.com/archives/C01GTK53T8Q/p1617734370399900

I can't track this down ATM, but it was mentioned there were startup issues on Android where you'd see a blank screen for a long time and then suddenly the App will pickup. I think this is related, but I haven't made tests as it would have taken me a lot of time. The App seems to start a lot slower on Android. Tracing the Onyx.connect code was quicker.

Initially I thought there was delay due to reading data, well there certainly is, but the above issue was actually a result of waiting to write data and then informing listeners of the change. And then it turns out merge needs to first read data from file before writing it back down

kidroca commented 3 years ago

Keeping a dictionary (a store) in memory would definitely make updates faster, but I don't think it's would be a responsibility of AsyncStorage. There would be many cases where this would be an unnecessary overhead:

people using redux won't need this
people using Apollo with Apollo Cache either
mobx nope

This means it probably won't be built into AsyncStorage, it might be a wrapper that they or someone else provides

Fixing the Android performance inside AsyncStorage to match iOS might be a do over. Or it might simply be down to specs. Improving how Onyx works seems easier

quinthar commented 3 years ago

FYI, added some discussion here -- these feel like sorta the same discussion, so perhaps we should merge these two issues?

https://github.com/Expensify/react-native-onyx/issues/65

quinthar commented 3 years ago

In particular I want to cross-post this comment:

The nice thing about memory mapping SQLite is it provides us a RAM cache for free, using the virtual memory subsystem of Linux. So we wouldn't need to actually build a RAM cache into Onyx, because recently used sqlite pages would already be in RAM.

kidroca commented 3 years ago

I've addressed the Benchmark reasons in the other thread. I'll add some details into the performance side here

SQLite is only for Android, AsyncStorage in iOS works differently
Bundling this inside AsyncStorage, would mean it should be configurable, opt in, otherwise people are forced to use it and when they already have an in-memory cache like redux or apollo, they are paying the same price twice.
- Async Storage have a different plan for v2. They provide an interface which anyone could implement a native solution for
The stored amount in MB will be the same no matter at which level caching is handled, but when caching is implemented in JS the JS thread have direct access to it, while if it's in the native level access would always have to be synchronized through the native bridge and would cost more total RAM.

Let's imagine a scenario like retrieving session information (auth token, user email, etc)

When caching is on the native level and particularly the SQLite level:

Let say we have session in cache
JS code asks for session
Onyx makes a call to native code
Native code builds a query (the query size could already be bigger than the result we're after)
Opens a db connection and executes query
Db mechanisms resolve to a cached content
Native call returns content to Onyx
Onyx returns content to caller

All these layers would cost extra memory and processing. It would not be worth it as a caching solution for a key/value that takes less than a kilobyte to store in JS

When caching is on the JS level and particularly in Onyx

Let say we have session in cache
JS code asks for session
Onyx checks a local dictionary of keys
Onyx returns value

So addressing cache as early as possible has the most benefits - it again saves executing unnecessary operations, while the data would take the same amount of space.

You could also compress the string with up to 5:1 compression ratio. Though when you render it on screen it would take the full size + the zipped size where it originally came from

quinthar commented 3 years ago

Bundling this inside AsyncStorage, would mean it should be configurable, opt in, otherwise people are forced to use it and when they already have an in-memory cache like redux or apollo, they are paying the same price twice.

To clarify, AsyncStorage already uses SQLite on Android, but its implementation is very inefficient. I'm just suggesting we contribute an optimization to AsynchStorage on Android to:

Use prepared statements
Use memory mapping
Implement multiGet() with a single query

It wouldn't involve "bundling" SQLite, just changing how AsyncStorage currently uses it. It wouldn't change the external interface of AsyncStorage, it would just hopefully make it faster and more efficient for everyone. Does that make sense? can you see a reason why we wouldn't do this optimization with AsyncStorage, whether or not we add a cache inside of Onyx?

kidroca commented 3 years ago

I know SQLite is used internally and won't be bundled.

My points are regarding caching (memory mapping) and still stand:

"SQLite is only for Android..." - You would be providing a benefit only for Android users. That's OK I guess, since they are the most affected
"Bundling this inside AsyncStorage..." - Adding this feature directly inside AsyncStorage. This would probably be an overhead for most of the users - they don't use Async Storage as App State management solution - they are fine making a few slow reads from file and then hidrate a redux store or their own state solution. A RAM cache inside AsyncStorage will be memory they've allocated (when they've read initial data) but won't use - it should be optional if at all bundled in AsyncStorage. But this is only my opinion. A feature request can be added on the community repo that would prove or disprove this point.
"The stored amount in MB will be the same..." - In react native world for a simple key/value in-memory cache the best implementation would be on the JS thread, because of how the system works. Native implementation would always cost more RAM even though the cached (stored) amount of data is the same

None of the storage solutions neither in iOS or on Android implement any kind of cache themselves and this might be related to point 3

Regarding "Use prepared statements" and "Implement multiGet() with a single query" this makes sense to me and probably deserves a pursue, but I do think Onyx would benefit much more with it's own caching, so much that these would be negligible

Regarding multiGet I've found some resources that do suggest multiple small queries can be better or no overhead:

This one is official coming from sqlite.org: "Many Small Queries Are Efficient In SQLite": https://sqlite.org/np1queryprob.html

200 SQL statements per webpage is excessive for client/server database engines like MySQL, PostgreSQL, or SQL Server. But with SQLite, 200 or more SQL statement per webpage is not a problem.

The latency of a single SQL query is far less in SQLite. Hence, using a large number of queries with SQLite is not the problem.

Now this might not be entirely true on a mobile device, but it can be researched further

You'd think that I work for AsyncStorage but I don't 😉

quinthar commented 3 years ago

"Bundling this inside AsyncStorage..." - Adding this feature directly inside AsyncStorage. This would probably be an overhead for most of the users - they don't use Async Storage as App State management solution - they are fine making a few slow reads from file and then hidrate a redux store or their own state solution.

To clarify, I'm not suggesting adding overhead to AsyncStorage; I'm suggesting we optimize it to be faster for all users. I don't see a single downsize for any use case of using prepared statements; it's the "right way" to use SQLite for this kind of thing, and AsyncStorage just isn't doing it right. Nobody benefits by AsyncStorage being slow, even if I agree that React Native is just not widely used, and those who do use it aren't performance sensitive.

A RAM cache inside AsyncStorage will be memory they've allocated (when they've read initial data) but won't use - it should be optional if at all bundled in AsyncStorage. But this is only my opinion. A feature request can be added on the community repo that would prove or disprove this point.

To clarify, this isn't adding a RAM cache -- it's just enabling AsyncStorage to use the existing file cache as a RAM cache. Without memory mapping, sqlite is forced to make a bunch more copies of data; by enabling memory mapping, it just reduces the number of copies required. It doesn't increase the amount of memory used, it reduces the amount of memory used. I'm not aware of a single disadvantage to anybody by using memory mapping. If it were already written this way, I can't think of any scenario in which you'd actively choose to disable the memory map. Again, nobody actively benefits by AsyncStorage being slow.

That said, the way you enable the memory map is by executing an SQL PRAGMA command, so we could leave it disabled by default and have applications enable it.

kidroca commented 3 years ago

Hey, sorry I can't invest any more time for free into this.

kidroca commented 3 years ago

I had a chance to dig a bit more the iOS side of things. We've focused a lot on Android, but it turns out iOS is faster simply because it already uses in memory cache.

The iOS storage implementation is a lot simpler than Android. Everything is stored in a text file and read as string as needed. To improve performance a 2MB cache is used:

RCTGetCache

static NSCache *RCTGetCache()
{
    // We want all instances to share the same cache since they will be reading/writing the same
    // files.
    static NSCache *cache;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
      cache = [NSCache new];
      cache.totalCostLimit = 2 * 1024 * 1024;  // 2MB
    });

    return cache;
}

https://github.com/react-native-async-storage/async-storage/blob/8f19c1b49cb63b9bf94558c8c81779ac024bf465/ios/RNCAsyncStorage.m#L219-L241

The when a key is read if there's a value in cache it's directly returned

_getValueForKey

- (NSString *)_getValueForKey:(NSString *)key errorOut:(NSDictionary **)errorOut
{
    NSString *value =
        _manifest[key];  // nil means missing, null means there may be a data file, else: NSString
    if (value == (id)kCFNull) {
        // read from cache first
        value = [RCTGetCache() objectForKey:key];
        if (!value) { // No cached value -> read from file
            NSString *filePath = [self _filePathForKey:key];
            value = RCTReadFile(filePath, key, errorOut);
        }
    }

    return value;
}

https://github.com/react-native-async-storage/async-storage/blob/8f19c1b49cb63b9bf94558c8c81779ac024bf465/ios/RNCAsyncStorage.m#L591-L610

Otherwise data is read from file

RCTReadFile

https://github.com/react-native-async-storage/async-storage/blob/8f19c1b49cb63b9bf94558c8c81779ac024bf465/ios/RNCAsyncStorage.m#L82-L110

Basically iOS is doing what I suggested for Onyx, with the difference that when caching is done in the Onyx layer - values won't be cached as strings, and there would be no overhead of constantly parsing strings to objects, which is costing more execution time and more memory - once for the raw string value and once for the string parsed to object.

Also there's a fixed limit of 2MB, though it should be plenty most of the time, there might be a slowdown if Onyx starts using more than 2MB of memory

The same solution can be implemented for Android, if your interested in addressing it in that level. Add a 2MB "rolling" im-memory cache which would be simpler and resolve earlier than relying on memory mapping and handling this in the SQLite level. E.g. enabling mmap and other improvements could cost more than a simple 2MB in memory cache

As I've already said the faster (for react-native) would be to handle this on the JS level and prevent any expensive operations right from the start - crossing the bridge, involve deeper native operations, stringification...

I'm surprised AsyncStorage haven't thought of that themselves, or maybe they did and have a reason not to implement the cache on the JS level where it can be reused by both iOS and Android.

Would you like to open a ticket on that matter on their repo?

quinthar commented 3 years ago

Wow, very interesting find, thanks! That's pretty compelling. Some questions:

If the AsyncStorage implementation on iOS has an in-memory cache of recently accessed keys, would you agree that the Android and web versions should as well, to be consistent? After all, the whole point of React Native is to have consistent performance across all platforms, so it's very odd to have wildly different implementations.
Regarding the value of an Onyx cache, to what degree do you feel it's because:
1. Onyx.connect() is often called with the same key simultaneously many times?
2. Onyx.connect() is called back to back frequently for the same key, but only after previous connections were disconnected?

kidroca commented 3 years ago

I agree with 1) though it might be more cost effective to just address this in JS

covers both iOS and Android and any future native platforms
does not have to cross the native bridge for cached usages

This could still be a part of AsyncStorage, and it would be nice if it can be configurable - use / don't use caching Otherwise if cache is on the native level all those separate get key calls would still flood the native bridge

Regarding 2) it's primarily due to i. "is often called with the same key simultaneously many times", but there are other reasons too

Making many key retrievals at once (mostly during init, but not solely there) instead of get all, get from cache or batching
Always have to parse retrieved data - even when we just did - more CPU time and memory. Cache in Onyx means you have a direct referential link to an object instead of a (new) serialized string. Cache in Async storage means you're still dealing with strings first
- strings in JS are always new object - new memory allocation
- objects created from string JSON.parse are always new objects - new memory allocation
Onyx code becomes simpler - for example merge we don't have to wait to retrieve the value and then merge data to it and save it again. Every write operation updates app state instantly and then a transaction is just queued to update storage. Since writes are queued you can save even more native bridge crossings or at least optimize so this crossing happens during an idle time, but you cannot have a write queue without cache in Onyx

tgolen commented 3 years ago

To me, this sounds like two completely separate issues.

I would love for us to contribute to AsyncStorage to get near-identical performance across all platforms (after all, it's SUPPOSED to be cross-platform). I also don't think the caching should be configurable (because it's not configurable now with iOS and it seems to be fine).

We can still make optimizations with Onyx if we want to (when it becomes a problem).

While I agree that it is more cost-effective in the short-term to just address this by updating Onyx to overcome deficiencies in AsyncStorage, I don't think that is a good enough rationale for putting a more long-term and proper fix in place. Also, the cost isn't an issue for us as we will pay for the updates to AsyncStorage.

On Thu, Apr 29, 2021 at 9:08 AM Peter Velkov @.***> wrote:

I agree with 1) though it might be more cost effective to just address this in JS

covers both iOS and Android and any future native platforms

does not have to cross the native bridge for cached usages

This could still be a part of AsyncStorage, and it would be nice if it can be configurable - use / don't use caching Otherwise if cache is on the native level all those separate get key calls would still flood the native bridge

Regarding 2) it's primarily due to i. "is often called with the same key simultaneously many times", but there are other reasons too

Making many key retrievals at once (mostly during init, but not solely there) instead of get all, get from cache or batching

Always have to parse retrieved data - even when we just did - more CPU time and memory. Cache in Onyx means you have a direct referential link to an object instead of a serialized string. Cache in Async storage means you're still dealing with strings first

Onyx code becomes simpler - for example merge we don't have to wait to retrieve the value and then merge data to it and save it again. Every write operation updates app state instantly and then a transaction is just queued to update storage. Since writes are queued you can save even more native bridge crossings or at least optimize so this crossing happens during an idle time, but you cannot have a write queue without cache in Onyx

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Expensify/react-native-onyx/issues/63#issuecomment-829317426, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMABZWAV7MTB6JG7N5WADTLFY53ANCNFSM425MZY6A .

kidroca commented 3 years ago

I thought it was clear that this can still be addressed in the JS level not in Onyx but in AsyncStorage

covers both iOS and Android and any future native platforms
does not have to cross the native bridge for cached usages

This could still be a part of AsyncStorage

I also don't think the caching should be configurable (because it's not configurable now with iOS and it seems to be fine)

My reasoning is that people that are having their own memory cache like with redux and Apollo should be able to opt out of this

In slack the suggested cache size is 10MB. I want to be able to decide whether I need this or not
current cache max size in iOS is 2MB. I'm willing to take the hit for such a small size.

tgolen commented 3 years ago

Ah, OK. Thanks for clarifying those points.

On Thu, Apr 29, 2021 at 11:40 AM Peter Velkov @.***> wrote:

I thought it was clear that this can still be addressed in the JS level not in Onyx but in AsyncStorage

covers both iOS and Android and any future native platforms

does not have to cross the native bridge for cached usages

This could still be a part of AsyncStorage

I also don't think the caching should be configurable (because it's not configurable now with iOS and it seems to be fine)

My reasoning is that people that are having their own memory cache like with redux and Apollo should be able to opt out of this

In slack the suggested cache size is 10MB. I want to be able to decide whether I need this or not

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Expensify/react-native-onyx/issues/63#issuecomment-829459227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMAB5SVXIWQBCWB2UNPVLTLGK2VANCNFSM425MZY6A .

quinthar commented 3 years ago

@kidroca given this:

Regarding 2) it's primarily due to i. "is often called with the same key simultaneously many times",

Don't we already have an in-memory cache of this data then, in the form of the dom? It would seem that if we want to avoid re-reading the same value that is already connected from this, we should simply detect when someone tries to connect to a key that is already connected to, and then copy the state object that we already have in memory. We already have an in-memory cache in onyx, and it's called the dom. Can't we just reuse that?

kidroca commented 3 years ago

Don't we already have an in-memory cache of this data then

No, you just have it in memory. Just because you have something committed to memory it doesn't mean you have in-memory cache. You still need a way to retrieve and reuse it, implement this retrieval and you'll end up with something like in-memory cache

We already have an in-memory cache in onyx, and it's called the dom. Can't we just reuse that?

I guess by DOM you mean the React Virtual DOM. That's private and even if you find a way to access it it's subject to change without notice.

So the next best thing is to store a reference to stuff you put in that VDOM. I think that's what you mean by

we should simply detect when someone tries to connect to a key that is already connected to, and then copy the state object that we already have in memory.

And how do you suggest to mange which keys are already connected to? Perhaps put them in a Dictionary/Map - you see where this is going?

Right now you don't have a single state object in memory, but each individual connection/component with it's own private state. You can try to find one of the connections for the key you're interested and retrieve the value, but that's messy so you'll need to implement a place that will be the single source of truth

Remember the Dictionary from above, what if it stored the most recent value for connected keys -> now you have cache

and then copy the state object that we already have

Do you mean "copy" as create a duplicate each time a key is retrieved instead of reuse through referential value? Maybe you think you'll be avoiding bugs by this smart decision, but you can just disable external mutation to your single source of truth for the same effect

kidroca commented 3 years ago

We can still make optimizations with Onyx if we want to (when it becomes a problem).

I agree with that if it was only about making Onyx 5% or 10% faster, but what I'm talking about are improvements both in simplicity and efficiency that would make Onyx a 100 times faster no matter how the underlying storage layer works (AsyncStorage). Just by refactoring get and set

kidroca commented 3 years ago

In memory cache is simple to implement (and still takes the same amount of memory you already use, as pointed out by @quinthar )

You can still search for a way around it, but in the meantime see how simple it would be to update Onyx:

const cache = new Map();
const keyRetrievalTasks = new Map();
const writeQueue = [];

/**
* Returns a Promise that resolves with the requested key value
*/
function get(key) {
  const value = cache.get(key);
  // When we have cache quickly resolve with that
  if (value) return Promise.resolve(value);

  // When we're already retriving the data from disk, hook to the same call
  if (keyRetrievalTasks.has(key)) return keyRetrievalTasks.get(key);

  // Capture a promise so others seeking the same key can get it at the same time
  const task = getFromStorage(key);
  keyRetrievalTasks.set(key, task);

  task
    // Fill cache after reading from disk
    .then(data => cache.set(key, data))
    // Remove the task after it completes
    .finally(() => keyRetrievalTasks.remove(key));

  return task;
}

getFromStorage would be the existing Onyx.get function

The cache logic is just this:

  const value = cache.get(key);
  // When we have cache quickly resolve with that
  if (value) return Promise.resolve(value);
  // ...
  cache.set(key, data);

The rest is an optimization for when the same key is retrieved multiple times while already reading from disk - something that needs addressing in any case

Could you implement the same reading data from the React tree/state? Probably, but:

would it be as simple?
would it be as efficient?
would it be better in any way?

I won't delve into the set refactor right now - it's something on the same scale of simplicity and efficiency, but I'll say this: Writes can be scheduled to happen at the best possible time in the nearest second, something you cannot allow to do right now as merge waits for set to complete the write on file, before notifying listeners. While with cache - cache write happens instantly execution continues, listeners are notified, a write to file is scheduled to happen shortly. This opens the door for other optimizations as grouping writes and discarding outdated writes.

kidroca commented 3 years ago

I have made some extensive tests capturing only the performance of Onyx.get

Current Onyx.get performance

Tested on Emulators and Physical devices

Sample for switching chats on a Physical Android device

(index)	min	max	avg	total calls

network	"164.97"	"1718.92"	"1502.71"	65
personalDetails	"76.03"	"1451.29"	"1443.93"	52
modal	"26.36"	"967.13"	"139.89"	8
report_71955477	"75.75"	"1563.90"	"995.12"	5
session	"20.20"	"830.69"	"250.73"	5
reportActions_71955477	"92.33"	"1519.27"	"805.80"	2
currentlyViewedReportID	"149.11"	"149.11"	"149.11"	1
reportUserIsTyping_71955477	"832.98"	"832.98"	"832.98"	1

Shows all the get call that happen when you switch to a different chat. Times are in milliseconds ms
S7 Galaxy Edge running Andriod 8.0 (2016)
Full results:
- Download xls
- Google Drive: https://docs.google.com/spreadsheets/d/15GkUEUOyVLk02y8_-PFAJFF4Ej7OsyXfdbyJgFoC_4Q/edit?usp=sharing

Onyx.get performance after suggested changes

Just the modification to get I suggested above

The same test as above

(index)	min	max	avg	total calls

network	"0.60"	"5.70"	"2.36"	62
personalDetails	"2.59"	"5.71"	"3.06"	49
modal	"0.07"	"6.00"	"1.26"	6
report_71955477	"0.64"	"5.44"	"2.31"	5
reportActions_71955477	"0.63"	"5.46"	"3.04"	2
session	"4.89"	"24.17"	"14.53"	2
currentlyViewedReportID	"3.63"	"3.63"	"3.63"	1
reportUserIsTyping_71955477	"0.71"	"0.71"	"0.71"	1

Shows all the get call that happen when you switch to a different chat. Times are in milliseconds ms
S7 Galaxy Edge running Andriod 8.0 (2016)
Full results:
- Download xls
- Google Drive: https://docs.google.com/spreadsheets/d/1ybxEBalWqJ9JZHiclnvCkV-SZfejwYfU7KAfHG_YTxE/edit?usp=sharing

The changes I've made to capture these metrics

Changes to `Onyx.get`

let timings = [];

function get(key) {
    const now = performance.now();

    return AsyncStorage.getItem(key)
        .then(val => JSON.parse(val))
        .catch(err => logInfo(`Unable to get item from persistent storage. Key: ${key} Error: ${err}`))
        .finally(() => timings.push({ key, operation: 'get', startTime: now, duration: performance.now() - now }))
}

const Onyx = {
    /* ... */
    getTimes() { return timings.slice() },
    resetTimes() { timings = []; }
};

Changes to `SidebarScreens`

    toggleCreateMenu() {
        this.setState(state => ({
            isCreateMenuActive: !state.isCreateMenuActive,
        }));

        const times = Onyx.getTimes();

        const aggregated = _.chain(times)
            .groupBy('key')
            .mapObject((values = [], key) => {
                const durations = values.map(v => v.duration);

                return ({
                    min: _.min(durations).toFixed(2),
                    max: _.max(durations).toFixed(2),
                    avg: (lodashSum(durations) / values.length).toFixed(2),
                    calls: values.length,
                });
            })
            .value();

        Onyx.resetTimes();

        console.table(aggregated);
        console.table(times);
    }

Instead of printing a message for each call I fill a list of metadata Once enough data is collected I open the LHN and press the FAB This logs the collected data and restes timings

Preferably this can be extracted to a in-app dev page and instead of logging - data can be exported/shared or send to a statistics endpoint

I've proposed a way to apply the above instrumentation to all the Onyx methods you need analyzed, without changing the original source

quinthar commented 3 years ago

Wow!!! Those are some insane results!

kidroca commented 3 years ago

What's suggested in slack is a similar improvement.

But there are some complications you might run into with that approach

Keeping a map of connections by key

This is fine, though each key will hold a list of connections. The instances in the list would all have the same value, they are just different connections to the same key by different components. So you just need the first element of that list...

Retrieving the value from `withOnyxInstance.state`

This is the real problem with this solution. state is a component state. When you use it inside the get() method you have no guarantee something is not about to call setState and update it. So you'll need to cover this with additional logic that cannot be put inside the get() method.

by storing a ref as early as possible my solution always have access to the latest value without extra effort

Multiple calls for the same value would still initiate multiple disk reads for the same key

When no existing connection has the needed value like during init, and multiple components are requesting this value, several separate disk reads are initiated, they might happen in parallel, but they are still causing n times bridge flooding at the most intensive time for the app

tgolen commented 3 years ago

Peter, thank you for all this data and information. It's going to be super helpful as we decide the best path forward!

On Tue, May 4, 2021 at 9:10 AM Peter Velkov @.***> wrote:

What's suggested in slack https://expensify.slack.com/archives/C01GTK53T8Q/p1620065578311900 is a similar improvement.

But there are some complications you might run into with that approach Keeping a map of connections by key

This is fine, though each key will hold a list of connections. The instances in the list would all have the same value, they are just different connections to the same key by different components. So you just need the first element of that list... Retrieving the value from withOnyxInstance.state

This is the real problem with this solution. state is a component state. When you use it inside the get() method you have no guarantee something is not about to call setState and update it. So you'll need to cover this with additional logic that cannot be put inside the get() method.

by storing a ref as early as possible my solution always have access to the latest value without extra effort

Multiple calls for the same value would still initiate multiple disk reads for the same key

When no existing connection has the needed value like during init, and multiple components are requesting this value, several separate disk reads are initiated, they might happen in parallel, but they are still causing n times bridge flooding at the most intensive time for the app

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Expensify/react-native-onyx/issues/63#issuecomment-832018505, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJMAB3R5LLG64Z3IQEVWV3TMAE65ANCNFSM425MZY6A .

Expensify / react-native-onyx

Improvements for Onyx connect #63

Querying AsyncStorage

What happens under the hood

Improving

Using multiGet

Batch connect calls keys

In Memory Proxy

Note on Onyx.set

RCTGetCache

_getValueForKey

RCTReadFile

Current Onyx.get performance

Onyx.get performance after suggested changes

The changes I've made to capture these metrics

Changes to Onyx.get

Changes to SidebarScreens

Keeping a map of connections by key

Retrieving the value from withOnyxInstance.state

Multiple calls for the same value would still initiate multiple disk reads for the same key

Querying `AsyncStorage`

Using `multiGet`

Changes to `Onyx.get`

Changes to `SidebarScreens`

Retrieving the value from `withOnyxInstance.state`