Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.56k stars 2.9k forks source link

[Discussion|Onyx] Mixing Onyx.set and Onyx.merge usages leads to race condition #5930

Closed kidroca closed 3 years ago

kidroca commented 3 years ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Discussion

Decide how to deal with a race condition that happens when mixing Onyx.merge and Onyx.set

Problem

It's not obvious that we can introduce changes leading to Onyx.set and Onyx.merge being called close in time with the same key and create a race condition.

Details

During one of our PRs we discovered an issue where storage is getting overwritten with an older value due to using Onyx.set and Onyx.merge to update the same key: https://github.com/Expensify/App/pull/5726#discussion_r729851899

Debugging revealed that Onyx.merge and Onyx.set are called very close in time: https://github.com/Expensify/App/pull/5726#issuecomment-944405977

The problematic code did this

  1. Invite button is pressed
  2. One piece of code calls Onyx.merge to clear policy errors
  3. At the same time another piece of code calls Onyx.set to update the same policy key
  4. The merge from 1) completes last and overwrites local storage with older value

Even though merge and set are called at the same time. merge has to first read the full policy object in order to merge the changes to it. merge starts and gets the value before set has updated it, since it's promise based it would continue on the next tick. In the meantime set saves a new value in storage (step 2). On the next tick (step 3) the merge is applied with the stale data and another call to set overwrites the storage

A debug session that captured the problem

https://user-images.githubusercontent.com/12156624/137510311-327be7bc-d2c2-499c-8513-5d9fc0fc8964.mp4

MelvinBot commented 3 years ago

Triggered auto assignment to @NicMendonca (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.

kidroca commented 3 years ago

IMO to avoid this problem we should only allow one way to update Onyx. E.g. hide set from the interface and enforce updates to happen only through Onyx.merge.

But at the moment merge cannot cover all the cases (or maybe it's not clear how):

The policy code originally used Onyx.set to be consistent, you can see here that it still uses Onyx.set when it has to remove items from the policy:

https://github.com/Expensify/App/blob/f66dd56ae0f65f88db559300d99558a5a9cb7864/src/libs/actions/Policy.js#L267-L269 https://github.com/Expensify/App/blob/f66dd56ae0f65f88db559300d99558a5a9cb7864/src/libs/actions/Policy.js#L277

NicMendonca commented 3 years ago

I am not entirely sure how to triage this 😅 Adding the engineering label for more eyes.

MelvinBot commented 3 years ago

Triggered auto assignment to @bondydaa (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

parasharrajat commented 3 years ago

@NicMendonca This is a discussion issue and does not need to work on or exported.

bondydaa commented 3 years ago

per the philosophy in the readme https://github.com/Expensify/App/#Philosophy

Actions should favor using Onyx.merge() over Onyx.set() so that other values in an object aren't completely overwritten.

So why is code being merged that uses Onyx.set() when we should be using .merge()?

Or is the problem that even multiple Onyx.merge() calls to update the same key can create the same race condition?

marcaaron commented 3 years ago

This is a challenging one to address because it feels like a potential problem with Onyx, but also feels like a potential problem with our conventions of syncing data in New Expensify.

Under normal conditions these race conditions should not be happening and the batching of merge() calls protects us from any really bad effects.

But it does seem likely that when we are using a combination of set() and merge() in the same tick then the outcome would be hard to predict. So maybe we can start by looking at where we made those decisions and why?

It should be more clear what sort of action to take if we audit the workarounds that we suspect cause race conditions.

Enforcing that merge() be preferred over set() could be a great solution. But it's hard to say without considering why we sometimes prefer to use set() over merge(). One reason I've observed is that we are trying to remove some values and "reset" the data by using set() instead of merging new data into stale data. "syncing" local data with an API response is another reason you'll see this happen.

kidroca commented 3 years ago

IMO it would be best to expose only one method for updating Onyx that works similarly to merge, e.g.

export interface Onyx {
  update({ key: string, value: any, strategy: UpdateStrategy }): Promise<void>
  // or
  update(key: string, value: any, options: { strategy: UpdateStrategy }): Promise<void>
}

enum UpdateStrategy {
  DeepMerge,
  ShallowMerge,
  Overwrite,
}

strategy would have a default and it can be DeepMerge - this is the way the current merge works Overwrite would be the successor of set And ShallowMerge is for cases where we don't want to merge nested objects or arrays but replace them - this allows us to remove/replace inner arrays instead of always merging new items to them Furthermore to delete something we can have a special value constant exposed from onyx

Onyx.update({ key: ONYXKEYS.TOKEN, value: Onyx.DELETE_VALUE })

This allows us to have a queue of updates similar to how merge works with a merge queue, but having an information for the type of the update e.g. overwrite or deletion would allow us to disregard any previous values in the queue since if a latter update is a complete overwrite - prior queued updates have no effect on the end result

marcaaron commented 3 years ago

Nice, I like the idea of batching and having different strategies to apply changes in a synchronous way. I feel the public interface should maybe be simpler only because I'm not sure if anyone would stop to think about whether they need a deep or shallow merge 😄

It feels like Expensify/App has outgrown Onyx a bit and overtime we've run into situations where we need a more complex way to modify data.

Just as a thought experiment, what if we extended the public methods Onyx has to solve some of the problems we might run into when working with data and then add the internal queue to help prevent the race conditions?

Some things that have been floated in the past and in this issue...

We've discussed adding a callback to Onyx.set() but other priorities pulled me from that investigation. Another idea was to add something like Onyx.delete('key.property.nestedProperty').

I wonder if we could add something like Onyx.filter() or other methods for working with arrays. I find it unintuitive that we can push items into an array with Onyx.merge() but there are no ways to remove them.


Anyway, that conversation could go on for a while 😅 to refocus...

  1. Can we reproduce the race condition with a unit test?
  2. Do we agree that we should start by making sure that Onyx.set() and Onyx.merge() calls happen in the order that they are called?
kidroca commented 3 years ago

Just as a thought experiment, what if we extended the public methods Onyx has to solve some of the problems we might run into when working with data and then add the internal queue to help prevent the race conditions?

It's possible to go the other direction and expose a CRUD api We can keep set but make it clear/update the mergeQueue if a merge queue exists for the key we're setting. The same thing could work for deletions. The only reason we have a race condition is that one action uses a queue and can be applied with a delay, while the rest - set/remove they don't wait

I wonder if we could add something like Onyx.filter() or other methods for working with arrays. I find it unintuitive that we can push items into an array with Onyx.merge() but there are no ways to remove them.

The most flexible solution would be something as what you've brought up: callback to Onyx.set() but for .merge - merge already reads the stored value, so if

And then it's possible to do something like

Onyx.merge(ONYXKEYS.ARRAY_CONTAINING_KEY, (value) => {
  return {
    ...value,
    myNestedArray: value.myNestedArray.filter(entry => shouldKeepEntry(entry))
  }
})

I guess it makes sense to have this interface for both set and merge, but it would make set slowe and delay the write - currently set does not wait to read the value while merge does

No easy way to delete a specific object properties

The Onyx.DELETE_VALUE can help here

Onyx.merge(ONYXKEYS.ARRAY_CONTAINING_KEY, {
  // deletes the value, e.g. the whole underlying object,array,string etc...
  someKey: Onyx.DELETE_VALUE, 
  someNestedObj: {
    // deletes this nested key 
    someKey: Onyx.DELETE_VALUE,
  }
  // Deletes everything past the 2nd index, while prior items are merged
  someList: [{}, {}, Onyx.DELETE_VALUE],
})

Onyx.delete could work as well but I would prefer an interface like Onyx.delete(key, path) But I don't see how you can use it to delete multiple array entries at once - you can only delete single items like

Onyx.delete(myKey, ['property', 'nested']);
// or
Onyx.delete(myKey, ['property', 'nestedList', 2]) 
// though it's not obvious whether remaining indexes get shifted down

  1. Can we reproduce the race condition with a unit test?

This should be easy, I can post a PR

  1. Do we agree that we should start by making sure that Onyx.set() and Onyx.merge() calls happen in the order that they are called?

In short yes - no matter what we decide (multiple or single ways to update storage) we still need to make changes to the merge queue so that it takes into consideration overwrites that happened from updates like set or delete

We just need to decide what we want to achieve - there are 2 scenarios, and only one causes a race condition

Ok case - set is called first

  1. set is called and immediately stores the new value in cache
  2. merge is called and reads the latest value from cache

Bug case - set is called after merge

  1. merge is called and reads currently stored value
  2. set is called and updates currently stored value before merge does
  3. merge saves the merged result and overwrites whatever happened in step 2)

ATM I can think of 2 ways to handle the bug case

kidroca commented 3 years ago

Nice, I like the idea of batching and having different strategies to apply changes in a synchronous way. I feel the public interface should maybe be simpler only because I'm not sure if anyone would stop to think about whether they need a deep or shallow merge 😄

The public interface can be similar to current merge - Onyx.update(key, value) where we default to a deep merge, but now I see that it's not clear that we do a merge and the value can be just the delta... We can mention it in the method documentation

Instead of deep and shallow merge we can stick to just 2 strategies - merge that works like current Onyx.merge, and set that works like current Onyx.set. If not specified we use merge by default - this should be documented on the method. Now people will be inclined to use merge and would only opt in to use set when it's really necessary This includes everything I've said about Onyx.DELETE_VALUE and how to delete nested values And can be extended with even more control with a method callback method like:

Onyx.update(ONYXKEYS.ARRAY_CONTAINING_KEY, (value) => {
  return {
    ...value,
    myNestedArray: value.myNestedArray.filter(entry => shouldKeepEntry(entry))
  }
})

Showcase

Typical usage (merging)

Onyx.update(MY_KEY, myValue);

Removing a nested key

Onyx.update(MY_KEY, { keyToDelete: Onyx.DELETE_VALUE });

Removing the whole key

Onyx.update(MY_KEY, Onyx.DELETE_VALUE);

Overwriting with an API response

Onyx.update(MY_KEY, myValue, { strategy: Onyx.UpdateStrategy.SET });

Removing array items (IRL example)

 // If the operation failed, undo the optimistic addition 
 const policyDataWithoutLogin = _.clone(allPolicies[key]); 
// set data back to what it was (policy), tell Onyx to discard any entries remaining past the original list
 policyDataWithoutLogin.employeeList = [...policy.employeeList, Onyx.DELETE_VALUE];
marcaaron commented 3 years ago

so when set is called - clear the mergeQueue and abort the pending merge. Don't go with the pending merge since a set happens after hence it's more recent and it completely overwrites the key

Yeah I think we should keep things simple for now and do:

Once that is settled we'll have a better foundation to discuss other improvements - maybe by auditing the usages of set() and seeing whether we are using it as a "reset" or workaround where we set modified local data like in the Policy.js case.

marcaaron commented 3 years ago

Quick update here. I don't think there's really much else to do here just yet so I want to wrap up this discussion.

I experimented with a test and proved the race condition exists. However, I'm not sure there are any actual cases where we're even calling a set() after a merge() so I added logging for detecting if someone tries to do it as it will currently work unexpectedly.

I think there's been some good stuff here regarding alternative methods to update Onyx data. But seems tangential to the race condition investigation so I'm gonna close this out.

It would be useful to re-approach this conversation by just looking at the Policy code and determining how we can or should modify Onyx to make it work better for us. Will create a new issue for that exploration and we can reference the convos here.