Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.

https://new.expensify.com

MIT License

3.58k stars 2.92k forks source link

[Discussion] Onyx.connect() design leads to ambiguously defined local values #6151

Closed marcaaron closed 3 years ago

marcaaron commented 3 years ago

cc @roryabraham @kidroca

Problem

Onyx.connect() updates local variables asynchronously wherever it is used. This has at times lead to incorrect assumptions about when a certain local variable will be "ready" or "not ready" and therefore increases the chances of race conditions (inconsistent results) and confusing time dependencies (do x, but only if y is "ready").

Why is this important?

While there haven't been a ton of scenarios where we have badly assumed that some Onyx value is "ready" I have personally encountered enough that I am assuming it is likely to happen again in the future. It is at the very least something we should watch out for.

Solution

There have been a several proposed solutions to this problem:

synchronous Onyx.get()
load all data upfront and then access values in cache
Onyx.isReady() promise that resolves when the value is ready
Create something like withOnyx() for use in actions files so that any actions will automatically get queued until the values are returned.
Create some kind of interface for an "action" and then create them with a factory or class like

function signIn({credentials}) {
    // ... do sign in stuff that requires credentials
}

const mapping = {
    credentials: {
        key: ONYXKEYS.CREDENTIALS,
    },
};

const Session = new ActionCollection({
    mapping,
    exports: [signIn],
});

// Where ActionCollection waits for values and internally
// decorates each exported method with something like
isReady().then(() => exportMethod(this.values));

kidroca commented 3 years ago

This solution seems a bit "greedy",has unnecessary complexity and will have problems in async scenarios

greedy - Looks like everything in mapping need to be ready before using any of the exports. Probably a lot of actions can be used right away but they'll have to wait just because they are part of an ActionCollection
too complex - when someone needs to write an action they'll need to learn how. When someone debugs an action they'll need to understand how this works. It's similar to withOnyx but withOnyx works on a single component and works in a passive way - it's not an action
applying it requires extra work - depending on where we want to incorporate this (e.g. actions/Report) would require a lot of refactoring to extract whatever we need to mappings and the provide it through actions
finally maybe the biggest problem - actions are async - storage mappings like function signIn({credentials}) represent the correct value at the start of the call. If the action involves a promise we can't trust the mappings inside a .then block

Here are some specific examples

When an action calls a function that also need Onyx data, it would have to make Onyx data available to that function (e.g. include more keys in mappings). Here's an example of a function called by actions https://github.com/Expensify/App/blob/910348ae8d633a29a6ee83268f6096c0b4039f79/src/libs/actions/Report.js#L540-L547 The action calling updateReportWithNewAction would not only have to provide the existing parameters but also dependencies like lastReadSequenceNumbers and currentAccountId. The result is increased coupling as now the caller provides internals

What about an action calling another action - I assume we'll be calling the other action through the designed interface, which should hopefully use the latest mappings

// This is part of `Reports` actions
function fetchReports(mappings) {
  // Triggers another action not part of Reports
  App.setSomethingIsLoading();

   API.fetchReports(mappings.currentUser)
       .then(reports => {
          reports.forEach(report => {
              // We can either call this action through the designed interface
              Reports.fetchActions(report.reportID);

             // Or since the function is already available in scope we can call it directly
             // I assume that param taking actions would expect `mappings` as last parameter
             fetchActions(report.reportID, mappings);

              // Also keep in mind our value of `mappings` is that from the start of the call
              // in this promise block the value in storage could already be different
          })
       })
}

kidroca commented 3 years ago

In the end it seems ActionCollection mappings might involve so many keys that we can just read everything from storage and provide it to every action/function and let it pick whatever it needs, but to overcome the Promise problem instead of passing a mappings Dictionary we could provide a getter function

function fetchReports(mappings) {
   API.fetchReports(mappings.get('currentUser'))
       .then(returnedReports => {
          App.setInitialDataLoaded();

          setTimeout(() => {
             // See this
             const reportMaxSequenceNumbers = mappings.get('reportMaxSequenceNumbers', []);
             const reportIDsWithMissingActions = _.chain(returnedReports)
                    .map(report => report.reportID)
                    .filter(reportID => isReportMissingActions(reportID, reportMaxSequenceNumbers[reportID]))
                    .value();
          }, CONST.FETCH_ACTIONS_DELAY.STARTUP)
       })
}

Well isn't the above very similar to this

function fetchReports() {
  Onyx.get('currentUser')
       .then(user => API.fetchReports(user))
       .then(returnedReports => {
            App.setInitialDataLoaded();

             setTimeout(() => {
               Onyx.get('reportMaxSequenceNumbers')
                   .then(maxSequenceNumbers => {
                         const reportIDsWithMissingActions = _.chain(returnedReports)
                             .map(report => report.reportID)
                             .filter(reportID => isReportMissingActions(reportID, maxSequenceNumbers[reportID]))
                             .value();
                   })
            }, CONST.FETCH_ACTIONS_DELAY.STARTUP)
         })
       })
}

By exposing Onyx.get and plain promises a lot of people would just understand this
It doesn't need a specially designed store and a isReady logic that would make all actions wait until mappings.get is ready to provide values
The action request only the values it needs and only when it needs them
When the value is available in cache it would be resolved immediately
Onyx.get can be applied only where needed

One thing that would improve the 2nd example is using async/await

async function fetchReports() {
  const user = await Onyx.get('currentUser');
  const reports = await API.fetchReports(user);
  App.setInitialDataLoaded();

  await delay(CONST.FETCH_ACTIONS_DELAY.STARTUP);

  const maxSequenceNumbers = await Onyx.get('reportMaxSequenceNumbers');
  const reportIDsWithMissingActions = _.chain(reports)
     .map(report => report.reportID)
     .filter(reportID => isReportMissingActions(reportID, maxSequenceNumbers[reportID]))
     .value();
}

The race condition problems are coming from trying to find a way to use storage in a sync way Actions don't return a value, they can be asynchronous. Why not expose Onyx.get and just use it? It's clearly a simpler way to deal with the problem - a general isReady check can only cover so much.

marcaaron commented 3 years ago

This solution seems a bit "greedy",has unnecessary complexity and will have problems in async scenarios

Now tell me what you like about it ? :trollface:

So yeah, points taken and I agree with a lot of it and mainly tried to come up with something to kick off this conversation that could maybe satisfy our constraints of

A - Not wanting excessive promise chains B - Not wanting to use Onyx.get() C - Not wanting to adopt Async/Await

Looks like everything in mapping need to be ready before using any of the exports

We could solve it by having a particular method blocked by the "readiness" of only the things it needs

when someone needs to write an action they'll need to learn how

That might not be a bad thing as we already need to learn withOnyx and the application seems really similar - which could make things more consistent. But I'll admit this does remind me of the initial concerns that others had about using something like Redux and fear of it having "too much boilerplate". So we should be cautious to avoid heading down a similar path here.

applying it requires extra work

I feel like this is OK as long as long as we like whatever solution we come up with. e.g. it would be a lot of work to replace all local variables with Onyx.get() and turn all actions into async functions - but the right design will be worth implementing.

actions are async - storage mappings like function signIn({credentials}) represent the correct value at the start of the call. If the action involves a promise we can't trust the mappings inside a .then block

I think you pretty much solved this with your next comment: https://github.com/Expensify/App/issues/6151#issuecomment-957424645. @roryabraham had a similar idea here.

Well isn't the above very similar to this

Yes, but violates constraints A and B

One thing that would improve the 2nd example is using async/await

Yes, that is constraint C

kidroca commented 3 years ago

A - Not wanting excessive promise chains B - Not wanting to use Onyx.get() C - Not wanting to adopt Async/Await

I think we should revisit some of these constraints. For example the idea behind not exposing Onyx.get is to prevent people using it in components, but you're not supposed to use anything but withOnyx anyway.

I've seem to have set a trap for myself with the Promise based example here but there are ways to make the promise chain flatter e.g. return delay(...) instead of setTimeout

Regarding Async/Await I guess it's something for another discussion. I'm just showing a good example of it's application IMO

Looks like everything in mapping need to be ready before using any of the exports

We could solve it by having a particular method blocked by the "readiness" of only the things it needs

Yeah but now we'll need to define mappings for each method and we still need to consider other actions or functions being called inside it - e.g. if someone decide to call a certain function they need to remember to update the mappings

applying it requires extra work

I feel like this is OK as long as long as we like whatever solution we come up with. e.g. it would be a lot of work to replace all local variables with Onyx.get() and turn all actions into async functions - but the right design will be worth implementing.

We can replace local variables with Onyx.get calls gradually one at a time, while if we introduce something like ActionCollection we'll need to rewrite the whole file

kidroca commented 3 years ago

IMO if we go with the ActionCollection we should make it as less configurable as possible - no mappings definition

Make Onyx.init return a promise
Make Onyx.init accept list of initial keys to pre-cache or cache the "cheap keys" (all safe for eviction)
When that promise resolves we're ready
Expose getSync that will be used as in the first example here

Hmm this would not work if we need to access something as report actions though ... So I'm back to Onyx.get and not having to deal with these cases

marcaaron commented 3 years ago

I think we should revisit some of these constraints.

👍

marcochavezf commented 3 years ago

TL;DR: I think Onyx.init returning a promise and Onyx.getSync could be enough to solve the design issues of Onyx.connect

Make Onyx.init return a promise

One of the advantages of returning a promise in Onyx.init (to indicate that the storage is ready during the app startup), is that we can initialize the Network loop after resolving the promise, which will prevent race conditions between reading storage for the first time and making network requests.

4. Expose getSync that will be used as in the first example here

We have 3 scenarios where Onyx is used to read data: components, libs (as API.js), and actions (as signIn). For the first one the HOC withOnyx fits perfectly, but for libs and actions is where the Pub/Sub pattern doesn't fit very well (main problem) and I think the Onyx.getSync function makes sense because:
1. It's easy (no learning curve for new contributors) and it's a straightforward function to get a value from storage.
2. Some other libraries have a similar strategy; for example, graphql has a HOC and hooks that reacts to events and update the component (similar to withOnyx) but it also has functions (which in this case return a promise) to be used in libraries or actions.
3. react-native-onyx can be more compatible with other libraries (as Redux) which will make it more attractive for other developers to integrate it into their projects.
4. AFAIK the Onyx.get (sync or async) is discouraged to not be used in the components, right? If that's the case could we add an eslint rule which allows to only use Onyx.getSync in the src/libs folder? (Something similar to this rule?).
5. Makes the Network module more testable. Right now is a little bit tricky to mock the AsyncStorage and Onyx.connect because Onyx is initialized at the beginning of API.js which happens before the unit test are set up. And with Onyx.init and Onyx.getSync we can remove the Onyx.connect callbacks from the top of API.js and we can have more control of Onyx behavior in the unit tests.
As a first iteration (without big changes) we can expose only the getSync function and maybe it will be enough, because if we return a promise in Onyx.init to make sure the storage is ready when the app starts up, then we shouldn't worry about reading values synchronously because we'll be reading always the latest updated value (Onyx.getSync reads from the cache and Onyx.merge updates the cache first before updating / inserting values to the disk).

marcaaron commented 3 years ago

Great thoughts here.

AFAIK the Onyx.get (sync or async) is discouraged to not be used in the components, right?

Close, but not quite - we have been avoiding its introduction entirely.

I think the problem with the get() or getSync() proposals I've seen is a lack of clear and specific guidance about when to use it. The original app philosophy for New Expensify is that we should be able to use Onyx.connect() for everything and that it would force us to think of creative solutions to any problems as we strive to make the app as "reactive" as possible and prevent inconsistency by only having one interface for interacting with store data.

could we add an eslint rule which allows to only use Onyx.getSync in the src/libs folder

We can create custom eslint rules, but this feels flexible given the context. So, I think first we need to figure out what are the cases where get() would be acceptable, and then see if there's anyway to identify this.

Anyways, I hope this helps think about the problem some more. And maybe we can think of a few more solutions to solve the "readiness" issue while not maintaining the original spirit of the project. It doesn't feel like this particular problem is severe enough for us to give up on that just yet.

kidroca commented 3 years ago

Onyx.getSync just won't work - it'll have the same problem we have with top level connections - has it loaded the data or not - I realised that and ended my comment It's because we can't read everything from storage into cache/memory - we can only read the finite items, which excludes conversations Even if we could read everything in memory, why keep it there when only 50% or less is going to be used

In the cases we're needing Onyx.get it's not due to reactivity, we use connect and withOnyx to react to a change We need Onyx.get when we want to ensure we're using the latest data through an action

Here's a practical application where I don't find anything else suitable but Onyx.get: https://github.com/Expensify/App/issues/5987#issuecomment-963471699

there is also an example of "a creative solution" using connect

Onyx.get is the most optimal way (so far) to deal with storage values that we don't need in memory all the time It works lazily on demand - if data is already cached great off we go, but it doesn't force something to linger in memory just in case

Actions already deal and wait for promises whether you wait on a http request or on Onyx.get doesn't make a difference to how the end result is handled

So far I've provided sound reasons to why we need Onyx.get exposed, but I don't see anything definitive as to why not and what problems could result by exposing it

marcaaron commented 3 years ago

So far I've provided sound reasons to why we need Onyx.get exposed, but I don't see anything definitive as to why not and what problems could result by exposing it

I'm not sure what more to add here. We have had bad experiences with combining cache getters and subscriptions and are trying to do something new here. If there's no solution then we can at least try to come up with clear rules about when to use a get pattern.

kidroca commented 3 years ago

Subscriptions use Onyx.get internally to start with the latest value - the value that other connections could have changed by now, the way Onyx works is virtually impossible to call Onyx.get and get an older value

The only mishap that I can see with people using it is similar to what I've shared earlier

function someAction() {
  Onyx.get(myKey)
      .then(myValue => {
          makeRequest()
              .then(() => /* can't rely on myValue here - you have to get it again */)
      })
}

So maybe we can enhance connections with a promise to get the best of both worlds

// top level connection call
const someConnection = Onyx.connect({
  key: 'myKey',
  callback: ...
});

function someAction() {
  someConnection.promise()
    .then(() => {
       if (someConnection.value == 'go') {
           return makeRequest();
       }
    })
    .then(() => {
       // someConnection.value is kept up to date and can be used here
    });
}

connection.promise resolves once the connection has finished reading the storage value connection.value is something similar to what we do here:

let myValue;
Onyx.connect({
  key: 'myKey',
  callback: val => myValue = value;
});

The downside of this is keeping the connection "forever" - Onyx.get allows us to release unused resources

marcaaron commented 3 years ago

Interesting! This might be over-engineering, but makes me think we could return a higher order function from connect() and maybe also have a connectMultiple() method so that we can wait for multiple values to be "ready". Which might help clean up logic like this.

let testOne;
let testTwo;
const withConnectionResolved = Onyx.connectMultiple([
    {key: 'testOne', callback: val => testOne = val},
    {key: 'testTwo', callback: val => testTwo = val},
], () => {//...values are ready});

function somePrivateAction() {
    //... do stuff with values
}

const somePublicAction = withConnectionResolved(somePrivateAction);

The downside of this is keeping the connection "forever" - Onyx.get allows us to release unused resources

That's true. But I think if you look at the code we have a zillion examples of Onyx.connect() calls that are never disconnected and it hasn't created any problems... yet 😄

marcaaron commented 3 years ago

A general disadvantage here is that we always have to "remember" that values are not "ready" by default. So the pattern feels more like a smell than a design - but maybe acceptable if we do not run into this problem very often.

marcochavezf commented 3 years ago

So maybe we can enhance connections with a promise to get the best of both worlds
// top level connection call
const someConnection = Onyx.connect({
  key: 'myKey',
  callback: ...
});

function someAction() {
  someConnection.promise()
    .then(() => {
       if (someConnection.value == 'go') {
           return makeRequest();
       }
    })
    .then(() => {
       // someConnection.value is kept up to date and can be used here
    });
}
connection.promise resolves once the connection has finished reading the storage value connection.value is something similar to what we do here:
let myValue;
Onyx.connect({
  key: 'myKey',
  callback: val => myValue = value;
});

I think this is what reactive extensions is about. For example, we could think that Onyx.connect returns a "stream" and we can subscribe to that stream to get values continuously over time (same as Onyx.connect works today) and we can compose streams with RxJS (which is like lodash for events) to combine multiple streams or to get the value up to date once:

const myKeyStream = Onyx.connect('myKey');

// we can subscribe to that stream to get multiple values over time
myKeyStream.subscribe(val => myValue = value);

// we can get the latest value once
myKeyStream.take(1).subscribe(val => myValue = value);

// we can convert the previous stream to a promise (https://rxjs.dev/api/index/class/Observable#topromise-)
// (which would be the same as `Onyx.get`)
myKeyStream.take(1).toPromise();

// and we can combine (https://rxjs.dev/api/index/function/forkJoin) different streams and be notified when 
// all of them are completed (getting only the latest value from both streams once)
forkJoin([
  Onyx.connect('testOne').take(1),
  Onyx.connect('testTwo').take(1),
]).subscribe(([valueOne, valueTwo]) => {
  // ...
});

kidroca commented 3 years ago

A general disadvantage here is that we always have to "remember" that values are not "ready" by default

My little reminder says "Hi"

function connect(params) {
  // Something will keep these up to date
  let currentValue;
  let isReady = false;

  /* ... */

  return {
    id: connectionId,
    promise: connectionPromise,
    disconnect: () => Onyx.disconnect(connectionId),
    get value() {
       if (!isReady && _.isUndefined(params.defaultValue) && __DEV__) {
         throw new Error(
           `Accessing connection value before it has been properly initialised. 
            Use the connection promise or provide a defaultValue`
         );
       }

       if (hasDisconnected && __DEV__) {
         throw new Error('Using a disconnected connection'); 
       }

       if (isReady) {
           return currentValue;
       }

       Log.info('Connection is not ready - using default value');
       return params.defaultValue;
    }
  }
}

Since we're not very fond of computed getters we can rename that to getValue and just be an ordinary function

Whoever defines a top level connection that tries to use the value too soon will be reminded to be more careful and take action

kidroca commented 3 years ago

RxJS could work, but IMO will reduce the people that can work on the project

If we return an object from connect we can expose some fields to make the value observable or asObservable method We can even make Onyx connections observable right now with something like

function observableConnection(key) {
  return new Observable((subscriber) => {
    Onyx.connect({
      key,
      callback: value => subscriber.next(value);
    });
}

Though we're lacking some callbacks like onError and onDisconnect to relay that information to the Observable

marcaaron commented 3 years ago

Thanks everyone! Chatting w/ @marcochavezf and we are going to close this for now and refer to it when we are ready to implement a solution. We've got a lot of ideas for how to address this issue more holistically, but want to wait for some more concrete issues to pop up before proposing anything formally.