Fix to cope with race condition with multiple refs for one sync record

feedhenry / fh-sync

Node.js implementation of the FeedHenry Data Synchronisation Server. To be used in conjunction with the FeedHenry Data Synchronisation Client.

http://feedhenry.org

Apache License 2.0

5 stars 17 forks source link

Fix to cope with race condition with multiple refs for one sync record #48

Closed deewhyweb closed 4 years ago

deewhyweb commented 5 years ago

This PR adds a fix to a scenario where there are multiple refs against a single dataset local record. In certain circumstances the record can have data null but with up to date hash. In this case the record is never updated until the hash is removed.

wtrocki commented 5 years ago

What you mean by multiple refs? I will try to help to review this

deewhyweb commented 5 years ago

When a sync record in a collection fhsync_<>_records has multiple entries in the refs field. This means there are multiple fhsync_dataClient records relating to this single dataset record. This will happen if two list handler calls to the same dataset with different query_params return the same record.

In this case, if the record is then no longer returned from one of the list calls, the record data field is set to null, and one of the refs is removed. But because the hash is not removed, when the second list handler fires, the record is not updated because the hash in fhsync_<>_records is the same as the hash of the record retrieved via the list handler.

wtrocki commented 5 years ago

ping @davidffrench

wei-lee commented 5 years ago

@deewhyweb the code change looks trivial enough, is it possible to update the integration tests to add a test case for the scenario you have described?

deewhyweb commented 5 years ago

I'll take a look at the integration tests and add one for this scenario. I wonder though if there is not a better solution to this issue. When a list handler for a particular datasetClient returns a result set without a particular record (meaning the record is either deleted on the backend or no longer part of the query) would it not make more sense to leave the data field untouched and simply remove the reference to the datasetClient from the refs array? One of the issues we're seeing is when there are multiple refs for a record if the record is removed from one list result but not from the second, the first list result will null the data which can lead to null data being returned to the client when syncing the second datasetClient.

wtrocki commented 4 years ago

@deewhyweb I will close that PR and open new one in order to make changes on your branch.

wtrocki commented 4 years ago

@deewhyweb Amazing contribution. I have played with the change and build very raw example to replicate this issue.

When a list handler for a particular datasetClient returns a result set without a particular record (meaning the record is either deleted on the backend or no longer part of the query) would it not make more sense to leave the data field untouched and simply remove the reference to the datasetClient from the refs array

Yes this will be actually much better fix, however it is more prone to some corner cases that I cannot predict now.

the way it could work is

 } else if (op === 'delete') {
      //remove the ref
      update['$pull'] = {'refs': datasetClientId};
      // If there are still other refs
      if(datasetClientId && datasetClientId.length !==0){
             // Remove set data to null as there are other refs already linked
             delete update['$set'].data;
      }
    }

This way clients will not be affected so we going to get more efficient handling for this case.