clintonhealthaccess / chailmis-android

Android Tablet Application for Logistics Management in the DHIS2 Platform

Investigate failure-on-synch modalities #259

Open karloskalcium opened 8 years ago

karloskalcium commented 8 years ago

There seems to be confusion as to what happens when there is a synch failure. Per Bin Ma, who investigated the current behavior of synching on the application, the following takes place:

Start Bin Ma comments: About our app's syncing strategy: automatic sync is triggered hourly; manual sync is triggered when the user clicks the sync button.

About our app's syncing workflow:

1) Start the sync: retrieve all data sets marked as unsynced from the database and push them to the server.
2) Sync succeeds: mark all unsynced data sets as synced.
3) Sync fails: all unsynced data sets stay unsynced.
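The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the app's actual code: `DataSet`, `sync_once`, and the `push` callable are hypothetical names, and the sketch assumes the whole batch is pushed in one request that either succeeds or fails as a unit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataSet:
    """Hypothetical stand-in for a locally stored data set."""
    id: int
    synced: bool = False


def sync_once(store: List[DataSet], push: Callable[[List[DataSet]], bool]) -> bool:
    """Run one sync pass; return True if there was nothing to send or the push succeeded."""
    # Step 1: collect everything still marked as unsynced.
    pending = [d for d in store if not d.synced]
    if not pending:
        return True  # nothing to send
    # `push` is a hypothetical callable that reports whether the server accepted the batch.
    if push(pending):
        # Step 2: success, so mark the whole batch as synced.
        for d in pending:
            d.synced = True
        return True
    # Step 3: failure, so leave the flags untouched; the next pass resends the batch.
    return False
```

Note that in this sketch the client trusts whatever `push` reports, so a server that confirms the sync but does not store the values would go unnoticed.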

Conclusions:

Several conclusions follow from the sync process. If we fail to sync data to the server, the unsynced data sets will be sent to the server again after one hour. So, provided there is no error in the server's calculation, the SOH statistics over a relatively long interval (e.g. monthly statistics) should not show any significant deviation, while statistics over a relatively short interval (less than a day) may show some deviation.

What we can do from the client side: we can add a retry strategy to every single sync (retry 3 times, at intervals of 10s, 30s, and 90s) to reduce the deviation over short intervals. But as analysed above, a failure on a single update should not affect the long-interval statistics that much, so we may need to investigate the previous SOH statistics problem further. End Bin Ma comments.
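A minimal sketch of the retry strategy proposed above, again in Python for illustration only. It assumes that "retry 3 times" means one initial attempt followed by up to three retries, waiting 10s, 30s, and 90s before each; `sync_with_retry` and its parameters are hypothetical names, and `sleep` is injectable so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable, Sequence


def sync_with_retry(
    attempt: Callable[[], bool],
    delays: Sequence[float] = (10, 30, 90),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once, then retry after each delay until an attempt succeeds."""
    if attempt():
        return True
    for delay in delays:
        sleep(delay)  # wait before the next retry
        if attempt():
            return True
    # All retries failed; the data stays unsynced until the next hourly sync.
    return False
```

With the default delays, a sync that keeps failing gives up after a little over two minutes of waiting, well inside the hourly automatic sync interval.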

KB: We should investigate whether there are other failure modes, such as:

1) Client synchs to server, receives confirmation of synch, but server somehow doesn't store the values
2) Client synchs to server, receives failure (or no response), but client does not retry

All of this comes down to ensuring that a distributed transaction (e.g. between server and client) is robust.

To investigate this further, we could perhaps set up a mock instance of DHIS2, if one doesn't exist already, that responds with different failure modes (e.g. no response, failure response, success response, success response (code 200) with a failure or other error in the JSON response, etc.), so that we can confirm the behavior of the client application. Additional testing would have to be done on the server side, as well as investigation of the logs, to see if there are cases where a success message is sent to the client but the server still fails to save the data.
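As a rough starting point, such a mock could look like the sketch below, written in Python with only the standard library. Everything here is an assumption made for illustration: the mode names, the `{"status": ...}` body shape, and the `classify_sync` helper are hypothetical and do not reproduce the real DHIS2 API or the app's client code. The "no response" mode is approximated by closing the connection without replying.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any system proxy so requests reach the local mock directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class MockDhis2Handler(BaseHTTPRequestHandler):
    # Hypothetical modes: success, http_error, soft_error, no_response.
    mode = "success"

    def do_POST(self):
        # Drain the request body so the client finishes sending.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.mode == "no_response":
            self.close_connection = True  # drop the connection without replying
            return
        status = 500 if self.mode == "http_error" else 200
        # soft_error: HTTP 200, but the (assumed) JSON body reports a failure.
        body = {"status": "SUCCESS" if self.mode == "success" else "ERROR"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_mock() -> HTTPServer:
    """Start the mock on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MockDhis2Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def classify_sync(url: str, timeout: float = 2.0) -> str:
    """Hypothetical client-side check: how does one sync attempt end?"""
    request = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}
    )
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            body = json.load(response)
    except urllib.error.HTTPError:
        return "failure response"
    except OSError:  # dropped connection, timeout, refused connection
        return "no response"
    return "success" if body.get("status") == "SUCCESS" else "error in body"
```

Setting `MockDhis2Handler.mode` before each request switches the failure mode. The `soft_error` case is the interesting one for the client: a check that looks only at the HTTP status code would treat it as a success.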

ihassin commented 8 years ago

@all please hold off on this for now until we settle #253. Whatever we decide there will impact this card.

garymabin commented 8 years ago

#253 has been closed. I don't think we should spend time on this issue, because we have changed the whole process of SOH syncing; we should concentrate on the current implementation.