cloudant / sync-android

A JSON-based document datastore for Android applications
Apache License 2.0

Request for information on "continuous replication" #585

Closed acutetech closed 6 years ago

acutetech commented 6 years ago

Could you provide more background on the "continuous replication" issue?

I have been using PouchDb with Cordova on Android phones. This features both "live replication" and a retry feature if the network is unavailable:

localDB.sync(remoteDB, { live: true, retry: true ... });

It has the effect that a pull-user is notified pretty much immediately a push-user makes a change, and is very effective.

This post https://stackoverflow.com/questions/38081376/cloudant-continuous-replication-android says sync-android "doesn't support continuous replications because they impact battery life" (is that not a problem with PouchDb?).

The StackOverflow post suggests restarting replication in the complete() event - I do this with a 5s delay and it works, but seems inelegant.

At doc/replication-policies.md there is a discussion of using JobScheduler, but the context is that replication might be done every hour or so, or maybe every few minutes - not continuously.

Then I find mention of a CouchDB _changes API, which is designed to allow a client to learn of changes as soon as they happen, and seems to be what is needed here: http://guide.couchdb.org/draft/notifications.html

In summary, what options are there for sync-android to provide near-real-time notifications of changes?

acutetech commented 6 years ago

I have looked at this a bit more. It seems that continuous replication needs, as a starting point, database accesses like these:

GET "$HOST/db/_changes?feed=longpoll&since=2"
GET "$HOST/db/_changes?feed=continuous&since=3"
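To make the difference between those requests concrete: they vary only in the feed and since query parameters. The following is a minimal plain-Java sketch of building such a URL. The class and method names (ChangesUrl, build) are hypothetical and are not part of sync-android.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not a sync-android class.
final class ChangesUrl {
    private ChangesUrl() {}

    // feed: "normal", "longpoll" or "continuous".
    // since: a sequence value, or "now" to ask only for changes from this point on.
    static String build(String host, String db, String feed, String since) {
        try {
            return host + "/" + db + "/_changes"
                    + "?feed=" + URLEncoder.encode(feed, "UTF-8")
                    + "&since=" + URLEncoder.encode(since, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, ChangesUrl.build("$HOST", "db", "longpoll", "2") would give the first request shown above with the host left as a placeholder.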

It seems that the PouchDb library uses longpoll for its live replication.

And while the sync-android library seems to support some _changes accesses in CouchClient.changes(), it does not implement calls with "longpoll" or "continuous" parameters. Furthermore, I found a comment in ReplicationCompleted.java that says "Continuous replications (when implemented) will never complete", which rather suggests it has been thought about but not implemented.

So the short answer seems to be that the library does not support continuous replication. Is that correct?

How much work would be involved in adding this? Would it be largely a matter of adding a version of the changes() method with extra parameters, or something much more complex?

ricellis commented 6 years ago

Correct, the library does not (and will not) support continuous replication in the manner of an ongoing HTTP connection to the server changes feed. As has been noted, maintaining an open connection to the DB server to listen for changes (even with longpoll) impacts device performance.

It is recommended to use WorkManager, JobScheduler or sync-android's own replication policies to set the interval for replication. By using these APIs the device is able to schedule work more efficiently and conserve power according to the parameters of the schedule. You can schedule jobs at any interval as required by your application. Keep in mind that you should ensure uniqueness if you are using a very short interval (such that a new replication won't start if the previous one hasn't finished), for example by using enqueueUniquePeriodicWork or by making the appropriate calls to see what jobs are running with the JobScheduler APIs.
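The uniqueness rule (do not start a new replication while the previous one is still running) can be illustrated without any Android APIs. The sketch below is a plain-Java analogue only: SingleFlightReplication is a hypothetical name, it is not how WorkManager or JobScheduler implement uniqueness, and the Runnable stands in for whatever actually starts the replication.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard for illustration; not a sync-android or WorkManager class.
final class SingleFlightReplication {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Runnable replication;

    SingleFlightReplication(Runnable replication) {
        this.replication = replication;
    }

    // Runs the replication unless one is already in progress.
    // Returns true if this call ran it, false if the call was skipped.
    boolean tryRun() {
        if (!running.compareAndSet(false, true)) {
            return false; // previous replication has not finished yet
        }
        try {
            replication.run();
            return true;
        } finally {
            running.set(false); // allow the next scheduled run
        }
    }
}
```

A timer or job callback firing at a short interval could call tryRun() each time; overlapping invocations are skipped rather than queued.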

acutetech commented 6 years ago

Thanks Rich. I had started poking around before your response came in. FYI, I have added a new method to CouchClient that uses longpoll:

// Variant of changes() that can request the longpoll feed.
public ChangesResult changes(Object since, Integer limit, String feedType) {
    Map<String, Object> options = getParametrizedChangeFeedOptions(since, limit);
    // Only "longpoll" is passed through; any other value (or null) uses the normal feed.
    if ("longpoll".equals(feedType)) {
        options.put("feed", feedType);
    }
    return this.changesRequestWithGet(options);
}

and call it with:

ChangesResult changes = mCouchClient.changes("now", null, "longpoll");

It appears to behave as I would have expected. The response is real-time. If changes.size() > 0 I then run the replicator. If it is 0 then the longpoll has timed out (default 60s) and I issue the request again.

I think "since=now" or setting since to changes.getLastSeq() or getDbInfo.getUpdateSeq() will return only fresh changes.
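The loop described above (longpoll, replicate if anything changed, otherwise poll again) could be sketched roughly as follows. This is plain Java with hypothetical names: LongpollDriver is not part of sync-android, the IntSupplier stands in for a blocking longpoll _changes call that returns the number of changes (0 on timeout), and the Runnable stands in for starting the replicator. The loop is bounded by maxPolls only so that the sketch terminates.

```java
import java.util.function.IntSupplier;

// Hypothetical driver for illustration; not a sync-android class.
final class LongpollDriver {
    private LongpollDriver() {}

    // Polls up to maxPolls times and runs the replicator after each poll that
    // reported at least one change. Returns how many times the replicator ran.
    static int drive(IntSupplier longpollChangeCount, Runnable replicator, int maxPolls) {
        int runs = 0;
        for (int i = 0; i < maxPolls; i++) {
            // Expected to block until a change arrives or the server-side timeout expires.
            int changeCount = longpollChangeCount.getAsInt();
            if (changeCount > 0) {
                replicator.run();
                runs++;
            }
            // changeCount == 0 means the poll timed out: just poll again.
        }
        return runs;
    }
}
```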

I might try your suggestions, but I wonder (if you have time) if I could challenge you (in a friendly way) to justify the "impacts device performance" assertion. My gut feeling is that if fast responsiveness is desired, then making a continuous sequence of feed=normal GETs would be much more resource-intensive?

Evidently the PouchDb people are happy with longpoll.

acutetech commented 6 years ago

Another way of looking at this: I'm looking for a light-weight responsive "onRemoteDbChanged()" event...

ricellis commented 6 years ago

> Evidently the PouchDb people are happy with longpoll

I can't really speak to the happiness of PouchDb users/devs, but I have seen at least one writeup for using PouchDb offline first where the ongoing HTTP longpoll was considered inefficient.

> if I could challenge you (in a friendly way) to justify the "impacts device performance" assertion

I agree with you that running continuous GET requests of the normal changes feed is unlikely to be more efficient than keeping an ongoing longpoll, but there are two complications to that simple picture:

  1. Firstly, the application shouldn't run continuously. You stated earlier that you used a 5 second delay between replications and that this worked well enough, albeit inelegantly. By that reckoning your application performs acceptably without running continuously. For example (as a gross simplification), if the GET of _changes takes 1 second every 5 seconds, then the network needs to be awake for 80% less time than with an ongoing longpoll with a 5 second timeout. Further, by using the scheduling APIs the application could respond to device conditions and adapt the frequency of replication. For example, if 5 seconds is the minimum acceptable but 2 seconds is better, then you could use 2 seconds while the device is plugged in.
  2. Secondly, keeping an ongoing connection doesn't scale to multiple applications on the device. If every application keeps an open connection, the network connection on the phone needs to remain active constantly. By using the available scheduler APIs the requests of multiple applications can be grouped together and serviced in a single wake from idle.
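The 80% figure in the first point is simple duty-cycle arithmetic, which can be written out as below. This is illustrative only: RadioDutyCycle is a hypothetical name, and the model (network busy for the request duration, idle for the rest of the interval) is the same gross simplification used in the text, not a measurement.

```java
// Illustrative arithmetic only; the model ignores radio wake-up and tail times.
final class RadioDutyCycle {
    private RadioDutyCycle() {}

    // Fraction of each polling interval during which the network is in use.
    static double activeFraction(double requestSeconds, double intervalSeconds) {
        if (intervalSeconds <= 0 || requestSeconds < 0 || requestSeconds > intervalSeconds) {
            throw new IllegalArgumentException("need 0 <= requestSeconds <= intervalSeconds, interval > 0");
        }
        return requestSeconds / intervalSeconds;
    }

    // Saving relative to a connection that is held open for the whole interval.
    static double savingVersusAlwaysOpen(double requestSeconds, double intervalSeconds) {
        return 1.0 - activeFraction(requestSeconds, intervalSeconds);
    }
}
```

With a 1 second request every 5 seconds, activeFraction is 1/5 and the saving is 4/5, which is the 80% quoted above.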

> I'm looking for a light-weight responsive "onRemoteDbChanged()" event...

Following on from the designs discussed in the service worker post, if you really want to be light-weight on the client side but driven in response to remote changes, then you may want to consider a continuous changes feed listener process operating on the server side of your application. In response to changes it could send a message to the device, e.g. using FCM. Multiple applications share a single messaging connection, and the push messaging architecture is optimized to be very battery efficient on the device. YMMV, as these considerations are very application dependent. If you are expecting changes to arrive nearly continuously, then there is probably little to be gained over scheduling by having a server-side process and messaging architecture. However, if the changes are intermittent, then that push-style architecture is always going to be more efficient than anything running on the device that has to repeatedly check the server.