couchbase / sync_gateway

Manages access and synchronization between Couchbase Lite and Couchbase Server
https://www.couchbase.com/products/sync-gateway
Other
448 stars 138 forks source link

Replication of a new document causes multiple conflicts #3518

Closed episage closed 6 years ago

episage commented 6 years ago

Sync Gateway version

{"committed_update_seq":6,"compact_running":false,"db_name":"testing_1525080771752","disk_format_version":0,"instance_start_time":1525080782148506,"purge_seq":0,"state":"Online","update_seq":6}

Operating system

version: '2'

services:

  sgw1:
    image: couchbase/sync-gateway:2.0.0-community
    restart: always
    command: -adminInterface :4985 /etc/sync_gateway/config.json
    ports:
    - "127.0.0.1:5001:4984"
    - "127.0.0.1:5002:4985"

Config file

default, nothing changed, look at command above

Log output

sgw1_1  | 2018-04-30T09:33:02.146Z HTTP:  #001: PUT /testing_1525080771752/ (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.146Z Opening db /testing_1525080771752 as bucket "testing_1525080771752", pool "default", server <walrus:>
sgw1_1  | 2018-04-30T09:33:02.146Z Opening Walrus database testing_1525080771752 on <walrus:>
sgw1_1  | 2018-04-30T09:33:02.147Z Design docs for current view version (2.0) do not exist - creating...
sgw1_1  | 2018-04-30T09:33:02.147Z Design docs successfully created for view version 2.0.
sgw1_1  | 2018-04-30T09:33:02.147Z Verifying view availability for bucket testing_1525080771752...
sgw1_1  | 2018-04-30T09:33:02.148Z Views ready for bucket testing_1525080771752.
sgw1_1  | 2018-04-30T09:33:02.148Z Initializing changes cache for database testing_1525080771752 with sequence: 0
sgw1_1  | 2018-04-30T09:33:02.148Z Using default sync function 'channel(doc.channels)' for database "testing_1525080771752"
sgw1_1  | 2018-04-30T09:33:02.148Z HTTP+: #001:     --> 201 created  (2.7 ms)
sgw1_1  | 2018-04-30T09:33:02.149Z HTTP:  #002: PUT /testing_1525080771763/ (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.150Z Opening db /testing_1525080771763 as bucket "testing_1525080771763", pool "default", server <walrus:>
sgw1_1  | 2018-04-30T09:33:02.153Z Opening Walrus database testing_1525080771763 on <walrus:>
sgw1_1  | 2018-04-30T09:33:02.153Z Design docs for current view version (2.0) do not exist - creating...
sgw1_1  | 2018-04-30T09:33:02.153Z Design docs successfully created for view version 2.0.
sgw1_1  | 2018-04-30T09:33:02.154Z Verifying view availability for bucket testing_1525080771763...
sgw1_1  | 2018-04-30T09:33:02.156Z Views ready for bucket testing_1525080771763.
sgw1_1  | 2018-04-30T09:33:02.156Z Initializing changes cache for database testing_1525080771763 with sequence: 0
sgw1_1  | 2018-04-30T09:33:02.157Z Using default sync function 'channel(doc.channels)' for database "testing_1525080771763"
sgw1_1  | 2018-04-30T09:33:02.158Z HTTP+: #002:     --> 201 created  (8.8 ms)
sgw1_1  | 2018-04-30T09:33:02.177Z HTTP:  #003: POST /testing_1525080771752/_bulk_docs (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.179Z HTTP+: #003:     --> 201   (1.9 ms)

Expected behavior

SGW should not return conflicts for "rev": "2-A" and "rev": "1-A" when querying: http://localhost:5002/testing_1525080771752/_changes?active_only=true&style=all_docs becasue each replicated document via post_bulk_docs has a unique revision and number

Actual behavior

SGW returns conflicts on http://localhost:5002/testing_1525080771752/_changes?active_only=true&style=all_docs. Every single document is conflicted (probably because each document's parent is equal to -1).

{
  "results": [
    {
      "seq": 4,
      "id": "[docX.mouse]-ID_X",
      "changes": [
        {
          "rev": "3-B"
        },
        {
          "rev": "1-A"
        },
        {
          "rev": "2-A"
        },
        {
          "rev": "3-A"
        }
      ]
    },
    {
      "seq": 6,
      "id": "[docX.test]",
      "changes": [
        {
          "rev": "1-KA"
        },
        {
          "rev": "1-K"
        }
      ]
    }
  ],
  "last_seq": "6"
}

Steps to reproduce

            const docs = [];
            var sameId = `ID_X`;
            var counter = 0;
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `1-A`,
              sound: 'squeek',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `2-A`,
              sound: 'moo',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A`,
              sound: 'moom',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-B`,
              sound: 'pi',
              random: `(${counter++})`,
            });

            docs.push({
              _id: `[docX.test]`,
              _rev: `1-K`,
              keks: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.test]`,
              _rev: `1-KA`,
              keks: `(${counter++})`,
            });

            // create conflicts
            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });
tleyden commented 6 years ago

@episage it looks like you're creating conflicts with the SG bulk_docs API, and then you're surprised to get conflicts back from the SG changes API. If you don't want conficts back on the SG changes API, then why are you creating them?

That's my interpretation on what I'm seeing based on:

           docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A`,
              sound: 'moom',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-B`,
              sound: 'pi',
              random: `(${counter++})`,
            });
           // create conflicts
            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });

Here it looks like you are creating two conflicting revisions (3-A, 3-B) for the [docX.mouse]-${sameId} doc. I'm assuming that sourceAdapter.docsBulkWrite() eventually calls down to the _bulk_docs SG API endpoint with new_edits=false to indicate that SG should allow conflicts? (side note: the docs might need some improvement on that parameter, and so I filed a docs ticket)

If you can point to the code that is called from sourceAdapter.docsBulkWrite(), that might be helpful. Basically I'd like to confirm the value of the new_edits parameter you're passing to the _bulk_docs SG API endpoint.

Also, taking a step back, if you could give a high level description of what you're trying to accomplish (and avoid), it might help in terms of recommending a best approach.

Generally speaking, you want to avoid creating conflicting revisions in Sync Gateway, and so you'd never really want to purposefully insert conflicting revisions (if that's what you're actually doing).

episage commented 6 years ago

@tleyden I want to support offline-first functionality in my app.

The use case: Client A was offline for 1 hour.

Here is a better example:

            const docs = [];
            var sameId = generateUuid();

            // this is the initial document
            // client is OFFLINE
            docs.push({ 
              _id: `[docX.mouse]-${sameId}`,
              _rev: `1-A1`,
              sound: 'squeek',
            });

            // later on client realized that the mouse should do "meow" sound
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `2-A2`,
              sound: 'meow',
            });

            // later on client thought that mouses do "roar" sounds
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A3`,
              sound: 'roar',
            });

            // client went ONLINE, thus the replication executed docsBulkWrite
            // I expect no conflicts with the mouse document

            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });

Here is the code of the adapter docsBulkWrite:

  function docsBulkWrite(docsArray, newEdits, callback) {
    const bulkDocsBody = {
      docs: docsArray,
      new_edits: !!newEdits,
      all_or_nothing: false,
    };

    client.database.post_bulk_docs(
      databaseName,
      bulkDocsBody,
      (networkError, isSuccess, data, http) => {
        if (networkError) {
          return callback({ networkError, http });
        }
        if (!isSuccess) {
          return callback({ data, http });
        }

        const responsesByIdAndRev = data.reduce((acc, workingResponse) => {
          const id = workingResponse.id;
          if (typeof acc[id] === 'undefined') {
            acc[id] = {};
          }
          if (workingResponse.rev) {
            acc[id][workingResponse.rev] = workingResponse;
          } else {
            if (typeof acc[id].unknown === 'undefined') {
              acc[id].unknown = [];
            }
            acc[id].unknown.push(workingResponse);
          }
          return acc;
        }, {});

        let writeResults;
        if (bulkDocsBody.new_edits === false) {
          // response doesnt contain 'ok' nut contains id & rev

          writeResults = docsArray.map((doc) => {
            var id = doc._id;
            var rev = doc._rev;

            const responsesByRev = responsesByIdAndRev[id];
            const workingResponse = responsesByRev[rev]
              ? responsesByRev[rev]
              : responsesByRev.unknown.length > 0
                ? responsesByRev.unknown.shift()
                : new Error(`Missing response for ${id}/${rev}`);

            if (!workingResponse) {
              // assume error when no response
              return new WriteResult(
                WriteStatus.ERROR,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (workingResponse.ok) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (
              workingResponse.ok === undefined &&
              workingResponse.id &&
              workingResponse.rev
            ) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            }
            return new WriteResult(
              WriteStatus.ERROR,
              doc,
              doc._id,
              doc._rev,
              workingResponse,
            );
          });
        } else {
          writeResults = docsArray.map((doc) => {
            const idRev = doc._id + doc._rev;

            const workingResponse = responsesByIdAndRev[idRev];
            if (!workingResponse) {
              // assume error when no response
              return new WriteResult(
                WriteStatus.ERROR,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (workingResponse.ok) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            }
            return new WriteResult(
              WriteStatus.ERROR,
              doc,
              doc._id,
              doc._rev,
              workingResponse,
            );
          });
        }

        return callback(null, writeResults);
      },
    );
  }

After the code executes, I expect the SGW to contain NO conflicts since the REVs are in-tact. Inside the docsBulkWrite callback: image error is null writeResults are all ok

HOWEVER, the database testing_1525110520386 - which is the source database, shows conflicts as you can see in the picture below: image

Extra: Here is the raw document in question (parent = -1): image

I would expect the document to have NO conflicts.

Why does it happen? Is it an SGW error?

episage commented 6 years ago

NB, the documentation for SGW regarding replication is not-for-humans. Every time replication topic comes back, it makes me sick. Especially that meaningless nesting of variables/arrays/objects such as "ok" in JSON data structures.

tleyden commented 6 years ago

I want to support offline-first functionality in my app.

What kind of app is this? I take it this isn't using a Couchbase Lite SDK? Is this a Phonegap-style app with "custom sync" to Sync Gateway?

Client A was offline for 1 hour.

Is Client A creating conflicting revisions with it's own changes, or changes from another client?

tleyden commented 6 years ago

From a high level design perspective, I would say that conflicts should only arise during concurrent offline updates to a doc by different devices:

at this point they will need to push conflicting revisions that are in conflict, since they diverged off of "doc1, rev1-a".

OTOH, if the updates are on a single device, then you would always base changes on the previous revision, and you'd get a linear (non-conflicting) history of updates.

Just want to focus on the high level usage before trying to drill in on the lower level potential API bugs.

tleyden commented 6 years ago

I think I see what's happening:

Extra: Here is the raw document in question (parent = -1):

So it's creating three separate root revisions. In other words, your revision 2-A2 is being specified as having no parent, so it ends up as a root revision.

In the documentation for the bulk_docs endpoint, it says:

_rev | stringRevision identifier of the parent revision the new one should replace. (Not used when creating a new document.)

What might be happening this that you are sending _rev: 2-A2" but Sync Gateway doesn't have an existing _rev: 2-A2, it's probably creating a new root revision for it. Regardless though, I'm pretty sure that you aren't' using the SG replication API the same way that Couchbase Lite is doing.

Overall, it looks like you are recreating a lot of the functionality that Couchbase Lite provides rather than using Couchbase Lite, since maybe it doesn't suit your use case. If that's the case, you'll probably have to dig deeper into the Couchbase Lite code to see how it works (before it moved to the websocket based replication protocol, if you want to push over HTTP/REST).

Another approach that might be easier to read the Couchbase Lite code is just to run the Todo sample app and sniff the replication using tools like:

That way you'll be able to see examples of the replication API usage.

Closing the issue because I don't see enough evidence of this being a Sync Gateway bug.

tleyden commented 6 years ago

@episage one other thing that might interest you is the /{db}/_revtree/{doc} endpoint:

https://developer.couchbase.com/documentation/mobile/2.0/references/sync-gateway/admin-rest-api/index.html?v=2.0#/document/get__db___revtree__doc_

It returns a graphviz diagram of the revtree that can be visualized using graphviz (or webgraphviz.com if you don't want to install anything)

Some examples of what the rendered revtrees look like (some with multiple-roots): https://github.com/couchbase/sync_gateway/issues/2847#issuecomment-326402432

episage commented 6 years ago

@tleyden I'm writing SGW JS client (for browsers). Thanks for the links

tleyden commented 6 years ago

Have you checked out PouchDB?

You could probably leverage a lot of their existing work .. or at least use it as comparison for debugging (especially coupled w/ the network sniffing tools I posted).

There has been few SGW tickets from people who are using PouchDB and ran into some minor compatibility issues.

episage commented 6 years ago

@tleyden I did many times and had many attempts at it but it's not compatible with SGW. It used to be a couple of years ago.

episage commented 6 years ago

and the minor compatibility issues are really major

tleyden commented 6 years ago

Ok good to know

jamesnocentini commented 6 years ago

the documentation for SGW regarding replication is not-for-humans...Especially that meaningless nesting of variables/arrays/objects such as "ok" in JSON data structures.

@episage could you share a link to an example of that? It probably needs to be updated.

episage commented 6 years ago

@jamiltz AFAIK Couchbase/Sync Gateway don't have any documentation regarding writing a replicator. Thus, it's very difficult to write one. I made a lot of effort to combine CouchDB documentation and own findings to write a working replicator in JS (browser-compatible JS). By working, I mean one that doesn't contain bugs. It's very easy to make one in the replication protocol.

In terms of conflict resolution the only article that has some meaningful code and notes is this one: https://developer.couchbase.com/documentation/mobile/current/guides/sync-gateway/resolving-conflicts/index.html In terms of quality of code in the article, ehm... it's disputable. I had hard time understanding how to resolve a conflict and how to create it. Why is the article hard to comprehend?

jamesnocentini commented 6 years ago

@episage Thanks for the detailed feedback! Those are all valid points and noted https://issues.couchbase.com/browse/DOC-3575