couchbase / couchbase-lite-ios

Lightweight, embedded, syncable NoSQL database engine for iOS and MacOS apps.
Apache License 2.0
1.62k stars 297 forks source link

CBL 2.0 Swift tries to fetch docs with older revision and gets stuck #2052

Closed jayzhanghs closed 6 years ago

jayzhanghs commented 6 years ago

For one of our environments with Sync Server 5.5 and CBL 2.0 the sync of couchbase constantly gets stuck after a certain number of documents get pulled. The server side log indicates the CBL is trying to get a few documents of out-dated revision and thus gets stuck.

2018-01-31T19:29:12.579Z WARNING: blipHandler can't get doc "a80713c7-478b-4ce3-9aec-b2f17ba0095e"/2-5a6533b41b9f1426b9ca9bf1475f794d: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374
2018-01-31T19:29:12.611Z WARNING: blipHandler can't get doc "691b6456-3342-4a5d-afff-2ed1cc7b1bab"/2-3a7269ade423c08b26e9436f863dda8f: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374
2018-01-31T19:29:12.611Z WARNING: blipHandler can't get doc "83fd613a-e17c-49e1-917c-67a32fba9dc6"/2-91f4fc4044c08c46a2796352490736be: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374

Since I don't specify any revision at all, I'm wondering if this is some bug on the CBL 2.0 which is similar to a bug a few years ago: https://github.com/couchbase/couchbase-lite-ios/issues/377


snej commented 6 years ago

Can you describe the situation in more detail than "constantly gets stuck"?

Also, please attach CBL sync logs (as a gist.) Thanks!

gajohnson commented 6 years ago

@snej On fresh installations of our ios app running couchbase lite 2.0 DB 21, the initial blip sync for this user never reaches the complete state. After we timeout and restart sync in our own application, sync never leaves busy state. Sync logs show 2 or 3 documents are missing (404 errors show below). These warnings are logged on every sync attempt for this user, not just the initial attempt.

It seems like there is some issue either in the client or the sync gateway where it is hanging on to an old revision rather than picking up the change. Also worth noting that all 3 of these documents are flagged as deleted, and all have the same "tombstoned_at" value of 1515238392.

Here is a snippet of the sync logs showing the relevant requests from blip sync along with 1 of the documents that 404's on every sync. We have completely removed and reinstalled the ios app but continue to see this problem where the device is requesting and old revision repeatedly.


2018-02-01T16:23:16.488Z HTTP:  #1639655: GET /db/_blipsync  (as XXXXXXXXX@example.com)
2018-02-01T16:23:18.032Z changes_view: Query took 1.383185457s to return 25632 rows, options = db.Body{"stale":false, "startkey":[]interface {}{"AccountName::Customer::*", 0x1}, "endkey":[]interface {}{"AccountName::Customer::*", 0x10debc5}}
2018-02-01T16:23:18.342Z changes_view: Query took 308.987077ms to return 9643 rows, options = db.Body{"stale":false, "startkey":[]interface {}{"AccountName::CustomerGroup::55aa4536-9693-482d-93dd-0c802e080ede::Catalog", 0x1}, "endkey":[]interface {}{"AccountName::CustomerGroup::55aa4536-9693-482d-93dd-0c802e080ede::Catalog", 0x10284bb}}
2018-02-01T16:23:20.314Z changes_view: Query took 1.972389088s to return 76408 rows, options = db.Body{"startkey":[]interface {}{"AccountName::UserGroup::*", 0x1}, "endkey":[]interface {}{"AccountName::UserGroup::*", 0x10df970}, "stale":false}
2018-02-01T16:23:20.575Z changes_view: Query took 253.250962ms to return 10341 rows, options = db.Body{"stale":false, "startkey":[]interface {}{"AccountName", 0x1}, "endkey":[]interface {}{"AccountName", 0x10280fe}}
2018-02-01T16:23:20.836Z changes_view: Query took 261.039358ms to return 8758 rows, options = db.Body{"stale":false, "startkey":[]interface {}{"AccountName::CustomerGroup::06a5a692-5e0e-49e0-8ef2-afd0e8bcb1b1::Catalog", 0x1}, "endkey":[]interface {}{"AccountName::CustomerGroup::06a5a692-5e0e-49e0-8ef2-afd0e8bcb1b1::Catalog", 0x193d09}}
2018-02-01T16:23:21.099Z changes_view: Query took 263.226548ms to return 9034 rows, options = db.Body{"stale":false, "startkey":[]interface {}{"AccountName::CustomerGroup::1cfcdae2-a6c2-4117-8af1-027868c2fa91::Catalog", 0x1}, "endkey":[]interface {}{"AccountName::CustomerGroup::1cfcdae2-a6c2-4117-8af1-027868c2fa91::Catalog", 0x10283d5}}
2018-02-01T16:23:22.013Z changes_view: Query took 913.602292ms to return 35047 rows, options = db.Body{"startkey":[]interface {}{"AccountName::Manufacturer::*", 0x1}, "endkey":[]interface {}{"AccountName::Manufacturer::*", 0x10defd0}, "stale":false}
2018-02-01T16:24:16.917Z WARNING: blipHandler can't get doc "a80713c7-478b-4ce3-9aec-b2f17ba0095e"/2-5a6533b41b9f1426b9ca9bf1475f794d: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374
2018-02-01T16:24:17.565Z WARNING: blipHandler can't get doc "691b6456-3342-4a5d-afff-2ed1cc7b1bab"/2-3a7269ade423c08b26e9436f863dda8f: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374
2018-02-01T16:24:17.565Z WARNING: blipHandler can't get doc "83fd613a-e17c-49e1-917c-67a32fba9dc6"/2-91f4fc4044c08c46a2796352490736be: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374

691b6456-3342-4a5d-afff-2ed1cc7b1bab
{
  "$hs": {
    "account": "AccountName",
    "type": "AccountName::Address"
  },
  "$schema": "",
  "_deleted": true,
  "_sync": {
    "rev": "3-77d627eafffec1c12045eab8602dd8cb",
    "flags": 1,
    "sequence": 5491383,
    "recent_sequences": [
      1074645,
      5131067,
      5491383
    ],
    "history": {
      "revs": [
        "2-3a7269ade423c08b26e9436f863dda8f",
        "3-77d627eafffec1c12045eab8602dd8cb",
        "1-728445d860af974f5ee902dd4364964a"
      ],
      "parents": [
        2,
        0,
        -1
      ],
      "deleted": [
        1
      ],
      "channels": [
        null,
        null,
        [
          "AccountName::Customer::*",
          "AccountName::Customer::87318222-9089-4145-b9c7-d7480048db5d"
        ]
      ]
    },
    "channels": {
      "AccountName::Customer::*": {
        "seq": 5131067,
        "rev": "2-3a7269ade423c08b26e9436f863dda8f"
      },
      "AccountName::Customer::87318222-9089-4145-b9c7-d7480048db5d": {
        "seq": 5131067,
        "rev": "2-3a7269ade423c08b26e9436f863dda8f"
      }
    },
    "cas": "",
    "tombstoned_at": 1515238392,
    "time_saved": "2018-01-06T11:33:12.416220617Z"
  }
}```
borrrden commented 6 years ago

To be clear, CBL logs are logs from the device, and are not the same as sync gateway logs. I believe the logs get output to NSLog for 2.0. That's the information that is needed.

pasin commented 6 years ago

@gajohnson do you mean the replication gets stuck after the warning? Any chances to provide the client log with Replicator log domain enabled? It would be super helpful if you could also provide a testing SGW that we could use to reproduce the issue.

@adamcfraser what could cause the can't get doc warning? From the @gajohnson's comment above, the 691b6456-3342-4a5d-afff-2ed1cc7b1bab seems to exists but is deleted.

2018-02-01T16:24:17.565Z WARNING: blipHandler can't get doc "691b6456-3342-4a5d-afff-2ed1cc7b1bab"/2-3a7269ade423c08b26e9436f863dda8f: 404 missing -- rest.(*blipHandler).sendRevision() at blip_sync.go:374
adamcfraser commented 6 years ago

The "can't get doc" is being logged when the attempt to retrieve the revision returns an error - 404 missing in this case.

From the document above, it looks like rev 2 (the missing revision) is being returned by the subChanges call (even though rev 3 is the active tombstone) because it's the revision at which the document was removed from the user's channels.

This sounds like the equivalent of https://github.com/couchbase/sync_gateway/issues/2815, but I'm not clear why that would resurface when using the 2.0 replicator.

pasin commented 6 years ago

From chatting with @adamcfraser, this is possible a SG issue. I moved the issue to https://github.com/couchbase/sync_gateway/issues/3281.

@jayzhanghs and @gajohnson please discuss the issue there.