Closed ArihantRk closed 7 years ago
Additional context here: https://forums.couchbase.com/t/older-document-revision-getting-purged-from-couchbase/11347
@ArihantRk Leaving aside your particular use case, it sounds like the issue that you're hitting is that cluster replication is stopping when you get a 'not found' error in a bulk_docs request. Is that accurate?
It sounds like this is a particular issue related to using channel-based replication in sg-replicate. Channel removal notifications may be sent for inactive revisions (since the removal tracks the rev id at which the doc was removed from the channel), and there's no expectation that the revision body will be available for those revisions.
Couchbase Lite clients don't attempt to retrieve the body for channel removals, so wouldn't hit this issue. I think there are two issues:
@tleyden I'd be interested in your take on these two issues.
Thanks for your time @adamcfraser . yes, replication is stopping when 'not found' error in bulk_docs request. i would more interested channels removal is pruning the revision in couchbase? and i could not see same issue for all documents. i have set revs_limit=5, revision going beyond 5 . older revision will get pruned. but could some documents 2nd revision is getting pruned. could you please be more detail how pruning happens in couchbase. is revs_limit controlling pruning at couchbase?
You're talking about two different concepts.
Revs_limit determines the size of the revision tree - how much of the revision history is retained for a document. This isn't the full revision body - just the previous revision ids and some additional document metadata.
Bodies of non-leaf revisions aren't retained. A temporary copy is made for use by in-flight replications, but this temporary copy has an expiry of 5 minutes.
Normally non-leaf revision IDs aren't included in the changes response. The exception is channel removal notifications - the changes feed can include an old, non-leaf revision ID to inform consumers that the document was removed from the channel at the specified revision. In this scenario, it's expected that the consumer will not be able to retrieve that revision - not only because the revision body may have expired, but because the document is no longer in the channel that the consumer has access to.
That's the potential issue with channel filtered cluster replication I described above - replication shouldn't treat 'not found' for changes entries marked 'removed' as an error.
@adamcfraser do you mean non-leaf revision bodies will not be retained in syncgateway?
when i do CURL for older revisions from syncgateway i can able to get the full older document . does syncgateway fetching from couchbase? how many revisions will be exists in couchbase.
revs_limit is only to syncgateway or does is have any effect on couchbase also.
I think the answers to most of those questions are in my previous post. It's possible that the body of the older revision is still resident in a particular Sync Gateway node's revision cache (such as the node where the revision was originally written), but there are no guarantees around the availability of the bodies of non-leaf revisions.
@adamcfraser , can this fix expected in next release?
I don't expect this will make it in to 1.4 - it will be in the subsequent release.
@ArihantRk awesome bug report btw! Hats off to that.
Thanks @tleyden @adamcfraser since this is high priority bug, can this is part of 1.4?
@ddash1 we're working on it! This is a complex use-case that's currently not well supported by sg-replicate.
@adamcfraser I ran into a complication with dealing with _bulk_docs
errors. There's an existing sg-replicate test which asserts retry behavior on errors:
https://github.com/couchbaselabs/sg-replicate/blob/master/synctube_test.go#L590-L595
So it looks like some errors will need to be retried, while other errors will need to be treated as permanent and ignored.
So it looks like some errors will need to be retried, while other errors will need to be treated as permanent and ignored.
The plan is to focus on the _bulk_get
request / response rather than making any changes to the way _bulk_docs
works
I found an interesting twist to the _bulk_get
behavior:
If the request does not include a "rev" field, eg:
{
"docs":[
{
"id":"3d694a032fad0994a74691273478df63"
}
}
and if the latest revision of that doc has been removed from the channel, then the response will be something like:
--c5fb685db7b62ffe23837077047ab5ba68d195b672074d046a871e345c37
Content-Type: application/json; error="true"
{"error":"forbidden","id":"3d694a032fad0994a74691273478df63","reason":"forbidden","status":403}
--c5fb685db7b62ffe23837077047ab5ba68d195b672074d046a871e345c37--
OTOH, if you add the revision to the _bulk_get
request, eg:
{
"docs":[
{
"id": "3d694a032fad0994a74691273478df63",
"rev": "2-4bbb88692d20c37593326c766f205709"
}
]
}
Then the response will be as follows:
--7cce5f1db5cb7d8f377f21dc905493275c658c4691b02ed87c3e83a379f0
Content-Type: application/json
{"_id":"3d694a032fad0994a74691273478df63","_removed":true,"_rev":"2-4bbb88692d20c37593326c766f205709"}
--7cce5f1db5cb7d8f377f21dc905493275c658c4691b02ed87c3e83a379f0--
It's not relevant to this ticket as far as I know, since the requests from the replication code should always contain a rev
field, but I thought I'd note it anyway.
@jamiltz you might want to add some notes to the REST API docs if you don't already have some on this.
@adamcfraser this issue we have observer during * channel based based replication, is it coped in 1.4?
@ArihantRk If you're seeing this with a channel filter defined (but that channel filter is *), then it might be covered by the existing fix.
If you're seeing this with no channel filter defined, that would be a different issue.
Thanks @adamcfraser , currently we have a channel filter defined (but that channel filter is *. Fix should be available for this..
syncgateway 1.3.1
couchbase 4.1.1
Operating system : redhat 7.2
SETUP Details
Cluster1 2 syncgateway(1 : assigns channel1 and 2: assigns channel2) 2 syncgateway replicator(1 : replicates channel1 and 2: assigns channel2) 3 couchbase nodes Cluster2 2 syncgateway(1 : assigns channel3 and 2: assigns channel4) 2 syncgateway replicator(1 : replicates channel3 and 2: assigns channel4) 3 couchbase nodes
syncgateway and sg-replicate configs below https://gist.github.com/ArihantRk/40ad192427f9558451c858ccd2497b4c
Each of syncgateway differs by assigning channels, in above config syncgateway assigns channel "channel1" and sg-replicate replicates "channel1"
sg-replicates does only replicates between clusters. source : local target : fqdn(resolves to one of cluster2 sg-replicates)
Log output
we are using channel based replication in syncgateway replicator for scalability purpose
from clients each of requests hits one of syncgateway in random manner. hence revisions of same documents are assigning to different channels.
we are updating same documents within seconds and assigning random channels in each request as request are hitting different syncgateways Request: curl -vX GET 'http://localhost:4985/ptxdata/_changes?filter=sync_gateway/bychannel&channels=channels1' | grep 's91996660656_ptx1483342633728_meta.doc'
{"seq":1558,"id":"s91996660656_ptx1483342633728_meta.doc","removed":["channels1"],"changes":[{"rev":"4-54e6cabaaa63cb75c760e7d2606c7009"}]}
when we query for document for same revision, could see that document does not exists in couchbase Request : curl http://localhost:4985/ptxdata/s91996660656_ptx1483342633728_meta.doc?rev=4-54e6cabaaa63cb75c760e7d2606c7009 {"error":"not_found","reason":"missing"}
bulk get getting response as {"error":"not_found","reason":"missing"} for the same revision and same is sending to bulk post bulk_docs request: {"new_edits":false,"docs":[{"error":"not_found","id":"s91996660656_ptx1483342633728_meta.doc","reason":"missing","rev":"4-54e6cabaaa63cb75c760e7d2606c7009","status":404}]}
Expected behavior
syncgateway replicator should ignore the older revisions if not found in couchbase and replicates latest revision
Actual behavior
syncgateway replicator uses style=all_docs and limit=50 in _changes feed and could see all revisions of same documents are seeing changes feed. but older revisions are not exist in couchbase hence bulk_docs request is failing. due to this replication got stuck , no subsequent documents are not replication.
Steps to reproduce