couchbase / sync_gateway

Manages access and synchronization between Couchbase Lite and Couchbase Server
https://www.couchbase.com/products/sync-gateway

Problems with replicating between buckets (sg-replicate) #2884

Closed nbrys closed 7 years ago

nbrys commented 7 years ago

Sync Gateway version

We are currently running a 1.5 master build (http://latestbuilds.hq.couchbase.com/couchbase-sync-gateway/1.5.0/1.5.0-553/) in order to work around #2587 and #2607.

Operating system

Ubuntu 14.04

Config file

Regular config file with 8 databases and the following memory tweaks:

    "feed_type": "TAP",
    "rev_cache_size": 10,
    "cache": {
        "channel_cache_max_length": 10,
        "channel_cache_min_length": 10
    }

Problem

The sync starts fine but quits after a couple of minutes without a clear error message.

Command: curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ "source": "bucketa", "target": "http://localhost:4985/bucketb" }' 'http://localhost:4985/_replicate'

We get the following message on the console: {"error":"Internal Server Error","reason":"Internal error: Replication Aborted"}

The sync gateway log: https://pastebin.com/AwnLMZVJ

adamcfraser commented 7 years ago

Thanks for the report. Unfortunately, it looks like the relevant log messages were stripped from the log by rsyslog, so the log doesn't include the diagnostic information that would usually be associated with the 'Replication Aborted' message. Are you able to reproduce the issue to get full logs?

nbrys commented 7 years ago

Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stdout: 2017-09-07T14:52:20.644Z #011BulkDocs: Doc "com.id.dependency:charts_1" --> 400 Missing data of attachment "charts.js" (400 Missing data of attachment "charts.js")
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stderr: 2017-09-07T14:52:20.645Z #011BulkDocs: Doc "com.id.dependency:d3_1" --> 400 Missing data of attachment "d3.js" (400 Missing data of attachment "d3.js")
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stdout: 2017-09-07T14:52:20.649Z #011BulkDocs: Doc "com.id.dependency:d3_2" --> 400 Missing data of attachment "d3.js" (400 Missing data of attachment "d3.js")
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stdout: 2017-09-07T14:52:20.650Z #011BulkDocs: Doc "com.id.dependency:pen-signature_1" --> 400 Missing data of attachment "pen-signature.js" (400 Missing data of attachment "pen-signature.js")
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stdout: 2017-09-07T14:52:20.650Z HTTP:  #1578472: PUT /mensura/com.id.dependency:id_2?new_edits=false  (ADMIN)
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stdout: 2017-09-07T14:52:20.652Z HTTP: #1578472:     --> 400 Too few MIME parts: expected 4 attachments, got 3  (2.1 ms)
Sep  7 14:52:20 mobile-sta-syncgateway-i-0c96f9e1369ba9653 sync-gateway-stderr: 2017-09-07T14:52:20.653Z HTTP: #1567824:     --> 500 Internal error: Replication Aborted  (120058.2 ms)

tleyden commented 7 years ago

@nbrys do you have the Replicate log key enabled in your sync gateway config? Or * logging would also cover it.

If not, can you enable it and re-run it and paste the logs snippet?

tleyden commented 7 years ago

Oops, n/m, looks like Replicate logging is already enabled.

Assuming you are on OSX or Linux, is it possible to get a network trace using ngrep -W byline port 4985? If it contains any sensitive data, you can email it to me (my email is on my github profile) rather than putting it in a pastebin.

tcpdump or Wireshark would also work.

tleyden commented 7 years ago

@nbrys another thing that might help a lot is the full logs (or at least a bigger log snippet).

With that, the problem might be apparent even without the network capture.

Another thing that might help with debugging is to use two different Sync Gateway instances with separate logs for each, so that the messages from the source and the target are clearly differentiated.

tleyden commented 7 years ago

If it's being aborted due to PUSH_ATTACHMENT_DOCS_FAILED, there should be some additional Replicate logging, from these lines of code:

https://github.com/couchbaselabs/sg-replicate/blob/master/synctube.go#L511
https://github.com/couchbaselabs/sg-replicate/blob/master/synctube.go#L497

tleyden commented 7 years ago

@nbrys got your email, thanks! Which logs are enabled in your sync gateway config? I didn't see any Replicate logs, and would expect them according to https://github.com/couchbase/sync_gateway/issues/2884#issuecomment-327951443

nbrys commented 7 years ago

Hi, I'm afraid I only enabled logs on one server. I will start the replication again with logs enabled on all servers.

tleyden commented 7 years ago

@nbrys ok thanks! Any chance you can hop into https://gitter.im/couchbase/discuss?

tleyden commented 7 years ago

Once https://github.com/couchbase/sync_gateway/issues/2875 is merged, the logs should contain more details.

tleyden commented 7 years ago

@nbrys even with Replicate logging enabled, until https://github.com/couchbase/sync_gateway/issues/2896 is addressed, it might be fairly difficult to diagnose this.

It would be helpful if you could capture network traffic between the two sync gateways using one of the tools mentioned earlier in this thread (ngrep, tcpdump, or Wireshark).

ajres commented 7 years ago

The "400 Missing data of attachment" errors occur when PUTing a new document revision that contains an _attachments dict.

If an attachment does not contain inline "data", it is expected to have a "stub" property, which indicates that the attachment data was stored in a previous parent revision. If the "stub" property is missing, the "400 Missing data of attachment" error is thrown to indicate invalid attachment metadata.
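The rule described above can be sketched in a few lines of Python. This is only an illustration of the check as described in this comment, not Sync Gateway's actual Go code: the function name `find_attachments_missing_data` and the sample document are hypothetical, and the sketch assumes that any multipart "follows" entries have already been resolved into inline "data" before the check runs.

```python
from typing import Any, Dict, List


def find_attachments_missing_data(doc: Dict[str, Any]) -> List[str]:
    """Return names of attachments that have neither inline data nor a stub marker."""
    missing = []
    for name, meta in doc.get("_attachments", {}).items():
        has_inline_data = "data" in meta
        is_stub = meta.get("stub") is True
        if not has_inline_data and not is_stub:
            missing.append(name)
    return missing


# Hypothetical document: one inline attachment, one stub, one invalid entry.
doc = {
    "_id": "example_doc",
    "_attachments": {
        "inline.js": {"content_type": "application/javascript", "data": "Y29uc29sZS5sb2coMSk="},
        "carried_over.js": {"content_type": "application/javascript", "stub": True, "revpos": 1},
        "broken.js": {"content_type": "application/javascript", "revpos": 2},
    },
}

for name in find_attachments_missing_data(doc):
    print(f'400 Missing data of attachment "{name}"')
```

Running a check like this against the documents named in the log (for example com.id.dependency:charts_1) on the source bucket might show whether their attachment metadata is already incomplete before replication starts.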

ajres commented 7 years ago

The "400 Too few MIME parts: expected M attachments, got N" error occurs when reading a multipart document in attachment.go ReadMultipartDocument(). For each entry in a document's _attachments dict that has a "follows" parameter, it is assumed there will be a separate part containing the contents of the attachment. If the number of parts is less than the number of attachments expected to follow, the error is thrown.

Reviewing attachment.go WriteMultipartDocument(), there does not appear to be a code path that would result in a mismatch between the number of attachments with the "follows" property and the number of MIME parts.
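As a rough illustration of the part-count check described above, the following Python sketch counts the "follows" entries and compares them with the number of attachment parts received. It is a simplified model based on this comment, not the Go code in attachment.go; the function name `check_mime_part_count` and the sample document are hypothetical.

```python
from typing import Any, Dict


def check_mime_part_count(doc: Dict[str, Any], attachment_parts: int) -> None:
    """Raise ValueError when fewer attachment parts arrived than "follows" entries expect."""
    expected = sum(
        1
        for meta in doc.get("_attachments", {}).values()
        if meta.get("follows") is True
    )
    if attachment_parts < expected:
        raise ValueError(
            f"400 Too few MIME parts: expected {expected} attachments, got {attachment_parts}"
        )


# Hypothetical document declaring four attachments that should follow as MIME parts.
doc = {
    "_attachments": {
        f"file_{i}.js": {"follows": True, "length": 10} for i in range(4)
    }
}

try:
    # Only three attachment parts were received, as in the log line above.
    check_mime_part_count(doc, attachment_parts=3)
except ValueError as err:
    print(err)
```

With four "follows" entries and three parts, the sketch produces the same message shape as the logged error for com.id.dependency:id_2, which suggests comparing the _attachments metadata of that document with the parts actually sent on the wire.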