Hi @empeje
Thank you for your report. A few questions and comments:
Regarding the 3Gb files, are those sizes for single document bodies? Are those attachments? Individual or total? Usually storing large individual documents or attachments in that range is considered an anti-pattern for CouchDB. Is there any way to break those up into smaller documents?
Is it a single replication or are there others (many) running at the same time?
Noticed that the max http request size and max client body size are set to 1Gb and CouchDB's max document size is set to 4Gb. Perhaps adjust the request size in both Nginx and CouchDB to a larger value. For the max request size, you'd want it greater than the sum of all the document's revisions and the total size of its attachments.
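For instance, something like this on the CouchDB side (the 2Gb figure is only a placeholder, not a recommendation; pick whatever comfortably exceeds your largest docs, and set the matching Nginx directive client_max_body_size to the same value):
[httpd]
; assumption: 2Gb, above the largest doc revisions + attachments
max_http_request_size = 2147483648
[couchdb]
max_document_size = 2147483648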
Regarding Experiment 1: with the 2.1.2 release, you should apply some of those max size configs, as opposed to using the defaults. In that release the max request size was 64Mb, so the default values won't work, especially if you have individual doc revision + attachment sizes that exceed that.
Do you know if smaller sizes work, and if so, what limits work (1Gb, 100Mb, ...)?
Perhaps try replicating on a local network without SSL or Nginx as a debugging experiment...
Inspect the logs on the source and target. Do you see 413 HTTP errors or timeouts? In particular, see if you can spot the point when errors start happening. A 413 error, whether sent by CouchDB or Nginx, would indicate that one of the max size limits is being applied.
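If nothing stands out at the default log level, temporarily raising it can help surface those errors (this assumes the file writer you already have configured; remember to drop it back to info afterwards):
[log]
; verbose; revert to the default info level once done debugging
level = debug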
use_checkpoints = true is the default for the replicator, no need to set it explicitly.
There is also http://docs.couchdb.org/en/stable/config/replicator.html#replicator/connection_timeout setting, maybe adjusting that up a bit might help.
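For example (120000 ms is just an arbitrary bump; the documented default is 30000 ms):
[replicator]
; more headroom for slow uploads of large attachments
connection_timeout = 120000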
Thanks for your answer @nickva.
So many things to address here.
Regarding the 3Gb files, are those sizes for single document bodies? Are those attachments? Individual or total? Usually storing large individual documents or attachments in that range is considered an anti-pattern for CouchDB. Is there any way to break those up into smaller documents?
It is mostly attachments. We are distributing learning resource data via CouchDB. I know it is an anti-pattern, but we take this risk because it is one of the simplest ways to create a federated system with limited human resources in a social-benefit organization like ours. Our goal is to at least be able to replicate 1 GB of attachments.
Is it a single replication or are there others (many) running at the same time?
Our server may receive multiple replication requests at a time.
Noticed that the max http request size and max client body size are set to 1Gb and CouchDB's max document size is set to 4Gb. Perhaps adjust the request size in both Nginx and CouchDB to a larger value. For the max request size, you'd want it greater than the sum of all the document's revisions and the total size of its attachments.
Thank you for catching this, but we also have a problem when we replicate data under 1 GB. The 2.7 GB I mention here is basically a complete database consisting of several records with their attachments.
Regarding Experiment 1: with the 2.1.2 release, you should apply some of those max size configs, as opposed to using the defaults. In that release the max request size was 64Mb, so the default values won't work, especially if you have individual doc revision + attachment sizes that exceed that.
That's why we moved on to Experiment 2: we still had a problem when we increased the max request size.
Do you know if smaller sizes work, and if so, what limits work (1Gb, 100Mb, ...)?
We have definitely had success replicating a 121 MB record, with a 120 MB portion in the form of an attachment.
Perhaps try replicating on a local network without SSL or Nginx as a debugging experiment...
We are in the process of debugging this and will update with the results later.
Inspect the logs on the source and target. Do you see 413 HTTP errors or timeouts? In particular, see if you can spot the point when errors start happening. A 413 error, whether sent by CouchDB or Nginx, would indicate that one of the max size limits is being applied.
We haven't noticed the 413 error, but it is good to know about so I can debug further.
use_checkpoints = true is the default for the replicator, no need to set it explicitly.
Thanks, good to know. We're a little bit desperate about these results, so we have been trying various approaches hoping some of them work.
There is also http://docs.couchdb.org/en/stable/config/replicator.html#replicator/connection_timeout setting, maybe adjusting that up a bit might help.
Thanks
Checkpoints are usually not that expensive: just an update of a local document on the target and source. Try not to delay them too much; I would reduce that interval to something closer to the default.
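In other words, something in the neighborhood of:
[replicator]
; 30000 ms is the documented default checkpoint interval
checkpoint_interval = 30000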
I tried, but I wonder: does checkpointing also work with attachments?
Thanks for the support. We are currently experimenting with a plain no-SSL setup locally and with HAProxy, and we will also try some of your suggestions. I hope some of them work, and then I'll be able to share the results here.
Also the new release (2.2) which is being finalized has some replication improvements relating to attachment uploads.
http://docs.couchdb.org/en/2.2.0/whatsnew/2.2.html#version-2-2-0
You could wait until the release is out, or try building it during your testing, to see if the improvements help.
@empeje fyi we do have people replicating behind nginx, or with native SSL, and finding it works OK for them. I suspect all of your problems are specifically related to attachment usage.
Have you had a chance to try the 2.2.0 RCs at all?
@wohali @nickva we just did a little experiment a few days ago with 2.1.2, with this config:
[log]
writer = file
file = /opt/couchdb/var/log/couch.log
[chttpd]
bind_address = any
[httpd]
bind_address = any
enable_cors = true
max_http_request_size = 4294967296
[couchdb]
max_document_size = 4294967296
uuid = anuuid
[replicator]
socket_options = [{keepalive, true}, {nodelay, false}]
checkpoint_interval = 5000
use_checkpoints = true
[cors]
origins = *
credentials = true
methods = GET, PUT, POST, HEAD, DELETE
headers = accept, authorization, content-type, origin, referer, x-csrf-token
[couch_httpd_auth]
timeout = 1200
users_db_public = true
public_fields = name,firstName,middleName,lastName,roles,isUserAdmin,joinDate,email,phoneNumber,gender
secret = asecret
And we're able to replicate 1.5 GB of attachments. I suspect there is some problem with my SSL setup.
Anyway, I'll try the 2.2.0 RCs also. Thanks @wohali
Closing as it sounds like it's not a CouchDB issue specifically - if you come up with something we can help with explicitly, let us know and we can re-open it.
Thanks @wohali
Long story short, SSL s*cks, whether native SSL or an Nginx frontend.
Expected Behavior
2.1.2
Current Behavior
Possible Solution
Steps to Reproduce (for bugs)
Here is my setup
vi.yml
app.conf
Additional Data
Experiment 1
Experiment 2
Result
The left pattern is Experiment 1, and the right is Experiment 2.
[Figure: Disk usage]
[Figure: Network traffic]
Experiment 1 tries to sync or replicate the data, and after 8 minutes the connection resets; you can see the pattern in the disk usage and network traffic above. In the second experiment, we have a continuous connection between the CouchDB instances, but every 8 minutes some data gets deleted (see the disk usage).
Context
I'm in a project called planet where we build a federated learning management system, similar to what Mastodon did but for learning management. Our setup is a central server (earth) and nation servers deployed in Docker with docker-compose (our application is not that complex, and no k8s is necessary, at least for now), and we have Raspberry Pis deployed in the field in some places where the internet connection is still unstable (small places in Madagascar, Ghana, Nepal, etc.). On our central server, we use Let's Encrypt to add SSL encryption, and we use Nginx as the reverse proxy. The problem appears when we try to replicate a database from CouchDB on a Raspberry Pi, earth, or nation server to earth or nation. At first, we thought the Raspberry Pi was the problem, but it turned out the behavior is reproducible with CouchDB deployed on the server too. For quick notes, we are using version 2.1.2 (you can check it here). Previously, when using CouchDB 1.6, we never had this kind of problem, and it was using an RPi too.
Your Environment
2.1.2