Closed js2702 closed 1 month ago
Hey @js2702. Thanks a lot for sharing your findings!
We have a load-testing/perf-analysis project on our roadmap, haven't quite got there just yet. Dealing with large amounts of data is definitely something that can be improved by using bulk operations and more compact subprotocol for data transfer between the client and the server and between Electric and PG.
> Kinda on topic, would there be any difference between a user syncing 10K oplogs and 1K users syncing 10 oplogs? In terms of server performance.
In theory, there shouldn't be a difference. Electric fans in all incoming client writes into a single stream that is then fed into PG via logical replication.
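The fan-in described above can be sketched roughly like this. This is a hypothetical illustration, not Electric's actual internals: N client write streams converge on one queue, and a single consumer drains it, so total volume matters more than how clients split it.

```python
import queue
import threading

# Hypothetical sketch of the fan-in described above: many client write
# streams converge on a single queue, the way Electric feeds a single
# stream into PG via logical replication. Names and numbers are
# illustrative only.

writes: "queue.Queue[tuple[int, int]]" = queue.Queue()

def client(client_id: int, n_ops: int) -> None:
    """Simulate one client pushing n_ops oplog entries."""
    for op in range(n_ops):
        writes.put((client_id, op))

# 1K clients x 10 ops each: the consumer sees the same total volume
# as 1 client x 10K ops.
threads = [threading.Thread(target=client, args=(i, 10)) for i in range(1_000)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = 0
while not writes.empty():
    writes.get()
    total += 1

print(total)  # 10_000 entries regardless of how clients split the work
```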
> If you know any tool we could use to test a higher number of users it would be great to hear.
Could you share some details about the toolset you're using to run those tests?
Right now we are using a script that reuses part of our application to mass-import CSV files.
To check performance and network bandwidth we are using cAdvisor and Prometheus.
```yaml
version: "3.8"
name: docker_metrics
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    privileged: true
    devices:
      - "/dev/kmsg"
    ports:
      - 8080:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - 9090:9090
    command:
      - --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    depends_on:
      - cadvisor
```
And the prometheus.yml config
```yaml
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
```
We are measuring outgoing bytes from the Electric container and incoming bytes into the Postgres container, subtracting one from the other to get an approximate figure for what a hosting provider like GCP could charge for egress.
Prometheus queries:

```
increase(container_network_receive_bytes_total{name="postgres-1"}[30s])
increase(container_network_transmit_bytes_total{name="electric-1"}[30s])
```
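The subtraction above boils down to the following arithmetic. This is a hypothetical illustration with made-up sample values: two readings of each cumulative byte counter taken 30s apart, their per-window increase (what PromQL's `increase()` returns for a counter with no resets), and the difference as the egress estimate.

```python
# Hypothetical sketch of the egress estimate described above.
# Counter values are illustrative, not real measurements.

def increase(start: int, end: int) -> int:
    """Increase of a monotonic counter over a window (no reset handling)."""
    return end - start

# Cumulative counters sampled 30s apart (made-up numbers).
electric_tx = increase(1_200_000, 1_950_000)  # electric-1 transmit_bytes_total
postgres_rx = increase(400_000, 650_000)      # postgres-1 receive_bytes_total

# Bytes leaving Electric that did NOT go to Postgres ~ client egress.
approx_egress = electric_tx - postgres_rx
print(approx_egress)  # 500_000 bytes over the 30s window
```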
@js2702 Thank you for those details!
👋 We've spent the last month working on a rebuild of the Electric server over at a temporary repo: https://github.com/electric-sql/electric-next/
You can read more about why we made the decision at https://next.electric-sql.com/about
We're really excited about all the new possibilities the new server brings and we hope you'll check it out soon and give us your feedback.
We're now moving the temporary repo back here. As part of that migration we're closing all the old issues and PRs. We really appreciate you taking the time to investigate and report this issue!
We are doing some tests with large quantities of data (10,000–15,000 new rows) on a table with a foreign key relation (so compensation messages are sent). What we've encountered is that the Electric service sometimes complains about the Postgres connection being closed. We've been progressively increasing the number of rows, and at around 10K the sync may or may not fail. When it fails, it sometimes retries and then syncs correctly, but as we keep increasing the number of rows it starts failing consistently.
The tests are on Electric server and client 0.6.4, run both on a macOS machine (under Docker) and on a Linux server; the failure happens on both.
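For reproduction, a dataset like the one described (child rows referencing a parent via a foreign key, so syncing triggers compensation messages) could be generated along these lines. This is a hypothetical sketch: table and column names are made up, not taken from the issue.

```python
import csv
import io

# Hypothetical sketch: generate a CSV of child rows that reference
# parent rows via a foreign key. Column names are illustrative.

N_ROWS = 10_000
N_PARENTS = 100  # each child row references one of these parent ids

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "parent_id", "payload"])
for i in range(N_ROWS):
    writer.writerow([i, i % N_PARENTS, f"row-{i}"])

csv_text = buf.getvalue()
n_lines = csv_text.count("\n")
print(n_lines)  # 10_001: header + 10_000 data rows
```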
Electric logs
Postgres logs
Extra
Kinda on topic, would there be any difference between a user syncing 10K oplogs and 1K users syncing 10 oplogs? In terms of server performance. If you know any tool we could use to test a higher number of users it would be great to hear.