brimdata / zync

Kafka connector to sync Zed lakes to and from Kafka topics
BSD 3-Clause "New" or "Revised" License

Increasing memory usage leading to killed zync process in container #95

Closed (philrz closed this issue 1 year ago)

philrz commented 2 years ago

A community user reported the following with zync:

We have a problem running Zync continuously inside a Docker container. If we restrict the container's memory to 1 or 2 GB of RAM, the zync process is "Killed" every 4 minutes or less. Because we have a script that restarts zync, processing continues. But why is the process killed?

2022-09-21 13:14:22 :    commit 2F5STiJN9vOnzSgsUwq3LQPWc4l 35 records
2022-09-21 13:14:22 :    commit 2F5STu2wYhhQ2tiJHpSmw5HvUfe 39 records
2022-09-21 13:14:22 :    commit 2F5SThbItGB7tIlxw1sEqRkIYNH 80 records
2022-09-21 13:14:22 :    commit 2F5STiaKerza01d21Gj6lVxr42x 65 records
2022-09-21 13:14:23 :    commit 2F5STudsJBZStCj7EoNAVxIZvrE 31 records
2022-09-21 13:14:23 :    commit 2F5STpriucaFU3I0tAhNLB2rkuO 84 records
2022-09-21 13:14:26 :    commit 2F5SUJ8Tf6ayp24EKPe3hPuD0NZ 16 records
Killed
...
2022-09-21 13:17:58 :    commit 2F5Sv2CqRZvHW6uYhearNvVfokL 143 records
2022-09-21 13:17:58 :    commit 2F5Sv5uR1n1tUM3Y2Z5csaW7kbv 35 records
2022-09-21 13:17:58 :    commit 2F5Sv23VRXhjIeQIl7zGbnKe1fL 19 records
2022-09-21 13:17:58 :    commit 2F5Sv4pJ6Xu4SwOMDIWtoiDFmsP 16 records
Killed
...
2022-09-21 13:21:55 :    commit 2F5TOs5kpSLp7gmyyiIJqA21LYB 376 records
2022-09-21 13:21:56 :    commit 2F5TOuGhSrKB63Y7tJHQ2Az6ukm 269 records
2022-09-21 13:21:56 :    commit 2F5TOzXIrM2dVHt7MWvAZyQJKfm 814 records
Killed
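As a rough check on the "every 4 minutes or less" figure, the interval between kills can be estimated from the log itself. The sketch below is illustrative only: it assumes the log format shown above (timestamp, " : ", then the commit line) and treats the last commit timestamp before each "Killed" line as the approximate kill time. The function names are hypothetical and not part of zync.

```python
from datetime import datetime

# Excerpt of the log quoted above; "..." marks lines elided in the report.
LOG = """\
2022-09-21 13:14:23 :    commit 2F5STpriucaFU3I0tAhNLB2rkuO 84 records
2022-09-21 13:14:26 :    commit 2F5SUJ8Tf6ayp24EKPe3hPuD0NZ 16 records
Killed
...
2022-09-21 13:17:58 :    commit 2F5Sv4pJ6Xu4SwOMDIWtoiDFmsP 16 records
Killed
...
2022-09-21 13:21:56 :    commit 2F5TOzXIrM2dVHt7MWvAZyQJKfm 814 records
Killed
"""


def kill_times(log_text):
    """Return the timestamp of the last commit line seen before each "Killed"."""
    times = []
    last_seen = None
    for line in log_text.splitlines():
        line = line.strip()
        if line == "Killed":
            if last_seen is not None:
                times.append(last_seen)
        elif " : " in line:
            stamp = line.split(" : ", 1)[0].strip()
            try:
                last_seen = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            except ValueError:
                pass  # not a timestamped line; ignore it
    return times


def kill_intervals_seconds(log_text):
    """Seconds elapsed between consecutive approximate kill times."""
    times = kill_times(log_text)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for seconds in kill_intervals_seconds(LOG):
        print(f"{seconds:.0f} s between kills")
```

On the excerpt above this yields intervals of 212 and 238 seconds, which is consistent with the reported "every 4 minutes or less".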

The command line being run is:

zync from-kafka -exitafter 60m /tmp/list20.yaml

...where there are 881 topics in /tmp/list20.yaml.

Then in a follow-on set of tests varying the -thresh parameter, they reported the following:

We redid the tests in another environment with Zync and the "thresh" parameter.

Server total memory: 8 GB
Zync container limit: 6 GB of RAM (Docker environment)

Test 1: 900 topics, thresh=5000 -> zync is killed after 30 mins
Test 2: 900 topics, thresh=2000 -> zync is killed after 5 mins
Test 3: 900 topics, thresh=100 -> zync is killed after 1 min

In each test, the memory used by Zync was at 6 GB (top max) for some time, just before being killed. We also tested with the variable GOGC=50, with no change. I think there is a "memory leak" in Zync's code, or it doesn't see that the container limit is 6 GB and thinks it's 8 GB. So, by exceeding 6 GB, the process gets killed.
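One way to probe the "it thinks it's 8 GB" hypothesis is to compare, from inside the container, the cgroup memory limit with the total memory the kernel reports. The sketch below is a diagnostic aid only, not part of zync. It assumes a Linux container with the standard cgroup v2 or v1 mount points, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup memory-limit files inside a Linux container (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Parse a cgroup memory limit in bytes; return None when no limit is set."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    if limit >= 1 << 60:  # cgroup v1 reports a huge sentinel when unlimited
        return None
    return limit


def parse_meminfo_total(text: str) -> Optional[int]:
    """Extract MemTotal from /proc/meminfo content and convert kB to bytes."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    return None


def container_memory_limit() -> Optional[int]:
    """Return the first readable cgroup memory limit, or None if unavailable."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None


if __name__ == "__main__":
    try:
        total = parse_meminfo_total(Path("/proc/meminfo").read_text())
    except OSError:
        total = None
    print("cgroup limit (bytes):", container_memory_limit())
    print("MemTotal (bytes):    ", total)
```

If the two numbers differ (for example a 6 GB cgroup limit against an 8 GB MemTotal), any process that sizes itself from total system memory will not account for the container limit. Go 1.19 and later also accept a GOMEMLIMIT soft memory limit; whether the zync build used in these tests supports it is not stated in this thread.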

nwt commented 1 year ago

User reports that a combination of -thresh 5000 and -topicmaxbytes 131072 keeps memory usage below 2 GB for over 800 topics.
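For a sense of scale, the arithmetic below multiplies the reported -topicmaxbytes value by the topic count from the original report. It assumes, based only on the flag's name, that -topicmaxbytes bounds the bytes buffered per topic at a time; this thread does not confirm that, so treat the result as a back-of-the-envelope figure rather than a description of zync's actual memory behavior.

```python
TOPICS = 881              # topic count in /tmp/list20.yaml, from the original report
TOPIC_MAX_BYTES = 131072  # value reported for -topicmaxbytes (128 KiB)


def worst_case_fetch_bytes(topics: int, per_topic_bytes: int) -> int:
    """Upper bound on buffered bytes if every topic fills its cap at once.

    Assumption: the cap applies independently to each topic.
    """
    return topics * per_topic_bytes


if __name__ == "__main__":
    total = worst_case_fetch_bytes(TOPICS, TOPIC_MAX_BYTES)
    print(f"{total} bytes = {total / 2**20:.1f} MiB")
```

Under that assumption the per-topic buffers total about 110 MiB across 881 topics, well under the 2 GB the user observed, which leaves room for decoding and commit overhead.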