Closed: greyivy closed this issue 2 weeks ago
I cannot reproduce this. Can you determine which daemons (or from which containers) are generating this DNS traffic?
It seems to be any-sync-consensusnode
(or -- it stops when stopping the consensus node. Stopping any other container doesn't seem to make a difference).
Interestingly, restarting the container alone doesn't seem to fix the problem. I have to run make restart.
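For anyone else trying to attribute this kind of traffic: watching port 53 on the Docker bridge shows which container IP the queries are coming from. A sketch, assuming the default docker0 bridge and tcpdump installed on the host (a compose project may use its own br-… bridge instead):

```shell
# Show DNS queries leaving containers on the default bridge network
tcpdump -ni docker0 udp port 53

# Map a source IP from the capture back to a container name
docker ps -q | xargs docker inspect \
  --format '{{.Name}}: {{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}'
```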
Can you show me the logs of mongo-1 and any-sync-consensusnode at the moment of the problem?
docker-compose logs mongo-1
docker-compose logs any-sync-consensusnode
and also the output of this command:
docker compose exec any-sync-consensusnode getent hosts mongo-1
Sorry for the delay on this. It turns out that the MongoDB instance is actually shutting down and not restarting. I'm not sure what's sending it the signal to shut down though. Here's the log:
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:29:04.284+00:00\"},\"s\":\"I\", \"c\":\"STORAGE\", \"id\":22430, \"ctx\":\"WTCheckpointThread\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1717759744:284833][1:0x7f3ae4d23700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 19186, snapshot max: 19186 snapshot count: 0, oldest timestamp: (1717759729, 1) , meta checkpoint timestamp: (1717759734, 1) base write gen: 124169\"}}\n","stream":"stdout","time":"2024-06-07T11:29:04.285002854Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:29:56.863+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":22943, \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:42548\",\"connectionId\":331,\"connectionCount\":13}}\n","stream":"stdout","time":"2024-06-07T11:29:56.863331392Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:29:56.863+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":51800, \"ctx\":\"conn331\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:42548\",\"client\":\"conn331\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.29\"},\"os\":{\"type\":\"Linux\",\"name\":\"Ubuntu\",\"architecture\":\"x86_64\",\"version\":\"20.04\"}}}}\n","stream":"stdout","time":"2024-06-07T11:29:56.863883533Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:29:56.869+00:00\"},\"s\":\"I\", \"c\":\"REPL\", \"id\":21356, \"ctx\":\"conn331\",\"msg\":\"replSetInitiate admin command received from client\"}\n","stream":"stdout","time":"2024-06-07T11:29:56.869886194Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:29:56.873+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":22944, \"ctx\":\"conn331\",\"msg\":\"Connection ended\",\"attr\":{\"remote\":\"127.0.0.1:42548\",\"connectionId\":331,\"connectionCount\":12}}\n","stream":"stdout","time":"2024-06-07T11:29:56.873223696Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:04.327+00:00\"},\"s\":\"I\", \"c\":\"STORAGE\", \"id\":22430, \"ctx\":\"WTCheckpointThread\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1717759804:327809][1:0x7f3ae4d23700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 19201, snapshot max: 19201 snapshot count: 0, oldest timestamp: (1717759789, 1) , meta checkpoint timestamp: (1717759794, 1) base write gen: 124169\"}}\n","stream":"stdout","time":"2024-06-07T11:30:04.327977934Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.633+00:00\"},\"s\":\"I\", \"c\":\"CONTROL\", \"id\":23377, \"ctx\":\"SignalHandler\",\"msg\":\"Received signal\",\"attr\":{\"signal\":15,\"error\":\"Terminated\"}}\n","stream":"stdout","time":"2024-06-07T11:30:06.63750487Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.636+00:00\"},\"s\":\"I\", \"c\":\"CONTROL\", \"id\":23378, \"ctx\":\"SignalHandler\",\"msg\":\"Signal was sent by kill(2)\",\"attr\":{\"pid\":0,\"uid\":0}}\n","stream":"stdout","time":"2024-06-07T11:30:06.637554041Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.640+00:00\"},\"s\":\"I\", \"c\":\"CONTROL\", \"id\":23381, \"ctx\":\"SignalHandler\",\"msg\":\"will terminate after current cmd ends\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.643749122Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.643+00:00\"},\"s\":\"I\", \"c\":\"REPL\", \"id\":4784900, \"ctx\":\"SignalHandler\",\"msg\":\"Stepping down the ReplicationCoordinator for shutdown\",\"attr\":{\"waitTimeMillis\":10000}}\n","stream":"stdout","time":"2024-06-07T11:30:06.644203248Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.644+00:00\"},\"s\":\"I\", \"c\":\"COMMAND\", \"id\":4784901, \"ctx\":\"SignalHandler\",\"msg\":\"Shutting down the MirrorMaestro\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.645004389Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.644+00:00\"},\"s\":\"I\", \"c\":\"REPL\", \"id\":40441, \"ctx\":\"SignalHandler\",\"msg\":\"Stopping TopologyVersionObserver\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.645024635Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.670+00:00\"},\"s\":\"I\", \"c\":\"REPL\", \"id\":40447, \"ctx\":\"TopologyVersionObserver\",\"msg\":\"Stopped TopologyVersionObserver\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.671023172Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.670+00:00\"},\"s\":\"I\", \"c\":\"SHARDING\", \"id\":4784902, \"ctx\":\"SignalHandler\",\"msg\":\"Shutting down the WaitForMajorityService\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.671063454Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.684+00:00\"},\"s\":\"I\", \"c\":\"CONTROL\", \"id\":4784903, \"ctx\":\"SignalHandler\",\"msg\":\"Shutting down the LogicalSessionCache\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.685776668Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.691+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":20562, \"ctx\":\"SignalHandler\",\"msg\":\"Shutdown: going to close listening sockets\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.691264825Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.692+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":23017, \"ctx\":\"listener\",\"msg\":\"removing socket file\",\"attr\":{\"path\":\"/tmp/mongodb-27001.sock\"}}\n","stream":"stdout","time":"2024-06-07T11:30:06.692398477Z"}
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.695+00:00\"},\"s\":\"I\", \"c\":\"NETWORK\", \"id\":4784905, \"ctx\":\"SignalHandler\",\"msg\":\"Shutting down the global connection pool\"}\n","stream":"stdout","time":"2024-06-07T11:30:06.695686185Z"}
...shutdown messages continue
The relevant line is:
{"log":"{\"t\":{\"$date\":\"2024-06-07T11:30:06.633+00:00\"},\"s\":\"I\", \"c\":\"CONTROL\", \"id\":23377, \"ctx\":\"SignalHandler\",\"msg\":\"Received signal\",\"attr\":{\"signal\":15,\"error\":\"Terminated\"}}\n","stream":"stdout","time":"2024-06-07T11:30:06.63750487Z"}
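For scanning longer logs, the structured-log id makes entries like this easy to grep for. A quick sketch, with sample lines standing in for the real input (which would be the output of docker-compose logs mongo-1):

```shell
# Pull the shutdown-signal entries out of a mongod log stream.
# The two sample lines below stand in for real log output.
sample='{"s":"I","c":"CONTROL","id":23377,"msg":"Received signal","attr":{"signal":15}}
{"s":"I","c":"NETWORK","id":22943,"msg":"Connection accepted"}'

# Structured-log id 23377 is mongod's "Received signal" entry
matches=$(printf '%s\n' "$sample" | grep '"id":23377')
echo "$matches"
```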
This is a fresh Docker install and AnyType is the only thing running. None of the other containers are killed.
Could you please collect some debug information?
cat /proc/cpuinfo
free -m
dmesg -T | grep -i oom
docker inspect <mongo_container_id>
To find the ID of the stopped container, you can use the following command:
docker ps -a | grep mongo-1
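As a side note for later readers: Docker records whether the kernel OOM-killed a container, so the OOM theory can be confirmed directly. A sketch, using the container ID found above:

```shell
# "true" here means the kernel's OOM killer terminated the container
docker inspect --format '{{.State.OOMKilled}} (exit code {{.State.ExitCode}})' <mongo_container_id>

# Kernel-side evidence of the kill
dmesg -T | grep -iE 'oom|killed process'
```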
It was indeed an OOM, and totally unrelated. Thank you for looking into it.
@greyivy I would be very grateful if you could share the parameters of your system. At the moment, we are investigating the issue of "minimum requirements" for cluster operation, and this data will be very helpful to us.
Sure thing!
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N5095 @ 2.00GHz
stepping : 0
microcode : 0x24000024
cpu MHz : 1996.800
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand hypervisor lahf_lm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms rdseed smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves arat umip waitpkg gfni rdpid movdiri movdir64b md_clear arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid shadow_vmcs pml tsc_scaling
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs srbds mmio_stale_data rfds bhi
bogomips : 3993.60
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
              total        used        free      shared  buff/cache   available
Mem:           1962         515         108          17        1539        1447
Swap:             0           0           0
(Note that I tried with 3GB and 4GB as well with no change, but I was running other memory-intensive processes on the machine at the time. I'll give it another shot in the future when I can dedicate more time to troubleshooting.)
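In case it helps others on small hosts: mongod defaults its WiredTiger cache to roughly 50% of (RAM minus 1 GB), with a 256 MB floor, which leaves little headroom on a 2 GB machine. A hedged sketch of a docker-compose.override.yml cap (the service name, flag placement, and values are assumptions; check the repo's actual mongod command line before overriding it):

```yaml
services:
  mongo-1:
    # Note: "command:" replaces the image's full command line, so any
    # existing flags (replica set name, port) must be repeated here.
    command: mongod --wiredTigerCacheSizeGB 0.25
    mem_limit: 768m
```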
Have you read a contributing guide?
Current Behavior
After the app runs for a while, it starts making millions of DNS requests to my main DNS server and the logs become filled with the following. My AnyType client still appears to be fully functional.
Expected Behavior
The above error does not occur -- or if it does -- it does not generate millions of DNS requests.
Steps To Reproduce
docker-compose.override.yml:
Note: I changed the interval in hopes that it would help solve this issue, but it did not.
Environment
Anything else?
No response