Okay, so far, I see the following:
docker logs -f nostream-cache
1:C 29 Dec 2023 10:20:16.148 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 29 Dec 2023 10:20:16.148 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 29 Dec 2023 10:20:16.148 # Configuration loaded
1:M 29 Dec 2023 10:20:16.156 # Server initialized
1:C 29 Dec 2023 10:48:02.237 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 29 Dec 2023 10:48:02.237 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 29 Dec 2023 10:48:02.237 # Configuration loaded
1:M 29 Dec 2023 10:48:02.244 # Server initialized
1:C 29 Dec 2023 12:15:14.797 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 29 Dec 2023 12:15:14.797 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 29 Dec 2023 12:15:14.797 # Configuration loaded
1:M 29 Dec 2023 12:15:14.800 # Server initialized
Redis seems to restart every now and then because of an out-of-memory error:
Dec 29 10:32:36 at kernel: [62794.438636] out_of_memory+0x106/0x2e0
Dec 29 10:32:36 at kernel: [62794.438930] Tasks state (memory values in pages):
Dec 29 10:32:36 at kernel: [62794.439403] Out of memory: Killed process 1142634 (redis-server) total-vm:27478268kB, anon-rss:11599084kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:47208kB oom_score_adj:0
Dec 29 10:32:40 at start[1117079]: nostream | uncaught error: [ErrorReply: LOADING Redis is loading the dataset in memory]
Dec 29 10:32:41 at start[1117079]: nostream | web-socket-adapter: unable to handle message: [ErrorReply: LOADING Redis is loading the dataset in memory]
Dec 29 10:32:42 at start[1117079]: nostream | web-socket-adauncaught error: [ErrorReply: LOADING Redis is loading the dataset in memory]
Dec 29 10:32:42 at start[1117079]: nostream | at TCP.uncaught error: [ErrorReply: LOADING Redis is loading the dataset in memory]
Dec 29 10:32:42 at start[1117079]: nostream | at TCP.onStreamRead (node:internal/stream_base_commons:web-socket-adapter: unable to handle message: [ErrorReply: LOADING Redis is loading the dataset in memory]
Dec 29 10:32:42 at start[1117079]: nostream | web-socket-adapter: unable to handle message: [ErrorReply: LOADING Redis is loading the dataset in memory]
Memory is fine most of the time; this is a typical value:
               total        used        free      shared  buff/cache   available
Mem:           15989        1283       13958         154         747       14264
Swap:          16383         797       15586
But sometimes it looks like this:
               total        used        free      shared  buff/cache   available
Mem:           15989       15499         159         174         330          68
Swap:          16383        2054       14329
This is the output of top, sorted by memory, shortly after the output above:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1263844 lxd       20   0   18.7g  12.6g   1064 R  95.7  80.5   5:08.09 redis-server
1252985 xxx       20   0 1466820   1.1g   3924 S  13.0   6.8  13:21.33 node
1279048 lxd       20   0 4416636 138436 126904 S   0.0   0.8   0:01.27 postgres
So it seems Redis is sometimes allocating so much memory that the process gets killed. How can I limit the memory use? Redis has a maxmemory setting, but how do I set it in the nostream Docker environment? And would this actually help?
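From what I can tell, maxmemory can be passed on the container command line. A minimal docker-compose.yml sketch (the service and image names here are my assumptions; nostream's actual compose file may differ):

services:
  cache:
    image: redis:7
    # cap Redis at 2 GB and evict least-recently-used keys
    # instead of growing until the OOM killer strikes
    command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru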
@pgerstbach We had a conversation off GitHub where we addressed the OOM issue with Redis as well as a connection issue due to NGINX. Do you mind updating here with what the solution was?
Certainly! I just waited long enough to be sure the issue was really gone.
So, here is a write-up of how my issue was resolved. This would not have been possible without @Cameri, so thanks a lot for your help!
In my syslog I found multiple out-of-memory errors like this:
out_of_memory+0x106/0x2e0
Tasks state (memory values in pages):
Out of memory: Killed process 640637 (redis-server) total-vm:8235260kB, anon-rss:7209804kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:14252kB oom_score_adj:0
So Redis reserved more memory than was available; even an additional 16 GB swap file was not enough in some cases.
The solution to this memory problem is to reduce the rate limiting in settings.yaml. The Redis cache is currently used only to implement those rate limits. Limits on invoices, connections, events, and messages are possible (and enabled by default).
The longer the period used, the larger the memory consumption. I still had some rate limits with periods of days in my config; after removing them, memory consumption dropped back to a normal level. If needed, rate limits with periods of hour(s) can be removed as well, as in the sketch below.
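For illustration, such a rate-limit entry in settings.yaml looks roughly like this (the exact schema may differ between nostream versions, so treat it as a sketch):

limits:
  message:
    rateLimits:
      # period is in milliseconds; a day-long period (86400000) forces Redis to
      # keep far more counter state than a one-minute period (60000)
      - period: 60000
        rate: 300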
After removing the rate limits with the longest periods, my out-of-memory errors were gone. But I still had outages, so I found one more configuration issue:
I realized that my NGINX instance was not configured properly: it was running into file-descriptor and connection limits. In my nginx.conf I added the following lines:
worker_rlimit_nofile 8192;

events {
    worker_connections 4096;
}
From the nginx docs: https://nginx.org/en/docs/ngx_core_module.html
Of course, the values need to be aligned with your server's resources and requirements; YMMV.
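To verify that the new file-descriptor limit is actually in effect for the running master process, something like this works on most Linux systems (pgrep -o picks the oldest, i.e. the master, nginx process):

cat /proc/$(pgrep -o nginx)/limits | grep 'open files'

As a rule of thumb, worker_rlimit_nofile should be at least twice worker_connections, since a proxied connection can use two descriptors (one to the client, one to the upstream).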
So my issue was not really related to nostream, but maybe you have similar issues, and hopefully my solution will also help others! 🤙
Describe the bug Every now and then my relay gets into a state where every request to the payment website runs into a permanent timeout. After some time (minutes, hours) it becomes available again, until it produces timeouts again.
During those timeout-times the relay itself is still available via websockets. Events can be stored and read. Only the payment website is affected.
To Reproduce I can't reproduce it with any specific steps, but it happens daily. After a restart it works again, until the timeout occurs again.
Expected behavior Payment webpage stays available.
System
Logs There is no specific log, but the following lines appear very often, though not always close in time to the failure:
(the last 5 lines are repeated hundreds of times within 1 second)
Additional context Since I have monitoring in place, I can tell at what time (to within 5 minutes) the site becomes unavailable. Is there any specific debugging setting I can switch on?
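One thing I might try, assuming nostream uses the common Node debug package for its logging (an assumption I have not verified against the source): verbose output could then be switched on via the DEBUG environment variable in docker-compose.yml, e.g.:

environment:
  DEBUG: "*"    # assumption: only has an effect if nostream logs through the debug package

That would at least produce timestamps around the moment the payment page stops responding.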