Closed DmDiamond closed 8 years ago
ulimit -n 4096
Done. Monitoring.
i2pd seems to be in order.
To make the change permanent, I added these lines to /etc/security/limits.conf:
* hard nofile 4096
* soft nofile 4096
i2pd went down again :) Increased nofile to 65536, monitoring.
UPD: The web console went down after 30 minutes, so I restarted i2pd. It seems the limits are not the cause.
I'm working on reducing the number of sockets.
@DmDiamond, check your limits for open connections. I fixed that by adding
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 65536
to /etc/sysctl.conf
Before the change:
> sudo sysctl -a | grep somaxconn
net.core.somaxconn = 128
> sudo sysctl -a | grep orphans
net.ipv4.tcp_max_orphans = 4096
> ss -s
Total: 2974 (kernel 0)
TCP: 2966 (estab 1116, closed 1753, orphaned 39, synrecv 0, timewait 72/0), ports 0
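A quick way to confirm which values the kernel is actually using is to read them from /proc/sys. The sketch below assumes a Linux system; `sysctl_value` is a hypothetical helper name, not part of any tool. Values added to /etc/sysctl.conf take effect only after they are loaded, for example with `sysctl -p` as root or at the next boot.

```shell
# Read a sysctl value straight from /proc/sys (reading needs no root).
# sysctl_value is a hypothetical helper name used only for this sketch.
sysctl_value() {
    # net.core.somaxconn -> /proc/sys/net/core/somaxconn
    cat "/proc/sys/$(printf '%s' "$1" | tr . /)"
}

# Print the current values of the two keys discussed above, if present.
for key in net.core.somaxconn net.ipv4.tcp_max_orphans; do
    if value=$(sysctl_value "$key" 2>/dev/null); then
        printf '%s = %s\n' "$key" "$value"
    fi
done
```

Comparing this output before and after the change shows whether the new settings were really applied.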
@r4sas done, testing.
i2pd crashed again. It had worked for 5 days. The server then had to be rebooted, and that run lasted only a few hours, despite the adjustments.
Do you have a stack trace from core file?
No, I will work on it. But I forgot an important detail earlier: i2pd crashed immediately after the server restart (see the previous message). The lines in kern.log (i2pd started -> killed):
Jul 23 13:32:51 server kernel: [49519.993763] [ 3888] 1000 3888 562114 208136 1078 330226 0 i2pd
Jul 23 13:32:51 server kernel: [49519.993821] Out of memory: Kill process 3888 (i2pd) score 866 or sacrifice child
Jul 23 13:32:51 server kernel: [49519.993988] Killed process 3888 (i2pd) total-vm:2248456kB, anon-rss:832544kB, file-rss:0kB
Two failures in apport.log (during normal operation):
apport.log.1:ERROR: apport (pid 3058) Sat Jul 23 00:09:21 2016: executable: /home/i2pd/build/i2pd ...
apport.log:ERROR: apport (pid 8649) Sun Jul 24 22:01:17 2016: executable: /home/i2pd/ ...
UPD: Rebooted again; message in kern.log after i2pd started:
kern.log:Jul 24 23:26:25 server kernel: [ 510.489010] i2pd[2436]: segfault at 9 ip b7289675 sp bfdda090 error 4 in libc-2.19.so[b7216000+1a8000]
So, it crashed at shutdown?
No. All crashes happened either during normal operation (the two apport.log entries) or shortly after system boot (i2pd started -> crashed).
A new crash. I have the core dump and the log file, but they are 1.3 GB each. The last lines of i2pd.log:
/.i2pd$ tail -n 10 i2pd.log
11:34:02@756/warn - NetDb: Requested xQcsc~Y-Ej9aYXwWF5X1LTdf5Uhhn8P5UO~EMNeF3jE= not found, 0 peers excluded
11:34:02@756/warn - NetDb: Requested zRRQBRiWOBAmOsP8fD8I-qq9EnvQ0Wz~SilRdiPhogs= not found, 0 peers excluded
11:34:02@305/info - Transports: RouterInfo for xpAAeuXa~pObhPENTfbuwAQF3vuMQqPrmSVSbR7zoUs= not found, requested
11:34:02@838/error - NTCP: Phase 4 read error: Connection reset by peer. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@756/warn - NetDb: Requested f9PTJImeqzjnE-hm-vfzOJuQ2S8Q5D0qCVYb6sR7lc4= not found, 0 peers excluded
11:34:03@838/info - NTCP: Phase 2 read error: End of file. Wrong ident assumed
The clock is set correctly.
I only need the stack trace from the core file.
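Since the core file itself is too large to share, the backtrace can be extracted locally as plain text. This is a minimal sketch assuming gdb is installed and the binary is the same build that produced the core; `core_backtrace` is a hypothetical helper name, and both paths are supplied by the caller.

```shell
# Print backtraces of all threads from a core file, non-interactively.
# core_backtrace is a hypothetical helper; both paths come from the caller.
core_backtrace() {
    binary=$1
    core=$2
    if [ ! -x "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace /path/to/i2pd /path/to/core" >&2
        return 2
    fi
    # -batch makes gdb exit after running its commands; -ex runs one command.
    gdb -batch -ex 'thread apply all bt' "$binary" "$core"
}
```

Redirecting the output to a file, for example `core_backtrace ./i2pd core > bt.txt`, gives something small enough to attach to the issue. The trace is far more readable when the binary was built with debug symbols.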
I have this problem too, but the daemon does not really crash: it just does nothing, while service i2pd status still says active (running).
I'm a little busy with other software projects. I'll get the stack trace as soon as possible; I need to read the debugger's man pages first.
@username-not-taken what version do you use?
@username-not-taken I have the same problem: i2pd runs out of file descriptors once every few days. To count them, run: ls /proc/<i2pd PID>/fd | wc -l. Decreasing transittunnels seems to help a little.
I faced the same problem on Fedora 20, running on oldish hardware with the --bandwidth=X setting. Compiled from latest trunk.
Actually, not only does the web interface stop responding, but httpproxy also fails to serve requests.
There is no doubt that file descriptors are running out; here is the corresponding strace -ff output:
...
close(489) = 0
epoll_wait(27, {}, 128, 0) = 0
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
accept(29, 0, NULL) = 272
epoll_ctl(27, EPOLL_CTL_ADD, 272, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=2954677240, u64=2954677240}}) = 0
ioctl(272, FIONBIO, [1]) = 0
recvmsg(272, {msg_name(0)=NULL, msg_iov(1)=[{"GET / HTTP/1.1\r\nUser-Agent: Mozi"..., 8192}], msg_controllen=0, msg_flags=0}, 0) = 430
epoll_ctl(27, EPOLL_CTL_MOD, 29, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=165017800, u64=165017800}}) = 0
epoll_wait(27, {}, 128, 0) = 0
clock_gettime(CLOCK_REALTIME, {1472635301, 700411440}) = 0
time(NULL) = 1472635301
sendmsg(272, {msg_name(0)=NULL, msg_iov(1)=[{"HTTP/1.1 200 OK\r\nContent-Length:"..., 3718}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 3718
recvmsg(272, 0xb32febe8, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(27, {}, 128, 0) = 0
shutdown(272, SHUT_RDWR) = 0
epoll_ctl(27, EPOLL_CTL_DEL, 272, b32feddc) = 0
close(272) = 0
epoll_wait(27, {}, 128, 0) = 0
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
accept(29, 0, NULL) = -1 EMFILE (Too many open files)
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
epoll_wait(27, {{EPOLLIN, {u32=165017736, u64=165017736}}}, 128, -1) = 1
futex(0x9d5f7e0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x9d5f7e0, FUTEX_WAKE_PRIVATE, 1) = 0
madvise(0xb2aff000, 8372224, MADV_DONTNEED) = 0
_exit(0) = ?
+++ exited with 0 +++
I ran into the same problem in gentoo long ago. We (gentoo) changed the default number of file descriptors to 4096. Recently I had the same issue again, when running the floodfill. Had to increase it again.
What limit would you recommend?
build one from trunk, it should be less greedy
Is it possible to implement some kind of protection for when available file descriptors run low, so that the application can still respond to web console and HTTP proxy requests instead of silently failing?
The problem is there is no simple way to find out if we are about to reach this cap.
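From outside the process, at least, the distance to the cap can be estimated on Linux by comparing the entries in /proc/PID/fd with the soft limit in /proc/PID/limits. The following is a monitoring-side sketch only, not something i2pd does itself; `fd_headroom` and `fd_near_limit` are hypothetical helper names.

```shell
# fd_headroom PID: print "<open descriptors> <soft limit>" for a process.
fd_headroom() {
    pid=$1
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l)
    # In the "Max open files" row, field 4 is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$open $soft"
}

# fd_near_limit PID PERCENT: succeed when usage is at or above
# PERCENT of the soft limit; never succeeds for an unlimited process.
fd_near_limit() {
    usage=$(fd_headroom "$1")
    open=${usage% *}
    soft=${usage#* }
    [ "$soft" != unlimited ] && [ $((open * 100)) -ge $((soft * $2)) ]
}
```

A cron job or monitoring check could call `fd_near_limit "$(pidof i2pd)" 90` and raise an alert, or restart the service, before accept() starts failing with EMFILE. Reading another user's /proc/PID/fd requires the same user or root.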
@uaply If you run i2pd as a systemd service, here is what I simply added to i2pd.service:
# Restart every two days until the too-many-descriptors problem is fixed
WatchdogSec=172800
Restart=on-abnormal
Maybe just add
LimitNOFILE=65536
to the systemd service file? (This changes ulimit -n to 65536 for the service.)
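One way to apply this without editing the packaged unit file is a systemd drop-in. The sketch below writes such a file; `write_nofile_dropin` is a hypothetical helper, the file name limits.conf is an arbitrary choice, and the limit value is left to the caller.

```shell
# write_nofile_dropin DIR LIMIT: write a drop-in that sets LimitNOFILE.
# DIR is normally /etc/systemd/system/<unit name>.d (needs root there).
write_nofile_dropin() {
    mkdir -p "$1"
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/limits.conf"
}
```

For the i2pd.service unit mentioned above, this would be called as root with /etc/systemd/system/i2pd.service.d as the directory, followed by `systemctl daemon-reload` and `systemctl restart i2pd`. A drop-in survives package upgrades that replace the main unit file.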
@khumarahn Sure. In my case, I just don't want 65536 open descriptors on my tiny ARM board that's already overloaded.
I see a bit over 6000 open descriptors running floodfill. So 65536 is an exaggeration of course.
This is on 2.9.0 release. I did not try the trunk, will probably wait for the next release.
@khumarahn curious, how many of those are network connections vs other file descriptors?
I am not sure how to check that. I check the number of file descriptors with
# ls -al /proc/6464/fd | wc -l
6049
upd:
lsof -i -a -p 6464
prints out 880 lines
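To split the descriptors by kind, the symlink targets in /proc/PID/fd can be classified directly. This is a Linux-only sketch; `fd_breakdown` is a hypothetical helper name. Note that `ls -al ... | wc -l` also counts the "total" line and the `.` and `..` entries, and that `lsof -i` lists only Internet sockets, so the numbers will not match exactly.

```shell
# fd_breakdown PID: count open descriptors by kind.
# socket = any socket (including unix sockets), anon_inode = epoll,
# timerfd, eventfd and similar, file = everything else.
fd_breakdown() {
    for fd in "/proc/$1/fd"/*; do
        # A descriptor may be closed while we scan; skip it if so.
        target=$(readlink "$fd") || continue
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c
}
```

Running `fd_breakdown 6464` as the same user as the process (or as root) prints one count per kind, which answers the sockets-versus-other question at a glance.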
running a build that uses fewer timers with ssu
ls -lah /proc/$(pidof i2pd)/fd | wc -l
594
will let you know how it goes
Any plans to incorporate these changes into the main branch? The max file descriptors problem is still a limitation; 4096 of them run out pretty fast indeed...
The current SSU tweaks in my branch that relieve file descriptor usage are suboptimal right now, because the implementation has quadratic complexity with respect to the number of SSU sessions.
working on it still.
ls -lah /proc/$(pidof i2pd)/fd | wc -l
695
with the current code in the main branch, 2 weeks uptime
I set up i2pd 2.8.0 on Ubuntu Server 14.04.4 on 15.07.16. Monitoring software detects a failure of the web console on port 7070 within 1-2 hours after i2pd starts. The i2pd daemon itself may keep running or may crash.