Web interface doesn't open 1-2 hours after i2pd start (Ubuntu)

DmDiamond commented 8 years ago

I set up i2pd 2.8.0 on Ubuntu Server 14.04.4 yesterday (15.07.16). Monitoring software detects a failure of the web console on port 7070 within 1-2 hours after i2pd starts. The i2pd daemon itself may keep running or may crash.

orignal commented 8 years ago

ulimit -n 4096

DmDiamond commented 8 years ago

Done. Monitoring it.

DmDiamond commented 8 years ago

i2pd seems to be working fine. To make the change permanent, I added these lines to /etc/security/limits.conf:

* hard nofile 4096
* soft nofile 4096

DmDiamond commented 8 years ago

i2pd went down again. I increased nofile to 65536 and am monitoring it.

UPD: The web console went down after 30 minutes, so I restarted i2pd. It seems the limits are not the cause.

orignal commented 8 years ago

I'm working on reducing the number of sockets.

r4sas commented 8 years ago

@DmDiamond, check your limits for open connections. I fixed that by adding

net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 65536

in /etc/sysctl.conf

DmDiamond commented 8 years ago

The values before the change were:

> sudo sysctl -a | grep somaxconn
net.core.somaxconn = 128

> sudo sysctl -a | grep orphans
net.ipv4.tcp_max_orphans = 4096

> ss -s
Total: 2974 (kernel 0)
TCP:   2966 (estab 1116, closed 1753, orphaned 39, synrecv 0, timewait 72/0), ports 0

@r4sas done, testing.

DmDiamond commented 8 years ago

i2pd crashed again. It had worked for 5 days. Then the server had to be rebooted, and the run after that lasted only a few hours despite the adjustments.

orignal commented 8 years ago

Do you have a stack trace from the core file?

DmDiamond commented 8 years ago

No, I will work on it. But I forgot an important detail earlier: i2pd crashed right after the server restart (see my previous message). The lines in kern.log (i2pd started -> killed):

Jul 23 13:32:51 server kernel: [49519.993763] [ 3888]  1000  3888   562114   208136    1078   330226             0 i2pd
Jul 23 13:32:51 server kernel: [49519.993821] Out of memory: Kill process 3888 (i2pd) score 866 or sacrifice child
Jul 23 13:32:51 server kernel: [49519.993988] Killed process 3888 (i2pd) total-vm:2248456kB, anon-rss:832544kB, file-rss:0kB
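
The per-process line of the OOM report can be cross-checked against the kill message. This is only a sketch: it assumes the usual column layout of the kernel's OOM table (pid, uid, tgid, total_vm, rss, nr_ptes, swapents, oom_score_adj, name; the header line is not shown above) and a 4 KiB page size, with the counts given in pages.

```shell
page_kb=4                 # assumed page size in KiB
total_vm_pages=562114     # total_vm column of the i2pd line (assumed layout)
rss_pages=208136          # rss column
swap_pages=330226         # swapents column

# Convert page counts to kB.
total_vm_kb=$((total_vm_pages * page_kb))
rss_kb=$((rss_pages * page_kb))
swap_kb=$((swap_pages * page_kb))

echo "total-vm: ${total_vm_kb} kB"
echo "rss:      ${rss_kb} kB"
echo "swap:     ${swap_kb} kB"
```

The first two products equal the total-vm:2248456kB and anon-rss:832544kB figures in the kill message, which supports the assumed layout. Under the same assumption, roughly 1.3 GB of the process was in swap when it was killed.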

Two failures in apport.log (during normal operation):

apport.log.1:ERROR: apport (pid 3058) Sat Jul 23 00:09:21 2016: executable: /home/i2pd/build/i2pd ...
apport.log:ERROR: apport (pid 8649) Sun Jul 24 22:01:17 2016: executable: /home/i2pd/ ...

UPD: Rebooted again. Message in kern.log after i2pd started:

kern.log:Jul 24 23:26:25 server kernel: [  510.489010] i2pd[2436]: segfault at 9 ip b7289675 sp bfdda090 error 4 in libc-2.19.so[b7216000+1a8000]

orignal commented 8 years ago

So, it crashed at shutdown?

DmDiamond commented 8 years ago

No. All crashes happened either during normal system operation (the two failures in apport.log) or shortly after system boot (i2pd started -> crashed).

DmDiamond commented 8 years ago

A new crash. I have the core dump and the log file, but they are 1.3 GB each. The last lines of i2pd.log:

/.i2pd$ tail -n 10 i2pd.log
11:34:02@756/warn - NetDb: Requested xQcsc~Y-Ej9aYXwWF5X1LTdf5Uhhn8P5UO~EMNeF3jE= not found, 0 peers excluded
11:34:02@756/warn - NetDb: Requested zRRQBRiWOBAmOsP8fD8I-qq9EnvQ0Wz~SilRdiPhogs= not found, 0 peers excluded
11:34:02@305/info - Transports: RouterInfo for xpAAeuXa~pObhPENTfbuwAQF3vuMQqPrmSVSbR7zoUs= not found, requested
11:34:02@838/error - NTCP: Phase 4 read error: Connection reset by peer. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@838/error - NTCP: Phase 4 read error: End of file. Check your clock
11:34:03@756/warn - NetDb: Requested f9PTJImeqzjnE-hm-vfzOJuQ2S8Q5D0qCVYb6sR7lc4= not found, 0 peers excluded
11:34:03@838/info - NTCP: Phase 2 read error: End of file. Wrong ident assumed

The system clock is set correctly.

orignal commented 8 years ago

I only need the stack trace from the core file.
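
A minimal sketch of extracting a backtrace from a core file with gdb in batch mode, so no interactive debugger session is needed. The helper name core_backtrace and both paths are placeholders, not anything from this thread. The binary has to be the same build that produced the core, and a build with debug symbols gives much more readable frames.

```shell
# Print backtraces of all threads from a core file, non-interactively.
# $1 = path to the i2pd binary, $2 = path to the core file (placeholders).
core_backtrace() {
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}

# Example (paths are assumptions, adjust to your system):
#   core_backtrace ./i2pd ./core > backtrace.txt 2>&1
```

The resulting text file is small enough to attach here even when the core itself is over a gigabyte.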

username-not-taken commented 8 years ago

I have this problem too, but the daemon does not really crash. It just does nothing, while service i2pd status still says active (running).

DmDiamond commented 8 years ago

I'm a little busy with other software projects. I'll provide the stack trace as soon as possible; I need to read the debugger's man pages first.

orignal commented 8 years ago

@username-not-taken which version are you using?

radfish commented 8 years ago

@username-not-taken I have the same problem: i2pd runs out of file descriptors once every few days. To count them, run: ls /proc/<i2pd PID>/fd | wc -l (wc -l on the directory itself does not work). Decreasing transittunnels seems to help a little.
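
A small sketch of that check, comparing the count with the limit actually in force for the process. It assumes Linux with /proc mounted and permission to read the target process (same user or root). The function names count_fds and soft_nofile are made up for illustration.

```shell
#!/bin/sh
# Count the open file descriptors of a process by listing /proc/<pid>/fd.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Read the soft "Max open files" limit from /proc/<pid>/limits.
# Line layout: "Max open files  <soft>  <hard>  files".
soft_nofile() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

pid=${1:-$$}   # default to this shell when no PID is given
echo "pid $pid: $(count_fds "$pid") of $(soft_nofile "$pid") descriptors in use"
```

Reading /proc/<pid>/limits shows the limit the daemon really runs with, which can differ from what ulimit -n prints in an interactive shell.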

uaply commented 8 years ago

I faced the same problem on Fedora 20, running on oldish hardware with the --bandwidth=X setting. Compiled from the latest trunk.

Not only does the web interface stop responding, but httpproxy also fails to serve requests.

There is no doubt that file descriptors are running out. Here is the corresponding strace -ff output:

...
close(489)                              = 0
epoll_wait(27, {}, 128, 0)              = 0
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
accept(29, 0, NULL)                     = 272
epoll_ctl(27, EPOLL_CTL_ADD, 272, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=2954677240, u64=2954677240}}) = 0
ioctl(272, FIONBIO, [1])                = 0
recvmsg(272, {msg_name(0)=NULL, msg_iov(1)=[{"GET / HTTP/1.1\r\nUser-Agent: Mozi"..., 8192}], msg_controllen=0, msg_flags=0}, 0) = 430
epoll_ctl(27, EPOLL_CTL_MOD, 29, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=165017800, u64=165017800}}) = 0
epoll_wait(27, {}, 128, 0)              = 0
clock_gettime(CLOCK_REALTIME, {1472635301, 700411440}) = 0
time(NULL)                              = 1472635301
sendmsg(272, {msg_name(0)=NULL, msg_iov(1)=[{"HTTP/1.1 200 OK\r\nContent-Length:"..., 3718}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 3718
recvmsg(272, 0xb32febe8, 0)             = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(27, {}, 128, 0)              = 0
shutdown(272, SHUT_RDWR)                = 0
epoll_ctl(27, EPOLL_CTL_DEL, 272, b32feddc) = 0
close(272)                              = 0
epoll_wait(27, {}, 128, 0)              = 0
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
accept(29, 0, NULL)                     = -1 EMFILE (Too many open files)
epoll_wait(27, {{EPOLLIN, {u32=165017800, u64=165017800}}}, 128, -1) = 1
epoll_wait(27, {{EPOLLIN, {u32=165017736, u64=165017736}}}, 128, -1) = 1
futex(0x9d5f7e0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x9d5f7e0, FUTEX_WAKE_PRIVATE, 1) = 0
madvise(0xb2aff000, 8372224, MADV_DONTNEED) = 0
_exit(0)                                = ?
+++ exited with 0 +++

khumarahn commented 8 years ago

I ran into the same problem on Gentoo long ago. We (Gentoo) changed the default number of file descriptors to 4096. Recently I had the same issue again when running a floodfill, and had to increase it again.

What limit would you recommend?

orignal commented 8 years ago

Build one from trunk; it should be less greedy.

uaply commented 8 years ago

Is it possible to implement some kind of protection for when available file descriptors run low, so that the application can still respond to web console and HTTP proxy requests instead of silently failing?

orignal commented 8 years ago

The problem is there is no simple way to find out if we are about to reach this cap.

radfish commented 8 years ago

@uaply If you run i2pd as a systemd service, you can do what I did and add this to i2pd.service:

# Restart every two days until the too-many-descriptors problem is fixed
WatchdogSec=172800
Restart=on-abnormal

khumarahn commented 8 years ago

Maybe just add to the systemd service file

LimitNOFILE=65536

? (this changes ulimit -n to 65536 for the service)

radfish commented 8 years ago

@khumarahn Sure. In my case, I just don't want 65536 open descriptors on my tiny ARM board that's already overloaded.

khumarahn commented 8 years ago

I see a bit over 6000 open descriptors running floodfill. So 65536 is an exaggeration of course.

This is on 2.9.0 release. I did not try the trunk, will probably wait for the next release.

majestrate commented 8 years ago

@khumarahn curious, how many of those are network connections vs other file descriptors?

khumarahn commented 8 years ago

I am not sure how to check that. I check the number of file descriptors with

# ls -al /proc/6464/fd | wc -l
6049

khumarahn commented 8 years ago

upd:

lsof -i -a -p 6464

prints out 880 lines
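
One way to split the descriptors by kind without lsof is to look at what each entry in /proc/<pid>/fd links to. This is a sketch, assuming Linux and read access to the process (same user or root); the function name fd_types is made up.

```shell
# Summarise the open descriptors of a PID by kind.
# Link targets look like "socket:[12345]", "pipe:[678]",
# "anon_inode:[eventpoll]" or a plain path for regular files.
fd_types() {
    for fd in "/proc/$1/fd"/*; do
        target=$(readlink "$fd") || continue   # fd may close while we iterate
        case $target in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo "$target" ;;    # keep the anon inode type visible
            *)            echo file ;;
        esac
    done | sort | uniq -c | sort -rn
}

fd_types "${1:-$$}"   # defaults to the current shell when no PID is given
```

The socket count here includes Unix-domain sockets, so it can be higher than what lsof -i reports. Also note that ls -al prints a "total" line plus the . and .. entries, so its line count is three higher than the real number of descriptors.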

majestrate commented 8 years ago

running a build that uses fewer timers with ssu

ls -lah /proc/$(pidof i2pd)/fd | wc -l
594

will let you know how it goes

uaply commented 8 years ago

Are there any plans to incorporate these changes into the main branch? The maximum file descriptor count is still a limitation; 4096 of them run out pretty fast.

majestrate commented 8 years ago

The current tweaks to ssu in my branch that reduce file descriptor usage are suboptimal right now, because the implementation has quadratic complexity with respect to the number of ssu sessions.

working on it still.

orignal commented 8 years ago

ls -lah /proc/$(pidof i2pd)/fd | wc -l
695

This is with the current code in the main branch, after 2 weeks of uptime.

PurpleI2P / i2pd

Web interface doesn't open 1-2 hours after i2pd start (Ubuntu) #575