LittleFlower2019 / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

Out of Memory process killed #314

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I'm a newbie with s3fs.

I run s3fs on an Amazon EC2 t1.micro.

s3fs runs normally at first,

but eventually I get this message in dmesg.

How much memory does s3fs need?

[10678098.190971] Call Trace:
[10678098.191059]  [<ffffffff810bd5fd>] ? 
cpuset_print_task_mems_allowed+0x9d/0xb0
[10678098.191064]  [<ffffffff81117ce1>] dump_header+0x91/0xe0
[10678098.191067]  [<ffffffff81118065>] oom_kill_process+0x85/0xb0
[10678098.191070]  [<ffffffff8111840a>] out_of_memory+0xfa/0x220
[10678098.191074]  [<ffffffff8111de3a>] __alloc_pages_nodemask+0x7ea/0x800
[10678098.191081]  [<ffffffff81154b73>] alloc_pages_current+0xa3/0x110
[10678098.191086]  [<ffffffff81114a4f>] __page_cache_alloc+0x8f/0xa0
[10678098.191089]  [<ffffffff81114d4e>] ? find_get_page+0x1e/0x90
[10678098.191092]  [<ffffffff81116bd2>] filemap_fault+0x212/0x3c0
[10678098.191096]  [<ffffffff81136e62>] __do_fault+0x72/0x550
[10678098.191109]  [<ffffffff8113a70a>] handle_pte_fault+0xfa/0x200
[10678098.191115]  [<ffffffff8100648e>] ? xen_pmd_val+0xe/0x10
[10678098.191118]  [<ffffffff810052e9>] ? 
__raw_callee_save_xen_pmd_val+0x11/0x1e
[10678098.191121]  [<ffffffff8113abc8>] handle_mm_fault+0x1f8/0x350
[10678098.191125]  [<ffffffff8103cda5>] ? pvclock_clocksource_read+0x55/0xf0
[10678098.191131]  [<ffffffff8165625b>] do_page_fault+0x14b/0x520
[10678098.191137]  [<ffffffff8109300d>] ? ktime_get_ts+0xad/0xe0
[10678098.191145]  [<ffffffff811876ad>] ? poll_select_copy_remaining+0xed/0x140
[10678098.191148]  [<ffffffff8118896d>] ? sys_select+0xcd/0x100
[10678098.191151]  [<ffffffff81652eb5>] page_fault+0x25/0x30
[10678098.191153] Mem-Info:
[10678098.191156] Node 0 DMA per-cpu:
[10678098.191158] CPU    0: hi:    0, btch:   1 usd:   0
[10678098.191160] Node 0 DMA32 per-cpu:
[10678098.191162] CPU    0: hi:  186, btch:  31 usd:  45
[10678098.191166] active_anon:139286 inactive_anon:28 isolated_anon:0
[10678098.191167]  active_file:72 inactive_file:351 isolated_file:0
[10678098.191168]  unevictable:0 dirty:0 writeback:0 unstable:0
[10678098.191169]  free:1368 slab_reclaimable:1835 slab_unreclaimable:2383
[10678098.191170]  mapped:79 shmem:62 pagetables:1787 bounce:0
[10678098.191172] Node 0 DMA free:2460kB min:72kB low:88kB high:108kB 
active_anon:9928kB inactive_anon:0kB active_file:196kB inactive_file:456kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14524kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:208kB shmem:0kB 
slab_reclaimable:988kB slab_unreclaimable:256kB kernel_stack:168kB 
pagetables:160kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1107 
all_unreclaimable? yes
[10678098.191181] lowmem_reserve[]: 0 597 597 597
[10678098.191184] Node 0 DMA32 free:3012kB min:3088kB low:3860kB high:4632kB 
active_anon:547216kB inactive_anon:112kB active_file:92kB inactive_file:948kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:611856kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:108kB shmem:248kB 
slab_reclaimable:6352kB slab_unreclaimable:9276kB kernel_stack:704kB 
pagetables:6988kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1181 
all_unreclaimable? yes
[10678098.191193] lowmem_reserve[]: 0 0 0 0
[10678098.191196] Node 0 DMA: 157*4kB 13*8kB 14*16kB 7*32kB 2*64kB 1*128kB 
0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2460kB
[10678098.191206] Node 0 DMA32: 753*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3012kB
[10678098.191214] 486 total pagecache pages
[10678098.191216] 0 pages in swap cache
[10678098.191217] Swap cache stats: add 0, delete 0, find 0/0
[10678098.191219] Free swap  = 0kB
[10678098.191220] Total swap = 0kB
[10678098.199723] 159472 pages RAM
[10678098.199727] 8376 pages reserved
[10678098.199728] 5929 pages shared
[10678098.199729] 148147 pages non-shared
[10678098.199731] [ pid ]   uid  tgid total_vm      rss cpu oom_adj 
oom_score_adj name
[10678098.199742] [  247]     0   247     4308       47   0       0             
0 upstart-udev-br
[10678098.199745] [  251]     0   251     5377      120   0     -17         
-1000 udevd
[10678098.199749] [  297]     0   297     5366       99   0     -17         
-1000 udevd
[10678098.199752] [  298]     0   298     5366       97   0     -17         
-1000 udevd
[10678098.199755] [  394]     0   394     3797       47   0       0             
0 upstart-socket-
[10678098.199759] [  405]     0   405     1816      124   0       0             
0 dhclient3
[10678098.199762] [  619]     0   619    12489      156   0     -17         
-1000 sshd
[10678098.199765] [  631]   102   631     5979       77   0       0             
0 dbus-daemon
[10678098.199769] [  633]   101   633    63526      753   0       0             
0 rsyslogd
[10678098.199772] [  692]     0   692     3626       41   0       0             
0 getty
[10678098.199775] [  698]     0   698     3626       43   0       0             
0 getty
[10678098.199779] [  703]     0   703     3626       41   0       0             
0 getty
[10678098.199782] [  704]     0   704     3626       41   0       0             
0 getty
[10678098.199785] [  706]     0   706     3626       42   0       0             
0 getty
[10678098.199789] [  712]     0   712     1082       36   0       0             
0 acpid
[10678098.199792] [  714]     0   714     4227       40   0       0             
0 atd
[10678098.199795] [  715]     0   715     4778       59   0       0             
0 cron
[10678098.199798] [  724]     0   724    19198      272   0       0             
0 nginx
[10678098.199801] [  725]    33   725    19349      428   0       0             
0 nginx
[10678098.199805] [  726]    33   726    19347      430   0       0             
0 nginx
[10678098.199808] [  727]   103   727    46897      299   0       0             
0 whoopsie
[10678098.199811] [  732]    33   732    19351      447   0       0             
0 nginx
[10678098.199814] [  734]    33   734    19348      439   0       0             
0 nginx
[10678098.199818] [  748]     0   748    61490      940   0       0             
0 php5-fpm
[10678098.199821] [  749]    33   749    62556     1510   0       0             
0 php5-fpm
[10678098.199824] [  750]    33   750    62684     1650   0       0             
0 php5-fpm
[10678098.199827] [  751]    33   751    62684     1651   0       0             
0 php5-fpm
[10678098.199830] [  752]    33   752    62556     1522   0       0             
0 php5-fpm
[10678098.199834] [  782]     0   782     3626       41   0       0             
0 getty
[10678098.199837] [  971]    33   971    62428     1478   0       0             
0 php5-fpm
[10678098.199841] [ 1166]     0  1166     3143       42   0       0             
0 rsync
[10678098.199844] [30324]    33 30324    62851     1835   0       0             
0 php5-fpm
[10678098.199847] [15013]     0 15013     4800       78   0       0             
0 rpcbind
[10678098.199850] [15247]   106 15247     5376      115   0       0             
0 rpc.statd
[10678098.199854] [15318]     0 15318     7445       65   0       0             
0 rpc.idmapd
[10678098.199860] [15807]     0 15807     7047      164   0       0             
0 rpc.mountd
[10678098.199863] [28746]     0 28746   287290   123730   0       0             
0 s3fs
[10678098.199867] [ 8468]     0  8468    18340      201   0       0             
0 sshd
[10678098.199870] [ 8549]  1000  8549    18340      200   0       0             
0 sshd
[10678098.199873] [ 8550]  1000  8550     6241     1450   0       0             
0 bash
[10678098.199876] [ 9103]  1000  9103    13384      626   0       0             
0 vi
[10678098.199879] [19380]     0 19380    18340      195   0       0             
0 sshd
[10678098.199882] [19382]     0 19382     1100       25   0       0             
0 sh
[10678098.199886] [19383]     0 19383     1075       27   0       0             
0 run-parts
[10678098.199891] [19391]     0 19391     1100       26   0       0             
0 50-landscape-sy
[10678098.199895] [19398]     0 19398    17263     2667   0       0             
0 landscape-sysin
[10678098.199898] [19400]     0 19400    12495      150   0       0             
0 sshd
[10678098.199901] [19401]     0 19401    12489      151   0     -17         
-1000 sshd
[10678098.199905] Out of memory: Kill process 28746 (s3fs) score 791 or 
sacrifice child

Original issue reported on code.google.com by xtru...@gmail.com on 11 Jan 2013 at 4:02

GoogleCodeExporter commented 8 years ago
s3fs has a memory leak.  Please see Issue 278.

Original comment by nicho...@ifactorinc.com on 11 Jan 2013 at 7:46

GoogleCodeExporter commented 8 years ago
Hi,

The latest s3fs (now v1.67) fixes some memory leaks.
Please try it and check.
I am closing this issue to tidy up the old open issues.

If you find any more memory leaks, please let me know and re-post a new issue with
more information.

Regards,

Original comment by ggta...@gmail.com on 15 Apr 2013 at 7:19

GoogleCodeExporter commented 8 years ago
Hi

This issue is re-opened because s3fs still has a memory leak.
I reproduced the leak when using HTTPS, but I could not reproduce it without
HTTPS (plain HTTP).

Accordingly, Issue 191, Issue 278, and Issue 343 are folded into this issue.
Please send me more information to help fix the memory leak.

Thank you for your help.

Original comment by ggta...@gmail.com on 10 Jun 2013 at 4:24

GoogleCodeExporter commented 8 years ago
I looked into this issue, but I could not solve the problem fundamentally.
(I found some small malfunctions, but they did not explain it.)

Not everyone reporting this issue shares the same conditions, but I found one
cause: when s3fs connects over HTTPS and libcurl is built with NSS, memory leaks
inside libcurl/NSS.
(This case was already reported in Issue 191 by huwtlewis.)
This probably occurs with libcurl versions below 7.21.5.
** see)  http://curl.haxx.se/changes.html

I ran s3fs with libcurl 7.30.0 (NSS, not OpenSSL), and it seems to work well.

I also checked libcurl built with OpenSSL (not NSS); s3fs showed no problem while
running.

I will continue to look into the problem and try to fix it.
In the meantime, anyone who has this problem, please let me know your libcurl
version and whether it is built with NSS or OpenSSL.
And if you can, please try another libcurl version, or OpenSSL instead of NSS.

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 13 Jun 2013 at 6:40

GoogleCodeExporter commented 8 years ago
Hi, all

I checked the relationship between the libcurl version and libnss/OpenSSL.

First, if s3fs runs against a libcurl linked with libnss (not OpenSSL), you
probably need libcurl version 7.21.5 or later.
If you use an earlier libcurl (with libnss), s3fs leaks a lot of memory.
With libcurl (with libnss) 7.21.5 or later, s3fs leaks only about 40
bytes (probably from loading the NSS module…).
Next, if you use libcurl with OpenSSL, s3fs does not leak memory (I checked
version 7.19.7).

* Summary for s3fs and libcurl ( libnss, openssl ):
   I tested the following combinations.
    curl 7.19.7 + libnss 3.13.1.0   --> memory leak (large)
    curl 7.19.7 + openssl 1.0.0     --> no memory leak
    curl 7.30.0 + libnss 3.14.0.0   --> no memory leak (only ~40 bytes for loading libs)
    curl 7.30.0 + openssl 1.0.0     --> no memory leak

Accordingly, I updated the FuseOverAmazon wiki for this problem:
http://code.google.com/p/s3fs/wiki/FuseOverAmazon

Anyone who has this memory-leak problem, please check your libcurl version
and which library it links (libnss or OpenSSL).
And please let me know your opinion about this reply.

Thanks in advance for your help.
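A quick way to check which combination you have is `curl --version`: its first line names the SSL backend (e.g. NSS/3.14.3 or OpenSSL/1.0.0). The sketch below compares the reported libcurl version against the 7.21.5 threshold discussed above; `version_lt` is a hypothetical helper (not part of s3fs) and it assumes GNU `sort -V` is available:

```shell
# Compare two dotted version numbers; true (exit 0) when $1 < $2.
version_lt() {
    [ "$1" = "$2" ] && return 1
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Second field of curl's first output line is the libcurl version,
# e.g. "curl 7.32.0 (x86_64-redhat-linux-gnu) libcurl/7.32.0 ..."
cur=$(curl --version 2>/dev/null | awk 'NR==1 {print $2}')
if [ -n "$cur" ] && version_lt "$cur" 7.21.5; then
    echo "libcurl $cur: known to leak with NSS, consider upgrading"
else
    echo "libcurl ${cur:-unknown}: 7.21.5 or newer (or curl not found)"
fi
```

If the first line of `curl --version` does not mention NSS at all, the 7.21.5 threshold is not relevant to your setup.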

Original comment by ggta...@gmail.com on 15 Jun 2013 at 4:08

GoogleCodeExporter commented 8 years ago
Hello guys,

I don't know how much my reply is worth, but I am also seeing a pretty serious
memory leak.
The s3fs process grows by 4K after every operation.
Reading your comments, I understood that some of my libraries might be too old,
and after investigating, I must confess that is the case.

The OS I am currently using is CentOS 6.4, and I really begin to hate that
distro, since all its packages are more than 3 years old even though the distro
is from ... 2013.

So of course s3fs would not compile at first because FUSE was too old; I
recompiled FUSE.
Then I checked the curl library, which was also too old, and I recompiled it.
Apparently the only security library installed on my system is OpenSSL; I did
not find any trace of libnss.

Anyway, I'm still getting the leak, and I don't really know where it comes
from. It might be libidn or zlib, which libcurl is compiled against. The thing
is that I'm supposed to use the OS... not recompile it, otherwise I could use
LFS.

s3fs itself seems aware of the problem, since I'm getting this "library too
old" message:
fuse: warning: library too old, some operations may not not work

Or should I understand that it is not s3fs complaining, but FUSE?

Is my assumption correct?
In any case that makes CentOS an unreliable distro for s3fs. :-/

I'll check with FUSE and come back to you.

Original comment by olivier....@gmail.com on 9 Jul 2013 at 9:30

GoogleCodeExporter commented 8 years ago
Hello again,

I can't really say much more.
I just realized that there is also a kernel module for FUSE, which is not
embedded in the library.
So I am wondering whether the "library too old" complaint comes from the
interaction with the kernel module. The kernel is 2.6.32, but I am not able to
determine its exact revision; I can only say that it is a "recent" kernel for
CentOS.

The modinfo output is the following:
[root@ip-10-154-193-232 fuse-2.9.3]# modinfo fuse
filename:       /lib/modules/2.6.32-358.2.1.el6.x86_64/kernel/fs/fuse/fuse.ko
alias:          char-major-10-229
license:        GPL
description:    Filesystem in Userspace
author:         Miklos Szeredi <miklos@szeredi.hu>
srcversion:     0957DD49586EC513678776E
depends:
vermagic:       2.6.32-358.2.1.el6.x86_64 SMP mod_unload modversions
parm:           max_user_bgreq:Global limit for the maximum number of backgrounded requests an unprivileged user can set (uint)
parm:           max_user_congthresh:Global limit for the maximum congestion threshold an unprivileged user can set (uint)

At this point I am a bit stuck, but my previous statement is, IMHO, still
valid: I'm supposed to use the OS, not recompile it from scratch, or I'll end
up who knows where.

If you would like more details, please let me know. Note that this is also on a
t1.micro EC2 machine from Amazon.

Best regards,
---
Olivier

Original comment by olivier....@gmail.com on 9 Jul 2013 at 9:48

GoogleCodeExporter commented 8 years ago
Hello, Maquaire

I'm sorry that I don't understand every part of this problem, but I checked
s3fs and its libraries with valgrind.
It reported that the memory leaks are in libcurl->libnss.

So I think this problem depends only on libnss (and libcurl), not on the OS or
drivers.
(My kernel module is older than yours: 2.6.32-71.29.1.el6.x86_64.)
I have not tried an old FUSE, but s3fs can probably perform most of its
functions with an old FUSE….
(Probably some functions will not work.)

For EC2, could you make a custom image with different library versions?
(I'm sorry, I do not know much about EC2.)

Thanks for your assistance.

Original comment by ggta...@gmail.com on 10 Jul 2013 at 7:07

GoogleCodeExporter commented 8 years ago
s3fs 1.71 compiled against libcurl-7.27.0, and I have a slow, steady memory-leak
march toward OOM. Re-mounting s3fs causes a dump and a restart of the march.
Graph attached. Happy to help with any debugging.

root     21572  0.6 92.1 2382384 1563116 ?     Ssl  Aug02  28:39 s3fs 

Original comment by matthew....@spatialkey.com on 5 Aug 2013 at 1:03

Attachments:

GoogleCodeExporter commented 8 years ago
s3fs-1.71 compiled against libcurl-7.27.0 and nss-3.14.0 leaks.
s3fs-1.71 compiled against libcurl-7.31.0 and nss-3.14.3 leaks.

These leaks only happen when using SSL to connect to S3. Graph attached: same
load, but since switching to non-HTTPS, no leak of note. I know you already
know it's a libcurl/NSS issue; I just thought I'd throw out some more information.

Original comment by matthew....@spatialkey.com on 8 Aug 2013 at 4:58

Attachments:

GoogleCodeExporter commented 8 years ago
Hi,

I'm sorry for replying late.

I checked the code for this problem, but I could not find a definite cause or
solution.
However, I changed some code around initializing curl and OpenSSL; it is
committed as r479.
Since the memory leak in s3fs depends on the environment (libcurl+openssl/libnss,
OS?), I cannot say that this is the definite cause or that it is fixed.

If you can, please compile r479 and test it.

Thanks in advance for your help.

Original comment by ggta...@gmail.com on 27 Aug 2013 at 8:23

GoogleCodeExporter commented 8 years ago
I built the latest from SVN and have it running with the mount using SSL on a 
low-priority cluster member. I'll let you know how it looks. Thanks!

Original comment by matthew....@spatialkey.com on 27 Aug 2013 at 7:05

GoogleCodeExporter commented 8 years ago
Hi,

We are facing a similar situation. It looks like we have a memory leak or other
strange behavior with non-SSL usage. We have been seeing this for the last
four days.
Steps to reproduce:
1. Mount an S3 bucket containing a huge number of files (1 million) and start
using the mounted folder (only cp commands). Everything works fine all day long.

#s3fs -o allow_other,uid=500,gid=500 <S3 bucket> /mnt/s3

2. Stop using the mounted folder for 16 hours. (Go home and sleep.)
3. Start using the mounted folder again. (Return to the office.)
4. s3fs hangs and starts to occupy all free memory and CPU (see attached
picture). We don't have any messages in syslog or in messages; s3fs just hangs.

System information:
# cat /etc/*release*
CentOS release 6.3 (Final)
CentOS release 6.3 (Final)
CentOS release 6.3 (Final)
cpe:/o:centos:linux:6:GA

# s3fs --version
1.72 - revision r469

#curl --version
curl 7.32.0 (x86_64-redhat-linux-gnu) libcurl/7.32.0 OpenSSL/1.0.0 zlib/1.2.3 
c-ares/1.10.0 libidn/1.18 libssh2/1.4.3

Version of fuse being used
2.8.4

Version of nss being used
3.14.3

We've installed the latest version of s3fs (r481) and will observe the behavior.

Original comment by Yury_bal...@pubget.com on 30 Aug 2013 at 12:59

Attachments:

GoogleCodeExporter commented 8 years ago
So the SSL leak still exists, although it is *much* slower than it used to be
(compare to the previous graphs). The attached graph shows the same mount under
the same load with SSL (a slow jog to death) and then remounted without SSL (no leak).

Original comment by matthew....@spatialkey.com on 3 Sep 2013 at 12:45

Attachments:

GoogleCodeExporter commented 8 years ago
Hi, Yury_baltrushevich

I'm sorry for replying so late.
Checking and changing the leaking code in s3fs took a long time.

I fixed some code related to the memory leak.
There is probably no memory leak without SSL now, but the leak is not
completely fixed with SSL (NSS).

Please test r483 or later.
Also, please note that this revision adds a "nosscache" option and an
"--enable-nss-init" configure option, and changes the default parallel count
for HEAD requests (500->20).

Finally, I have a question about your environment.
Do you have 1 million objects in ONE directory object?
I think listing that many objects is hard, and s3fs needs a huge amount of
memory for it.
Original comment by ggta...@gmail.com on 14 Sep 2013 at 10:03

GoogleCodeExporter commented 8 years ago
Hi, matthew

(I'm sorry for replying so late.
Checking and changing the leaking code in s3fs took a long time.)

I fixed some code related to the memory leak.
There is probably no memory leak without SSL now, but the leak is not
completely fixed with SSL (NSS).

Please test r483 or later.
Also, please note that this revision adds a "nosscache" option and an
"--enable-nss-init" configure option, and changes the default parallel count
for HEAD requests (500->20).

Finally, I have a question about your environment.
Do you have 1 million objects in ONE directory object?
I think listing that many objects is hard, and s3fs needs a huge amount of
memory for it.

Original comment by ggta...@gmail.com on 14 Sep 2013 at 10:05

GoogleCodeExporter commented 8 years ago
Hi, matthew

Sorry for my mistaken comment.

Revision 483 (482) calls initialization functions for libxml2 and NSS.
You can run autogen.sh and configure with "--enable-nss-init"; these functions
are then called from the main function.
If your machine does not have the nss-devel package, please install it.

This revision does not fix the memory leak completely, but some leaks are fixed.

Please try it.

Thanks in advance for your help.
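For reference, the build steps described above might look like the following sketch. The package name and checkout URL are assumptions from the Google Code era of the project; adjust them for your system:

```shell
# Hedged sketch: building s3fs with the NSS initialization hooks enabled.
# Assumes a CentOS/RHEL-style system; use your distro's package manager.
sudo yum install -y nss-devel          # NSS headers, needed by --enable-nss-init
svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs
cd s3fs
./autogen.sh
./configure --enable-nss-init
make
sudo make install
```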

Original comment by ggta...@gmail.com on 15 Sep 2013 at 1:09

GoogleCodeExporter commented 8 years ago
Hi, matthew

I may not be able to fix this issue unless libnss+libcurl are upgraded.
I rechecked the s3fs code and the memory leaks.
I think this problem is caused by memory leaking inside curl (with NSS).
I found the same issue on the curl mailing list:
http://curl.haxx.se/mail/lib-2013-08/0175.html
The cause discussed there seems to be the same as this issue.

In addition, I tested some mallopt-related environment variables.
I set "MALLOC_MMAP_MAX_=0" so that allocations cannot use mmap'd areas, and
also set the other variables (MALLOC_TRIM_THRESHOLD_=0, MALLOC_TOP_PAD_=0).
With these, s3fs's VIRT size (in top) could be kept low.
After that, I ran s3fs with "max_stat_cache_size=0", which means s3fs uses no
stat cache.
But after sending many HEAD requests (e.g. when an ls command lists many
files), the RES (and VIRT) memory still increased.
So I think this indicates memory leaking in libnss+libcurl (in the other case,
libssl+libcurl, memory did not increase).
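As a sketch, the environment described above could be set up like this before mounting. The variable values come from the comment; the bucket and mountpoint names are placeholders:

```shell
# glibc malloc tuning: these must be in the environment before s3fs starts.
export MALLOC_MMAP_MAX_=0        # never satisfy allocations via mmap
export MALLOC_TRIM_THRESHOLD_=0  # return freed heap memory to the OS eagerly
export MALLOC_TOP_PAD_=0         # no extra padding when growing the heap

# Mount without the stat cache so cache growth cannot mask a real leak.
# Guarded so the sketch is harmless on machines without s3fs installed.
if command -v s3fs >/dev/null 2>&1; then
    s3fs mybucket /mnt/s3 -o max_stat_cache_size=0
fi
```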

r482 fixed some code for memory leaks and added code for initializing NSS.
But that is not a sufficient solution for this problem; the cause lies deep
inside libcurl and libnss.
I think this issue is hard to fix and will take a long time.

What do you think?
If you have another idea or explanation, please let me know.

Original comment by ggta...@gmail.com on 19 Sep 2013 at 7:24

GoogleCodeExporter commented 8 years ago
Hi, matthew

I'm sorry for being very slow to reply to this issue.

I have learned more about what is probably this case of s3fs memory leaking.
In my case, the libcurl version was below 7.21.4, and those versions have a
memory-leak bug.
(For example, CentOS's yum repository does not carry the latest curl(libcurl)
version.)
The bug only affects libcurl built with the NSS library.
You can see "nss: avoid memory leaks and failure of NSS shutdown" in the 7.21.4
release notes.

I updated to the latest libcurl with NSS and tested s3fs against it.
After that, it worked well for me; the memory leak no longer seems to grow.

*** NOTES
We moved the s3fs repo from Google Code to
GitHub (https://github.com/s3fs-fuse/s3fs-fuse).
I also updated the master branch today, which adds support for two more SSL
libraries in s3fs.
If you use libcurl with NSS, you should build s3fs with the NSS library.
Please try building s3fs with NSS.

If you can, please upgrade libcurl and s3fs (from GitHub), and check the memory
leaking again.

Thanks in advance for your help.
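A build along those lines might look like the sketch below. The `--with-nss` configure flag is an assumption based on the multi-SSL support mentioned above, so verify it against `./configure --help` in your checkout:

```shell
# Hedged sketch: building the GitHub master against the NSS backend.
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure --with-nss     # assumed flag; verify with ./configure --help
make
sudo make install
```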

Original comment by ggta...@gmail.com on 1 Jun 2014 at 3:15

GoogleCodeExporter commented 8 years ago
Hi,

I have one box configured as below:

* S3FS master (1.77+)
* fuse 2.9.3
* libcurl 7.36.0
* nss 3.16.0

I will let it run for a couple days (barring problems) and update you. Thanks 
so much for continuing to work on this problem!!

Original comment by matthew....@spatialkey.com on 2 Jun 2014 at 1:26