Doloops / mcachefs

mcachefs : Simple filesystem-based file cache based on fuse

Flushing metadata while a file is open causes a bus error #5

Closed cde-jq closed 5 years ago

cde-jq commented 6 years ago

Steps to reproduce:

1. Create 300 files in /mountpoint/dir/.
2. Open the last created file.
3. Flush metadata.
4. Close the file.

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7f897bc10700 (LWP 110004)]
mcachefs_metadata_clean_fh_locked (id=) at mcachefs-metadata.c:1312
1312 mdata->fh = 0;
(gdb) bt
#0 mcachefs_metadata_clean_fh_locked (id=) at mcachefs-metadata.c:1312
#1 0x000000000040a83e in mcachefs_file_remove (mfile=0x70c8e0) at mcachefs-file.c:177
#2 mcachefs_file_release (mfile=0x70c8e0) at mcachefs-file.c:224
#3 mcachefs_fileid_put (fdi=7391456) at mcachefs-file.c:247
#4 0x0000000000412bdb in mcachefs_release_mfile (mfile=0x70c8e0, info=0x7f897bc0fcf0) at mcachefs-io.c:400
#5 0x00007f897f1f7d12 in fuse_do_release () from /lib64/libfuse.so.2
#6 0x00007f897f1fa603 in fuse_lib_release () from /lib64/libfuse.so.2
#7 0x00007f897f200fb4 in do_release () from /lib64/libfuse.so.2
#8 0x00007f897f201bdb in fuse_ll_process_buf () from /lib64/libfuse.so.2
#9 0x00007f897f1fe471 in fuse_do_work () from /lib64/libfuse.so.2
#10 0x00007f897f637dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f897ef2087d in clone () from /lib64/libc.so.6

hradec commented 5 years ago

I was able to reproduce this using a simple Python one-liner:

python -c "import os;n=[n for n in range(242) if not open('/tmp/2/%s' % n,'w').write('aaaaa\n') ];f=open('/tmp/2/%s' % n[-1]);os.system('echo flush_metadata > /tmp/2/.mcachefs/action');f.close()"

running mcachefs with:

gdb --args mcachefs -f -o -s /tmp/1 /tmp/2

It seems that with 241 files the bug doesn't show up... but with 242 and up it does!

I'll look for some relation to this 241/242 threshold in the metadata code... but if this rings any bells for you, let me know!
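For readers on Python 3, note that the one-liner above appears to depend on Python 2 behaviour: the list comprehension only keeps a number when `write()` returns a falsy value, and in Python 3 `write()` returns the character count. Below is a more readable sketch of the same steps. The function name `reproduce` and its parameters are made up for illustration, and it assumes that writing `flush_metadata` to the `.mcachefs/action` file from Python has the same effect as the `echo` in the one-liner.

```python
import os


def reproduce(mount_dir, count=242, action_path=None):
    """Create `count` files, hold the last one open across a metadata flush, then close it.

    Hypothetical helper: only the file names, the file content, and the
    action file path come from the one-liner in this thread.
    """
    if action_path is None:
        # Control file used by the one-liner: <mountpoint>/.mcachefs/action
        action_path = os.path.join(mount_dir, ".mcachefs", "action")

    paths = []
    for i in range(count):
        path = os.path.join(mount_dir, str(i))
        with open(path, "w") as f:
            f.write("aaaaa\n")
        paths.append(path)

    # Keep the last created file open while the flush is requested.
    held = open(paths[-1])
    try:
        with open(action_path, "w") as action:
            action.write("flush_metadata\n")
    finally:
        # Closing the held file is the step that reached the crashing
        # release path in the backtrace.
        held.close()
    return paths[-1]
```

With mcachefs mounted as shown above, `reproduce("/tmp/2", count=241)` and `reproduce("/tmp/2", count=242)` can be used to compare both sides of the threshold.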

hradec commented 5 years ago

actually: [image]

id=256 when trying to close file number 242!! interesting...

hradec commented 5 years ago

It seems the metadata initially holds 255 entries. So when you go over it, it allocates a new entry.

But after a flush, it comes back down to 255... so if you have an open file before the flush (which was entry 256 or bigger), and try to close it after the flush, the entry doesn't exist in the metadata anymore.

And I think that's why you got the error: it tries to access a memory area which was deallocated.

Not sure of the best way to fix it since I don't understand the whole logic yet... But I'm leaning towards just adding a check to ignore it (since it happens when closing a file)... maybe a try/catch?!

any comments are greatly appreciated! ;)
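The hypothesis above can be illustrated with a toy model. This is not mcachefs code: the class `MetadataTable` and its methods are invented for the sketch, and the only details taken from the thread are the initial size of 255 entries and the stale id 256. In C the unchecked access hit unmapped memory (SIGBUS); in this Python model the equivalent failure is an `IndexError`.

```python
class MetadataTable:
    """Toy stand-in for the metadata store (hypothetical, not the real structure)."""

    INITIAL_ENTRIES = 255  # initial capacity as described in the comment above

    def __init__(self):
        self.entries = [{"fh": 0} for _ in range(self.INITIAL_ENTRIES)]

    def allocate(self):
        """Grow the table by one entry and return its id."""
        self.entries.append({"fh": 0})
        return len(self.entries) - 1

    def flush(self):
        """Shrink back to the initial size, dropping every later entry."""
        del self.entries[self.INITIAL_ENTRIES:]

    def clean_fh(self, entry_id):
        """Unchecked reset, analogous to `mdata->fh = 0` in the backtrace."""
        self.entries[entry_id]["fh"] = 0

    def clean_fh_checked(self, entry_id):
        """Reset only if the id still refers to a live entry; report success."""
        if not 0 <= entry_id < len(self.entries):
            return False  # stale id: the entry vanished in a flush
        self.entries[entry_id]["fh"] = 0
        return True


table = MetadataTable()
table.allocate()             # id 255
stale_id = table.allocate()  # id 256, held by the still-open file
table.flush()                # table shrinks back to 255 entries

try:
    table.clean_fh(stale_id)
except IndexError:
    print("unchecked close touched a dropped entry")

print(table.clean_fh_checked(stale_id))
```

The checked variant only shows what "ignore it on close" would look like. Whether silently skipping the stale entry is safe for the real metadata and journal is not something this model can answer.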

Doloops commented 5 years ago

Thanks @cde-jq for opening this issue, and thanks @hradec for your very nice test case !

mcachefs no longer crashes, and because we won't flush metadata if the journal has not been applied, files should not get lost in the attic.

That said, refreshing the local cache (and thus flushing metadata) is still a work in progress.