Doloops / mcachefs

mcachefs : Simple filesystem-based file cache based on fuse

mcachefs freezing when `cat .mcachefs/journal` #21

Open hradec opened 5 years ago

hradec commented 5 years ago

After using it for a couple of days (without applying the journal), mcachefs freezes when I try to cat the journal. These are the last log lines (built with DEBUG enabled):

LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56
LOG|7f040e5a3700|190824:191329:538|mcachefs-vops.c:11:mcachefs_vops_cleanup_vops|VOPS CLEANUP for /.mcachefs/journal
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:104:mcachefs_journal_read_entry|  Reading...
LOG|7f040eda4700|190824:191329:538|mcachefs-journal.c:110:mcachefs_journal_read_entry|  Read : res=56

Right after VOPS CLEANUP for /.mcachefs/journal, it prints a couple more mcachefs-journal.c:110:mcachefs_journal_read_entry| Read : res=56 lines and then freezes.

not sure where to start debugging this one... any ideas?

Doloops commented 4 years ago

Hi @hradec,

It seems you hit a bug that I still run into from time to time but struggle to reproduce.

The idea behind VOPS is to build file contents in memory in response to an access to a file in .mcachefs/* (but I guess you already knew that).

But VOPS CLEANUP being called means the vops file is being released on thread 7f040e5a3700 while still being written on thread 7f040eda4700. This should not happen, because the vops use count is supposed to protect the file from being released while it is still in use.

Do you have steps to reproduce? That would be great. My first step would be a close look at mcachefs-io.c and mcachefs-file.c, especially the use count mfile->use, which is incremented in mcachefs_fileid_get() and decremented in mcachefs_file_release() in mcachefs-file.c.
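To make the use-count idea concrete, here is a minimal sketch of the pattern in isolation. It is not the actual mcachefs code: the `demo_file` struct and the `demo_file_*` functions are hypothetical names invented for illustration, and the real `mfile->use` handling may differ. The point is only that the in-memory contents should be freed when the last user releases the file, and never while another thread still holds a reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vops-backed file; NOT the real mcachefs struct. */
struct demo_file {
    pthread_mutex_t lock;  /* protects use and contents */
    int use;               /* number of active users of this file */
    char *contents;        /* in-memory contents built on demand */
};

/* Allocate a file with no users and no contents. Returns NULL on failure. */
struct demo_file *demo_file_new(void)
{
    struct demo_file *f = calloc(1, sizeof(*f));
    if (f == NULL)
        return NULL;
    pthread_mutex_init(&f->lock, NULL);
    return f;
}

/* Take a reference: the caller promises to call demo_file_release() later. */
void demo_file_get(struct demo_file *f)
{
    pthread_mutex_lock(&f->lock);
    f->use++;
    pthread_mutex_unlock(&f->lock);
}

/*
 * Drop a reference. The contents are freed only when the count reaches
 * zero, so a release on one thread cannot pull the buffer out from under
 * another thread that still holds a reference.
 * Returns true if this call performed the cleanup.
 */
bool demo_file_release(struct demo_file *f)
{
    bool cleaned = false;

    pthread_mutex_lock(&f->lock);
    assert(f->use > 0);    /* releasing more often than getting is a bug */
    f->use--;
    if (f->use == 0) {
        free(f->contents);
        f->contents = NULL;
        cleaned = true;
    }
    pthread_mutex_unlock(&f->lock);
    return cleaned;
}
```

If the log above really shows a cleanup running while another thread is still active on the same file, then under this model either a get is missing on one path or a release is being called twice, which is the kind of imbalance worth looking for around the use count.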

If you come up with steps to reproduce, I'll happily check this!

hradec commented 4 years ago

I'll try to create a reproducible setup for you to test, but it may be hard since I got that error after about 2 days of using it without issues (I left it mounted for 2 full days without shutting off the machine, working on it about 12 hours each day - loading/writing python scripts and large files)

Also, I just merged your latest changes (it had been a while since I last merged), and these latest changes seem to have fixed some other issues I was having using mcachefs in a renderfarm setup I have, where one central machine mounts an offsite storage using sshfs with mcachefs on top, and serves it to about 50 other machines.

I was having weird issues, like software crashing for one specific user but not for the others... and it wasn't crashing mcachefs or anything... it was crashing software that was loaded from mcachefs... a totally odd bug!

anyhow, after merging your latest changes, that problem seems to be fixed now!!! Thanks loads for that by the way!!! :1st_place_medal:

I'll try to "break" it again now and figure out how to reproduce the break, but it may be fixed as well (I didn't check the changes I merged, so not sure if you made any changes to VOPS as well..)

anyhow, a huge thanks for your latest changes again!!! It really made a huge difference for me man!!! really appreciate it!!! If I could I would hug you right now and buy you a bunch of pints!! :+1: :)