astrada / google-drive-ocamlfuse

FUSE filesystem over Google Drive
https://astrada.github.io/google-drive-ocamlfuse/
MIT License
5.58k stars 353 forks source link

Mount randomly drops #281

Closed jrarseneau closed 5 years ago

jrarseneau commented 7 years ago

Hello,

I've been using google-drive-ocamlfuse for a few weeks now and every few days or so my mount will randomly drop. I have a gdfuse.log and curl.log file (the curl file is 17GB unfortunately).

Here's some info about my environment:

OS: Ubuntu 16.04 Kernel: 4.8.0-41-generic google-drive-ocamlfuse version: 0.6.17 Installed from PPA

Mount options: -cc -debug -o allow_other,ro,atomic_o_trunc,large_read,default_permissions,nonempty

Config file:

apps_script_format=json
apps_script_icon=
async_upload=true
cache_directory=
client_id=REDACTED
client_secret=REDACTED
connect_timeout_ms=5000
curl_debug_off=false
data_directory=
delete_forever_in_trash_folder=false
docs_file_extension=true
document_format=odt
document_icon=
download_docs=false
drawing_format=png
drawing_icon=
form_format=zip
form_icon=
fusion_table_format=desktop
fusion_table_icon=
keep_duplicates=false
large_file_read_only=true
large_file_threshold_mb=16
log_directory=
lost_and_found=false
low_speed_limit=0
low_speed_time=0
map_format=desktop
map_icon=
max_cache_size_mb=512
max_download_speed=0
max_memory_cache_size=10737418240
max_retries=8
max_upload_chunk_size=1099511627776
max_upload_speed=0
memory_buffer_size=8388608
metadata_cache_time=60
presentation_format=pdf
presentation_icon=
read_ahead_buffers=5
read_only=true
shared_with_me=false
spreadsheet_format=ods
spreadsheet_icon=
sqlite3_busy_timeout=5000
stream_large_files=true
umask=0o002
verification_code=

Here's the last few lines of the gdfuse.log file:

[8269.628471] TID=908128: END: Loading resource /Redacted/path/to/file (trashed=false) from db: Found (id=3768, state=ToDownload)
[8269.628490] TID=908128: BEGIN: Getting metadata from context
[8269.628499] TID=908128: END: Getting metadata: Valid
[8269.628505] TID=908128: BEGIN: Loading resource /Redacted/path/to/file (trashed=false) from db
[8269.629081] TID=908129: read /Redacted/path/to/file [4096 bytes] 90501120 0
[8269.629207] TID=908129: BEGIN: Getting metadata from context
[8269.629212] TID=908129: END: Getting metadata: Valid
[8269.629216] TID=908129: BEGIN: Loading resource /Redacted/path/to/file (trashed=false) from db
[8269.629299] TID=908128: END: Loading resource /Redacted/path/to/file (trashed=false) from db: Found (id=3768, state=ToDownload)
[8269.629572] TID=908129: END: Loading resource /Redacted/path/to/file (trashed=false) from db: Found (id=3768, state=ToDownload)
[8269.629589] TID=908129: BEGIN: Getting metadata from context
[8269.629594] TID=908129: END: Getting metadata: Valid
[8269.629597] TID=908129: BEGIN: Loading resource /Redacted/path/to/file (trashed=false) from db
[8269.629786] TID=908129: END: Loading resource /Redacted/path/to/file (trashed=false) from db: Found (id=3768, state=ToDownload)
[8269.629807] TID=908129: Buffer pool free buffers: 0
[8269.629945] TID=908129: Memory cache size: 10521754909
[8269.629950] TID=908129: Releasing memory buffer (remote id=0B7BUKwy7Xqr0QkFzRXhXc0doY3c, index=207, buffer id=785)
[8269.629955] TID=908129: BEGIN: Acquiring memory buffer (remote id=0B7BUKwy7Xqr0amtidlNvUktZNWs, index=10)
[8269.629959] TID=908129: END: Acquiring memory buffer (remote id=0B7BUKwy7Xqr0amtidlNvUktZNWs, index=10, size=6616808, buffer id=785)
[8269.629965] TID=908129: BEGIN: Stream resource (id=3768, offset=83886080, finish=90502887, length=6616808)

Any idea why this is happening? Anything else I can provide to help troubleshoot?

Thanks!

ne0ark commented 7 years ago

You really don't need large_read with 16.04. You can use auto_cache.

jrarseneau commented 7 years ago

I should add that my primary use is for Plex, so I'm streaming multi-gigabyte files regularly. Plex also has an intensive media scanner that index/analyzes the files quite often (all that to say that there is a lot of activity).

@ne0ark I've used large_read with other FUSE systems (primarily node-gdrive-fuse) and never had drops.

ne0ark commented 7 years ago

@jrarseneau mout.fuse uses libfuse if you do man fuse you will see large_read is deprecated. Also you might want to check google api console to see that you are not hitting 100 sec per user limit which is set to 1000.

jrarseneau commented 7 years ago

@ne0ark I'm only peaking about 3req/s so no where near the limits.

My error ratio is 0.18% though, which seems a bit high.

astrada commented 7 years ago

@jrarseneau: I don't think I need curl.log to fix this issue (so, the next time, if you want, you can remount just with -verbose that logs only to gdfuse.log). Can you please send your gdfuse.log to alessandro.strada@gmail.com? Thanks!

jrarseneau commented 7 years ago

Logs sent!

Thanks!

jrarseneau commented 7 years ago

@astrada

Any updates to this? Have you had a chance to review the logs? I have had the mount drop a few times this week since sending you the logs last weekend.

Thanks!

astrada commented 7 years ago

Unfortunately this issue is not easy for me to reproduce. It's not an error that gets logged, but it's probably a deadlock that stops the process execution without leaving clear traces.

ne0ark commented 7 years ago

@astrada you can replicated it if you install https://github.com/Storj/storjshare-daemon and point it to mounted drive.

jrarseneau commented 7 years ago

This past weekend had about 8-10 drops. What was odd is these drops were both clustered in 2 instances where it dropped about 4 times in 5 minutes.

bbviking commented 7 years ago

I've got exactly the same issue and i'm using it for plex. It also seems to delete the folder which it was mounted to, until i reboot and the folder comes back. A shame really considering this is by far the best performing solution for mounting google drive for streaming on plex. But for me it seems to be far more frequent, about every 5-10 minutes

astrada commented 7 years ago

I'm working on it. I think I know where the problem is, but since I cannot reproduce plex workload, I'm not 100% sure. In the next few days, I'm going to push an experimental patch that needs to be tested. If you tell me how you installed this application (via PPA or OPAM), I will post instructions on how to install this experimental version.

ne0ark commented 7 years ago

@astrada PPA :)

jrarseneau commented 7 years ago

Same. PPA

And thanks for this!

bbviking commented 7 years ago

Yeah it's ppa for me to, thanks for the possible fix I'm looking forward to it :)

lcasey001 commented 7 years ago

Haven't had the same frequency as the others above using OPAM, not sure if that makes a difference or not but willing to test the OPAM version here.

haljordan2814 commented 7 years ago

In for PPA version. thanks for looking into this

bbviking commented 7 years ago

Does anybody know of a temporary solution? like for detecting when it breaks you are able to unmount the drive and quickly remount. Tried using inotify-tools but it isn't being set off when the drive breaks.

brancomat commented 7 years ago

@buttpain my 2 cents: it's not always easy to define a broken status: with a large number of files (or a cache that needs some cleaning) some operations could take a large amount of time.

Back when I had some mountpoint hang issues, my alternatives were either running a command (ls or whatever) in timeout (setting a reasonable amount of time) or something like mount|grep google-drive-ocamlfuse. In both cases you can trap the output and try launching a fuserumount -u $PATH && google-drive-ocamlfuse $PATH if something is wrong (or simply try a -cc to clean the mountpoint cache)

YMMV, it might not be a bright idea if some other process is trying to write on that mountpoint (you could enter a state in which you unmount the path and for some reason the remount fails and the other process starts writing on the (now very local) mount path filling that partition. (been there, done that)

bbviking commented 7 years ago

Thanks for the response brancomat, actually got a reaaallllyyy ghetto solution working just a couple hours ago.

ls -l /var/lib/plexmediaserver/media/plex/log.txt if [ -rw-r--r-- 1 root root 20904 Apr 4 17:34 /var/lib/plexmediaserver/media/plex/log.txt ]; then echo OK else fusermount -z -u /var/lib/plexmediaserver/media/ & google-drive-ocamlfuse -label me /var/lib/plexmediaserver/media/ -o allow_other,auto_cache fi

That script checks the properties of a log.txt file created on my google drive, and if it gets backs the properties of the log file then nothing will happen. But if it doesn't get back the properties then the drive gets unmounted and then immediately remounted. Now to have it be running constantly I had this command in the /etc/rc.local file

sudo -H -u root screen -d -m watch -n1 sh /etc/init.d/boot.sh

This means that every second my script that I made will be run, checking if the log file on the mounted drive is still accessible, and if not unmount, remount.

astrada commented 7 years ago

With opam you can test the new beta version with:

opam pin add google-drive-ocamlfuse https://github.com/astrada/google-drive-ocamlfuse.git#beta

To restore the stable version:

opam pin remove google-drive-ocamlfuse

In the next days, I'm going to publish this beta version to the beta PPA.

jrarseneau commented 7 years ago

Testing with opam, was quite easy to switch and I wanted to provide feedback as quick as possible.

Just mounted it now, so will post back when/if it drops.

bbviking commented 7 years ago

Would really like to test it out, but i can't seem to get opam to work on my vps.

astrada commented 7 years ago

I've just published this new version to my beta PPA (https://launchpad.net/~alessandro-strada/+archive/ubuntu/google-drive-ocamlfuse-beta).

bbviking commented 7 years ago

Unfortunately this doesn't fix the issue, I analyse my library in plex and before it can finish all my movies become unavailable.

astrada commented 7 years ago

Can you please email me your log files? Thanks

bbviking commented 7 years ago

will do when my 24 hour api ban ends. But from what I remember from when I checked before there was nothing in the log which seemed to indicate that anything had gone wrong which seemed very odd.

jrarseneau commented 7 years ago

@buttpain

This may be a separate issue? If as you state above, your "media becomes unavailable", and then you indicate you had a 24 hour ban, this would explain why your media is unavailable.

Now this may be related to something google-drive-ocamlfuse is doing, but I'm not entirely convinced it's the same issue as this thread?

This thread is about the google-drive-ocamlfuse process dying unexpectedly. The process is killed, the mount is dropped. It is not about files becoming unavailable or bans.

bbviking commented 7 years ago

Oh no, the 24 hour ban happened because I tried too many time to update my plex library, after I tested google-drive-ocamlfuse and started using rclone mount instead.

jrarseneau commented 7 years ago

Ah ok.. I'm still running on the initial mount from a few days ago, hasn't dropped yet. But I have gone a week before without a drop.

bbviking commented 7 years ago

Yeah when I'm just streaming content off of plex it's totally fine. But issues arise when ever I attempt to analyse, or update the library

hjone72 commented 7 years ago

I found that having -debug on caused my curl.log to get so large that ocaml crashed.

Thread 20284 killed on uncaught exception Sys_error("No space left on device")
Thread 20286 killed on uncaught exception Sys_error("No space left on device")
Thread 20287 killed on uncaught exception Sys_error("No space left on device")
Thread 20283 killed on uncaught exception Sys_error("No space left on device")
Thread 20285 killed on uncaught exception Sys_error("No space left on device")
astrada commented 7 years ago

@hjone72: Yes -debug dumps every byte exchanged with the server. Using -verbose should be enough for troubleshooting this particular issue.

hjone72 commented 7 years ago

@astrada, thank you. I've added your beta repository and run sudo apt-get upgrade is there any way to ensure that I am running your latest version? I am still facing an unmount issue.

astrada commented 7 years ago

@hjone72, google-drive-ocamlfuse -version should return 0.6.18 (that's the new beta version).

hjone72 commented 7 years ago

@astrada Definitely running the beta. While the mount in foreground mode -f it randomly unmounts displaying a message Fatal error: out of memory.. I've got a script running that remounts it so I've lost the logs this time, but next time it drops i'll post them.

bbviking commented 7 years ago

I forgot to post this a couple days ago, but the latest beta uses an insane amount of RAM. Filling up all 8GB of the RAM i have on my VPS and causing it to shutdown by streaming one file.

dany20mh commented 7 years ago

@hjone72 What’s your script for running the mount back again if it drop?

astrada commented 7 years ago

@buttpain: Which values have you set up in max_memory_cache_size, memory_buffer_size, and read_ahead_buffers?

@jrarseneau: Are you experiencing abnormal memory usage? Anyone else?

jrarseneau commented 7 years ago

@astrada My memory usage is fine (~1-1.5GB, but that's based on my config), here are the settings I'm using for the above configs.

No crashes yet, however I have had to restart my server due to a glitch (not related) so my counter is restarted since yesterday the 9th.

max_memory_cache_size=1073741824
memory_buffer_size=8388608
read_ahead_buffers=5
hjone72 commented 7 years ago

@dany20mh, its a modification of this one. https://gist.github.com/jrarseneau/e634ca4b71c3281b6f57779c7957ba34

@astrada, Mine is using 1.6GiB and my config is set to the following.

max_memory_cache_size=504857600
memory_buffer_size=8388608
read_ahead_buffers=5

My understanding was the max_memory_cache_size was an absolute max? I haven't had a crash in nearly 12 hours now, but as soon as I do I'll grab the logs.

astrada commented 7 years ago

My understanding was the max_memory_cache_size was an absolute max?

Yes, it should be, but that's about the memory cache. Anyway, did you notice an increase of memory usage between this version and 0.6.17?

hjone72 commented 7 years ago

@astrada, It is hard to say as I wasn't running version 0.6.17 for very long. sorry.

ne0ark commented 7 years ago

Seems like something funky is going on:

140G    /root/.gdfuse/storj/cache

My config:

max_cache_size_mb=20480
~/.gdfuse/storj/cache# ls -lath
total 140G
drwx------ 3 root root 4.0K Apr 11 14:05 ..
drwx------ 2 root root 264K Apr 11 14:05 .
-rw-r--r-- 1 root root 2.8M Apr 11 14:05 cache.db
-rw-r--r-- 1 root root 3.5M Apr 11 13:17 0B1Kk_ENv1qSGb3RYaTFnVGlTcDA
-rw-r--r-- 1 root root 1.0T Apr 11 09:12 0B1Kk_ENv1qSGNDRsRUk4SHYteGM
-rw-r--r-- 1 root root 1.0T Apr 11 00:26 0B1Kk_ENv1qSGaHRxMnh0UzhyajQ
-rw-r--r-- 1 root root 1.0T Apr 11 00:21 0B1Kk_ENv1qSGZk5ack1fSnZVZms
-rw-r--r-- 1 root root 1.0T Apr 10 23:28 0B1Kk_ENv1qSGVFh2ZFdmQWVGcFk
-rw-r--r-- 1 root root 1.0T Apr 10 23:28 0B1Kk_ENv1qSGQkFqc0dfdndXWHM
fuse    uid=0,gid=0,auto_cache,default_permissions,allow_other,atomic_o_trunc

@astrada any clue why its using more then allowed?

hjone72 commented 7 years ago

@astrada, Mount was stable almost all day. Then it crashed three times in two hours. I've got the three log files but they are quite large (1GB, 207MB, 66MB). How would you like me to share them?

astrada commented 7 years ago

@hjone72: gzip should shrink them a bit. Then you could just email them to me (alessandro.strada@gmail.com) or share them to me on Drive or Dropbox, whatever you prefer. Does it work for you? Thanks

astrada commented 7 years ago

@ne0ark: if your mount dropped unexpectedly, there could be unreferenced files in the local cache that will not get cleaned up. Otherwise, if you can reproduce the issue after remounting with -cc -verbose, could you send me your gdfuse.log, so that I can investigate the issue? Thanks.

hjone72 commented 7 years ago

@astrada, I've emailed you a link to my logs. :)

Thanks!

ne0ark commented 7 years ago

@astrada I also send you a log with IO error. Would you be able to check it? It seems like cache is still skipping some parts.

Thanks

jrarseneau commented 7 years ago

Hi

Any movement on this @astrada ?