gboudreau / Greyhole

Greyhole uses Samba to create a storage pool of all your available hard drives, and allows you to create redundant copies of the files you store.
http://www.greyhole.net
GNU General Public License v3.0
262 stars 34 forks source link

Samba spool processing stalling and eventually recovering (sometimes after hours) #293

Open obelisk3847 opened 2 years ago

obelisk3847 commented 2 years ago

I am the person who is running PHP 8.1 that you helped earlier by recompiling a version that supported that (https://github.com/gboudreau/Greyhole/issues/292).

I have noticed that periodically greyhole stops processing the samba spool so I wrote a cron utility that basically looks for the sleep log entry 25 consecutive times without seeing a spool processing entry and then triggers a custom log that says spool processing has stalled. I noticed that sometimes I have been able to wake it up by triggering an fsck so since my system is quite large I do that but in a subfolder of a specific share that tends to be more active than the rest. This will run every 5 minutes (the fsck completes in less than a minute). Here is what that log looks like over the last few days which is typical behaviour...

Apr 21 2022 10:15:03.106 : Will attempt to wake up Samba spool processing with fsck Apr 21 2022 10:20:03.332 : Will attempt to wake up Samba spool processing with fsck Apr 21 2022 10:25:02.791 : Will attempt to wake up Samba spool processing with fsck Apr 21 2022 10:30:03.526 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 06:40:03.013 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 16:55:03.349 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 17:00:04.101 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 18:10:03.241 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 18:15:02.973 : Will attempt to wake up Samba spool processing with fsck Apr 22 2022 18:20:02.762 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:20:03.250 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:25:02.839 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:30:02.593 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:35:03.146 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:40:02.807 : Will attempt to wake up Samba spool processing with fsck Apr 23 2022 14:45:02.794 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 00:20:03.001 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 02:45:03.846 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 02:50:02.791 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 02:55:02.415 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 03:00:03.174 : Will attempt to wake up Samba spool processing with fsck Apr 24 2022 03:10:03.312 : Will attempt to wake up Samba spool processing with fsck

I did some more investigating this evening and noticed that when spool processing is stalled the mysqld process becomes the highest CPU load process for the kernel. Using pidstat to get the thread I was able to determine that greyhole is perpetually querying the task table using the last spool process id that was processed (4415387 below) as a reference for each of the shares...

mysql> select * from performance_schema.threads where THREAD_OS_ID = 6173 \G *************************** 1. row *************************** THREAD_ID: 80651 NAME: thread/sql/one_connection TYPE: FOREGROUND PROCESSLIST_ID: 80612 PROCESSLIST_USER: greyhole_user PROCESSLIST_HOST: localhost PROCESSLIST_DB: greyhole PROCESSLIST_COMMAND: Query PROCESSLIST_TIME: 1 PROCESSLIST_STATE: statistics PROCESSLIST_INFO: SELECT id, full_path FROM tasks WHERE complete = 'no' AND share = 'TV' AND full_path = '.' AND id <= '4415387' PARENT_THREAD_ID: NULL ROLE: NULL INSTRUMENTED: YES HISTORY: YES CONNECTION_TYPE: Socket THREAD_OS_ID: 6173 RESOURCE_GROUP: USR_default 1 row in set (0.00 sec)

While it was in this condition I also checked the task table to see if it had anything in it...

mysql> select count(*) from tasks ; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)

Eventually spool processing resumed and this query is no longer occurring (or at least at the rate it was). CPU load for the mysqld process has also gone back to normal. Do you have any insight as to what may be happening? Is there any other data I could collect for you that would help diagnose this behaviour?

gboudreau commented 2 years ago

Maybe try to change the log level from DEBUG to PERF, and wait until it happens again. (Of note: using that log level, sleeps are for 1s, instead of 10s, when the daemon sleeps.)

Once you are able to reproduce the issue, can you show here what were the last log entries that appeared in greyhole.log?

gboudreau commented 2 years ago

I created a 0.15.15 build that adds some logging in this part of the code, and has some small optimizations that might help. Test using that version, and the above suggestion (you won't see the new logs with log_level=DEBUG; you need to use PERF).

obelisk3847 commented 2 years ago

So this happened again this morning while I was in DEBUG logging but here is the sequence that triggered it...

Apr 24 08:01:06 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:11 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:14 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:16 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:19 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:01:21 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:26 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:29 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:32 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:33 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:01:37 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:37 DEBUG read_smb_spool: Found 1 new tasks in spool.
Apr 24 08:01:42 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:01:50 INFO unlink: Now working on task ID 4424652: unlink Greyhole Recycle Bin/TV_no_backup/Premieres/How We Roll/How We Roll - S01E01 -.mkv
Apr 24 08:01:50 INFO unlink: File deleted: /home/greyhole_lz/Greyhole Recycle Bin/TV_no_backup/Premieres/How We Roll/How We Roll - S01E01 -.mkv
Apr 24 08:01:50 DEBUG unlink:   Looking for corresponding file in trash to delete...
Apr 24 08:01:50 DEBUG unlink:     Deleting corresponding copy /mnt/drvpool/drive12/gh/.gh_trash/TV_no_backup/Premieres/How We Roll/How We Roll - S01E01 -.mkv
Apr 24 08:02:00 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:02:20 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:02:40 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:02:55 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 08:02:57 INFO unlink: Now working on task ID 4424653: unlink Greyhole Recycle Bin/TV_no_backup/TV/Severance/Season 1/Severance - S01E08 - Whats for Dinner WEBDL-2160p EAC3 Atmos 5.1 HDR10Bit.mp4
Apr 24 08:02:57 INFO unlink: File deleted: /home/greyhole_lz/Greyhole Recycle Bin/TV_no_backup/TV/Severance/Season 1/Severance - S01E08 - Whats for Dinner WEBDL-2160p EAC3 Atmos 5.1 HDR10Bit.mp4
Apr 24 08:02:57 DEBUG unlink:   Looking for corresponding file in trash to delete...
Apr 24 08:02:57 DEBUG unlink:     Deleting corresponding copy /mnt/drvpool/drive1/gh/.gh_trash/TV_no_backup/TV/Severance/Season 1/Severance - S01E08 - Whats for Dinner WEBDL-2160p EAC3 Atmos 5.1 HDR10Bit.mp4
Apr 24 08:03:04 INFO rmdir: Now working on task ID 4424654: rmdir Greyhole Recycle Bin/TV_no_backup/Premieres/How We Roll
Apr 24 08:03:04 INFO rmdir: Directory deleted: /home/greyhole_lz/Greyhole Recycle Bin/TV_no_backup/Premieres/How We Roll
Apr 24 08:03:04 DEBUG rmdir:   Removed copy from trash at /mnt/drvpool/drive1/gh/.gh_trash/TV_no_backup/Premieres/How We Roll
Apr 24 08:03:04 DEBUG rmdir:   Removed copy from trash at /mnt/drvpool/drive12/gh/.gh_trash/TV_no_backup/Premieres/How We Roll
Apr 24 08:03:15 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:03:34 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 08:03:54 DEBUG sleep: Nothing to do... Sleeping.

At this point, it stopped processing the spool for the next 4 hours. I noticed last night that the spool restarted when I deleted something in the share and triggered the FSCK so I tried that sequence again (note that at this point I had the logs set to PERF). First I touched a file called test in the file system where the FSCK would run...

Apr 24 12:10:36.895875 INFO fsck: fsck for /home/greyhole_lz/TV_no_backup_LZ/Premieres/ completed.
Apr 24 12:10:43.18417 INFO write: Now working on task ID 4424732: write TV_no_backup/Premieres/test
Apr 24 12:10:43.153907 INFO write: File created: TV_no_backup/Premieres/test - 0.00B
Apr 24 12:10:43.161941 DEBUG write: Loading metafiles for TV_no_backup/Premieres/test ...
Apr 24 12:10:43.169849 DEBUG write:   Got 0 metadata files.
Apr 24 12:10:44.762873 DEBUG write:   Drives with available space: /mnt/drvpool/drive16/gh (480GB avail) - /mnt/drvpool/drive1/gh (371GB avail) - /mnt/drvpool/drive12/gh (240GB avail) - /mnt/drvpool/drive17/gh (175GB avail) - /mnt/drvpool/drive11/gh (169GB avail) - /mnt/drvpool/drive13/gh (166GB avail) - /mnt/drvpool/drive8/gh (157GB avail) - /mnt/drvpool/drive9/gh (157GB avail) - /mnt/drvpool/drive2/gh (155GB avail) - /mnt/drvpool/drive20/gh (150GB avail) - /mnt/drvpool/drive3/gh (149GB avail) - /mnt/drvpool/drive10/gh (149GB avail) - /mnt/drvpool/drive19/gh (148GB avail) - /mnt/drvpool/drive6/gh (147GB avail) - /mnt/drvpool/drive4/gh (145GB avail) - /mnt/drvpool/drive15/gh (141GB avail) - /mnt/drvpool/drive7/gh (140GB avail) - /mnt/drvpool/drive5/gh (139GB avail) - /mnt/drvpool/drive18/gh (135GB avail) - /mnt/drvpool/drive14/gh (135GB avail) - /mnt/drvpool/drive21/gh (132GB avail)
Apr 24 12:10:44.771974 DEBUG write:   Saving 1 metadata files for TV_no_backup/Premieres/test
Apr 24 12:10:44.799792 DEBUG write:     Saving backup metadata file in /mnt/drvpool/drive1/gh/.gh_metastore_backup/TV_no_backup/Premieres/test
Apr 24 12:10:45.945781 INFO write:   Copying 0.00B file to /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/test
Apr 24 12:10:45.958397 DEBUG write:     Copied file MD5 = d41d8cd98f00b204e9800998ecf8427e
Apr 24 12:10:45.967501 DEBUG write:   Creating symlink in share pointing to /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/test
Apr 24 12:10:45.975180 DEBUG write:   Saving 1 metadata files for TV_no_backup/Premieres/test
Apr 24 12:10:45.997976 DEBUG write:     Saving backup metadata file in /mnt/drvpool/drive1/gh/.gh_metastore_backup/TV_no_backup/Premieres/test
Apr 24 12:10:56.511436 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:06.970211 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:17.168972 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:27.933327 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:38.231090 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:49.12250 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:11:59.665976 DEBUG sleep: Nothing to do... Sleeping.

Note that greyhole handled the new file after it was picked up by FSCK but did not start processing the spool so I then deleted the file and rerun the FSCK...

Apr 24 12:18:42.475754 DEBUG fsck: Found /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/test
Apr 24 12:18:42.482652 DEBUG fsck: Loading metafiles for TV_no_backup/Premieres/test ...
Apr 24 12:18:42.489383 DEBUG fsck:   Got 1 metadata files.
Apr 24 12:18:42.496456 INFO fsck:   Symlink target moved. Updating symlink.
Apr 24 12:18:42.502961 DEBUG fsck:   Missing symlink... A pending unlink transaction maybe?
Apr 24 12:18:47.217328 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:18:52.87185 DEBUG read_smb_spool: Found 1 new tasks in spool.
Apr 24 12:18:52.96514 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:18:52.204228 DEBUG fsck:     Indeed! Pending unlink task found. Will not re-create this symlink.
Apr 24 12:18:52.212236 DEBUG fsck:   Saving 1 metadata files for TV_no_backup/Premieres/test
Apr 24 12:18:52.219474 DEBUG fsck:     Saving backup metadata file in /mnt/drvpool/drive1/gh/.gh_metastore_backup/TV_no_backup/Premieres/test
Apr 24 12:18:52.225956 DEBUG fsck: Entering /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/Britains Deadliest Mums and Dads
Apr 24 12:18:52.232892 DEBUG fsck: Found /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/Britains Deadliest Mums and Dads/Britains Deadliest Mums and Dads - S01E01 - The Honour Killing.mkv
Apr 24 12:18:52.239563 DEBUG fsck: Loading metafiles for TV_no_backup/Premieres/Britains Deadliest Mums and Dads/Britains Deadliest Mums and Dads - S01E01 - The Honour Killing.mkv ...
Apr 24 12:18:52.246195 DEBUG fsck:   Got 1 metadata files.

In this log sequence greyhole notes that the file was deleted and immediately triggers a Samba Spool Process action.

Finally after the FSCK completed greyhole processed the file deletion and was then processing the spool...

Apr 24 12:18:55.932537 INFO fsck: fsck for /home/greyhole_lz/TV_no_backup_LZ/Premieres/ completed.
Apr 24 12:18:55.962681 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:18:56.20874 INFO unlink: Now working on task ID 4424736: unlink TV_no_backup/Premieres/test
Apr 24 12:18:56.27622 INFO unlink: File deleted: /home/greyhole_lz/TV_no_backup_LZ/Premieres/test
Apr 24 12:18:56.34578 DEBUG unlink: Loading metafiles for TV_no_backup/Premieres/test ...
Apr 24 12:18:56.41797 DEBUG unlink:   Got 1 metadata files.
Apr 24 12:18:56.54019 DEBUG unlink:   Moved copy from /mnt/drvpool/drive16/gh/TV_no_backup/Premieres/test to trash: /mnt/drvpool/drive16/gh/.gh_trash/TV_no_backup/Premieres/test
Apr 24 12:18:56.66849 DEBUG unlink:   Removing metadata files for TV_no_backup/Premieres/test
Apr 24 12:18:56.74077 DEBUG unlink:     Removed metadata file at /mnt/drvpool/drive16/gh/.gh_metastore/TV_no_backup/Premieres/test
Apr 24 12:18:56.81084 DEBUG unlink:     Removed metadata file at /mnt/drvpool/drive1/gh/.gh_metastore_backup/TV_no_backup/Premieres/test
Apr 24 12:18:56.125596 INFO fsck_file: Now working on task ID 4424734: fsck_file TV_no_backup/Premieres/test
Apr 24 12:18:56.133672 DEBUG fsck_file: Loading metafiles for TV_no_backup/Premieres/test ...
Apr 24 12:18:56.140657 DEBUG fsck_file:   Got 0 metadata files.
Apr 24 12:18:56.149989 WARN fsck_file:   WARNING! No copies of this file are available in the Greyhole storage pool: "TV_no_backup/Premieres/test".
Apr 24 12:18:56.156867 DEBUG fsck_file:   Removing metadata files for TV_no_backup/Premieres/test
Apr 24 12:18:56.191903 INFO fsck_file: Now working on task ID 4424735: fsck_file TV_no_backup/Premieres/test
Apr 24 12:18:56.200007 DEBUG fsck_file: Loading metafiles for TV_no_backup/Premieres/test ...
Apr 24 12:18:56.206917 DEBUG fsck_file:   Got 0 metadata files.
Apr 24 12:18:56.213487 WARN fsck_file:   WARNING! No copies of this file are available in the Greyhole storage pool: "TV_no_backup/Premieres/test".
Apr 24 12:18:56.220816 DEBUG fsck_file:   Removing metadata files for TV_no_backup/Premieres/test
Apr 24 12:18:57.306205 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:19:01.213103 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:19:01.270360 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:19:02.225942 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:19:06.86033 DEBUG sleep: Nothing to do... Sleeping.
Apr 24 12:19:06.304707 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:19:07.100310 DEBUG read_smb_spool: Processing Samba spool...
Apr 24 12:19:10.995852 DEBUG sleep: Nothing to do... Sleeping.
obelisk3847 commented 2 years ago

I have 0.15.5 installed and running and will provide updated logs when this occurs again. I think I will also modify my detection utility to specifically delete something in the share when I run the FSCK to attempt recovery since that seems to be the trigger to restart spool processing.

gboudreau commented 2 years ago

The Samba spool can be processed either via a cron job (* * * * * root /usr/bin/greyhole --process-spool --keepalive >/dev/null), or by the daemon (in specific use-cases, but also each time it sleeps, if possible).

It also needs to acquire a (database) lock, in order to be able to run.

Since the daemon is able to run this code during a fsck (at Apr 24 12:18:47), but the cron can't, my guess would be that the daemon somehow didn't release the lock last time it ran, thus preventing the cron process from being able to acquire the lock, and thus run. But that doesn't explain why it can't run in the daemon, when it sleeps. Did you maybe restart the daemon, between Apr 24 12:11:59 and 12:18:42?

To test this, next time it happens, try to run it manually from a command line: sudo /usr/bin/greyhole --process-spool

And check in the DB, if the lock is available or not: SELECT IS_FREE_LOCK('read_smb_spool')

PS In your script, you should be able to simply restart the daemon; that would release the DB lock, allowing the cron and the new daemon process to acquire it.

obelisk3847 commented 2 years ago

I have tried daemon restarts in the past and it does not solve the issue as it had occurred to me as well that a process may have locked a samba or database resource. I think the database lock is working properly as I had noticed that one of the other threads was acquiring lock and releasing it on the table when I was looking at what mysqld was doing when it was using high CPU resources.

I had also tried manually triggering the process spool command as I had noticed you discussing it in another thread in the issue history.

I will retry these things on the new load though with the PERF logging active. Nothing to lose.

obelisk3847 commented 2 years ago

What should the value for the database lock query be during normal operation? This is what I see currently...

bergang@hda:~/tv_nbk_gh/Premieres$ sudo mysql -e "SELECT IS_FREE_LOCK('read_smb_spool')" +--------------------------------+ | IS_FREE_LOCK('read_smb_spool') | +--------------------------------+ | 1 | +--------------------------------+

Since this will almost certainly happen when I am not on the system should I just add this to my custom logging to record the lock status when I detect Samba Spool has stalled or is there a better way to do it?

lockstate=$(mysql -e "SELECT IS_FREE_LOCK('read_smb_spool')" | grep -e [0-9]) echo "$(timestamp) : The lockstate on the greyhole database is $lockstate" >>/custom_logs/cron_activity.log

obelisk3847 commented 2 years ago

The problem happened again and this time the system was idle when it occurred...

Apr 26 04:54:57.753920 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:54:57.767776 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:01.773387 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:01.782973 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:02.105408 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:03.116691 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:03.125058 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:06.796755 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:06.807050 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:07.149879 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:08.165540 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:08.176298 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:11.816466 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:11.827636 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:12.173208 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:13.184255 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:16.839048 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:17.902764 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:18.913069 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:21.849621 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:22.954059 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:23.969771 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:26.858598 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:28.475772 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:29.487002 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:31.868647 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:33.998186 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:35.10439 DEBUG read_smb_spool: Processing Samba spool... Apr 26 04:55:35.23076 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:36.877153 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:39.443446 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:40.460217 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:41.886760 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:45.14256 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:46.27554 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:46.898027 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:50.449659 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:51.462115 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:51.907969 PERF read_smb_spool: Finished parsing spool. Apr 26 04:55:56.128064 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:55:57.141106 PERF read_smb_spool: Finished parsing spool. Apr 26 04:56:01.338654 DEBUG sleep: Nothing to do... Sleeping. Apr 26 04:56:02.38643 PERF read_smb_spool: Finished parsing spool. Apr 26 04:56:02.351767 PERF read_smb_spool: Finished parsing spool. Apr 26 04:56:07.44694 DEBUG sleep: Nothing to do... Sleeping.

On Apr 26 04:55:11.816466 it processes the spool and then again 25 seconds later once at Apr 26 04:55:35.10439 but that is the last time it processed the spool. Based on the output of the query the lock state is 1 which I will leave up to your discretion whether that is normal or not. It has been going in and out of this state for the last several hours as the fsck will sometimes recover it but then it eventually goes back to not processing the spool. The restart did not have any impact on the behaviour...

Apr 26 08:56:28.473884 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:30.391961 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:32.463383 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:56:33.476934 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:35.454131 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:36.34730 INFO daemon: Greyhole (version 0.15.15) daemon started. Apr 26 08:56:36.41538 INFO daemon: Checking MySQL tables... Apr 26 08:56:37.232210 DEBUG daemon: Loading metadata store backup directories... Apr 26 08:56:37.239170 DEBUG daemon: Found 2 directories in the settings table. Apr 26 08:56:39.391493 DEBUG daemon: Checking symlink at /usr/lib/x86_64-linux-gnu/samba/vfs/greyhole.so... Apr 26 08:56:39.398230 DEBUG daemon: Is OK. Apr 26 08:56:39.452367 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:39.464622 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:40.467833 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:43.282363 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:56:44.295553 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:45.482092 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:48.59365 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:56:49.74683 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:50.494589 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:52.780463 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:56:53.793282 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:55.507833 PERF read_smb_spool: Finished parsing spool. Apr 26 08:56:57.535523 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:56:58.548232 PERF read_smb_spool: Finished parsing spool. Apr 26 08:57:00.641345 PERF read_smb_spool: Finished parsing spool. Apr 26 08:57:02.428704 DEBUG sleep: Nothing to do... Sleeping. Apr 26 08:57:03.442676 PERF read_smb_spool: Finished parsing spool.

I also attempted to run the spool command on the console but it is not showing up in the logs.

What is interesting is at the moment the fsck does trigger spool processing as it occurs three times immediately after the fsck completes but it is not sustained...

Apr 26 09:31:08.467273 INFO fsck: fsck for /home/greyhole_lz/TV_no_backup_LZ/Premieres/ completed. Apr 26 09:31:08.783264 DEBUG read_smb_spool: Processing Samba spool... Apr 26 09:31:08.789983 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:08.790042 PERF read_smb_spool: Found 1 close tasks. Will finalize all write tasks for those, if any... Apr 26 09:31:10.875972 PERF read_smb_spool: There are no complete=written or complete=no write tasks. No point looking into each individual close task... Apr 26 09:31:10.888485 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:14.743056 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:15.756757 DEBUG read_smb_spool: Processing Samba spool... Apr 26 09:31:15.774121 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:15.885906 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:20.51292 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:20.900484 DEBUG read_smb_spool: Processing Samba spool... Apr 26 09:31:20.940544 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:21.61307 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:25.9520 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:25.953497 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:26.23472 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:29.844158 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:30.859641 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:30.965155 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:35.78360 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:35.979525 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:36.91737 PERF read_smb_spool: Finished parsing spool. Apr 26 09:31:40.72817 DEBUG sleep: Nothing to do... Sleeping. Apr 26 09:31:40.992779 PERF read_smb_spool: Finished parsing spool.

obelisk3847 commented 2 years ago

Here is a list of the running greyhole processes as well...

root@hda:/etc/systemd# ps -ef | grep greyhole root 2684136 1 0 09:53 ? 00:00:00 /bin/bash /usr/bin/greyhole-php /usr/bin/greyhole --daemon root 2684137 2684136 0 09:53 ? 00:00:00 /usr/bin/php /usr/bin/greyhole --daemon root 2684576 2684575 0 09:54 ? 00:00:00 /bin/sh -c /usr/bin/greyhole --process-spool --keepalive >/dev/null root 2684577 2684576 0 09:54 ? 00:00:00 /bin/bash /usr/bin/greyhole-php /usr/bin/greyhole --process-spool --keepalive root 2684578 2684577 0 09:54 ? 00:00:00 /usr/bin/php /usr/bin/greyhole --process-spool --keepalive

gboudreau commented 2 years ago

So it shows Finished parsing spool but no Processing Samba spool. Let's look into how this can happen.

IS_FREE_LOCK returning 1 means the lock is free (not locked). It should be 0 only during processing the spool, and return to 1 when done. So unless there is a lot to process in the spool, and you check when this is processed, or unless the lock was kept for some reason, which does not seem to be the case, 1 is expected.

obelisk3847 commented 2 years ago

It looks like the database doesn't keep any persistent data other than completed tasks. I ended up with a corrupted database before all this started and I manually recovered an old version which caused greyhole to reprocess a bunch of old transactions through the spool which took days. Is it possible to drop the database and rerun the creation script to make sure it is not causing a problem?

gboudreau commented 2 years ago

Processing Samba spool... is only logged when there IS something in the spool to process. If the spool is empty, nothing is logged (at DEBUG level; Finished parsing spool. is still logged at PERF level). If neither of those are logged, then you have a problem.

Here's the command used to check if there is something in the spool: find -L /var/spool/greyhole -type f -printf "%T@ %p\n" | sort -n 2> /dev/null | head -n 10000

So when you notice no Processing Samba spool... log, you can try to create a dummy file on any share: echo $(date) > /mnt/samba/SomeShare/ping To make sure the spool processing still works as expected.

obelisk3847 commented 2 years ago

I have added "Finished parsing spool" to the log analysis where I check for consecutive entries of "Nothing to do" without seeing anything else. I will have to give some thought as to whether I automate the actual creation/deletion of a file to test for that condition so as not to trigger some unexpected corner cases with large files blocking the process queue while they are being worked on.