Drive failure, now "Warning! All storage pool drives are over-capacity!" warning.

tamorgen commented 2 years ago

Yesterday I had a drive failure. I had to rebuild the file system, but I ended up losing all the files on the drive. While the drive was being repaired, Greyhole apparently started re-creating copies of the missing files from the bad drive. Now all the drives are over capacity.

That drive's file system is back in place, and I've run the --replaced command, but Greyhole isn't moving files to the new drive.

[tmorgenthaler@galactica log]$ curl -Ls https://bit.ly/gh-infos | sudo sh
ls: cannot access '/mnt/samba/': No such file or directory
Failed to restart greyhole.service: Unit smbd.service not found.
Here's the URL you will need to give the to person who's helping you: http://ix.io/3OFC

gboudreau commented 2 years ago

A fsck should fix it:

greyhole --fsck --email-report --disk-usage-report

Follow the progress in the log (or with greyhole -L).

tamorgen commented 2 years ago

Okay, thanks. That seems to have worked. For some reason it didn't make any progress overnight, but it picked up sometime today and rebalanced the drives. Based on the logs, a few files appear to be missing, but not many.

One other problem: SMB access doesn't appear to be working now. I can see the top-level folders, but when I click on them, I get an error on my Mac: "The operation can't be completed because the original item for "Bluray" can't be found."

gboudreau commented 2 years ago

Look in the Samba log (smb.log or similar) for errors regarding Samba shares.

tamorgen commented 2 years ago

I looked in there, but all I saw was a copyright message. I can access other shares on the server that aren't in the Greyhole drive pool.

gboudreau commented 2 years ago

Look at the other files in the Samba log folder. In the link you provided initially, I see many different errors in various .log files:

#### /var/log/samba/macbookair-f61a.log
[2019/11/03 19:38:01.223331,  0] ../../source3/smbd/service.c:632(make_connection_snum)
  make_connection_snum: vfs_init failed for service Bluray
[2019/11/03 19:42:13.014188,  0] ../../lib/util/modules.c:49(load_module)
  Error loading module '/usr/lib64/samba/vfs/greyhole.so': /usr/lib64/samba/vfs/greyhole.so: cannot open shared object file: Too many levels of symbolic links

#### /var/log/samba/dhcp-10-254-18-16.log
[2019/07/12 10:45:24.771856,  0] ../../source3/smbd/service.c:632(make_connection_snum)
  make_connection_snum: vfs_init failed for service Bluray
[2019/07/12 11:27:43.108463,  0] ../../lib/util/modules.c:49(load_module)
  Error loading module '/usr/lib64/samba/vfs/greyhole.so': libgssapi-samba4.so.2: cannot open shared object file: No such file or directory

Look in recent .log files to see which of those errors, if any, still happen. Use ls -ltr in /var/log/samba/ to see the most recently modified files last.
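As a rough illustration of that check, the shell function below (a hypothetical helper, not part of Greyhole) lists the most recently modified Samba logs, oldest first like ls -ltr, and keeps only those that contain the VFS errors quoted above. The default directory and the count of 5 are assumptions.

```shell
# Hypothetical helper (not part of Greyhole): print the most recently modified
# Samba logs that mention a Greyhole VFS load failure, oldest first (like ls -ltr).
# Assumptions: logs end in .log; the patterns come from the excerpts in this thread.
recent_vfs_errors() {
  logdir=${1:-/var/log/samba}
  count=${2:-5}
  # ls -tr sorts by modification time with the newest last; keep the last $count.
  ls -tr "$logdir"/*.log 2>/dev/null | tail -n "$count" |
    while IFS= read -r f; do
      # Keep only logs containing one of the VFS errors seen above.
      if grep -qE 'vfs_init failed|Error loading module .*greyhole\.so' "$f"; then
        printf '%s\n' "$f"
      fi
    done
}
```

Run it as root, since the Samba log folder may not be readable by regular users, e.g. recent_vfs_errors /var/log/samba 10, then open the files it prints.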

tamorgen commented 2 years ago

I see some errors from my iMac.

[2022/02/05 21:32:56.029364,  0] ../../source3/smbd/service.c:638(make_connection_snum)
  make_connection_snum: vfs_init failed for service Bluray
[2022/02/05 21:32:56.029815,  0] ../../lib/util/modules.c:49(load_module)
  Error loading module '/usr/lib64/samba/vfs/greyhole.so': /usr/lib64/samba/vfs/greyhole.so: file too short
[2022/02/05 21:32:56.029843,  0] ../../source3/smbd/vfs.c:185(vfs_init_custom)
  error probing vfs module 'greyhole': NT_STATUS_UNSUCCESSFUL
[2022/02/05 21:32:56.029856,  0] ../../source3/smbd/vfs.c:399(smbd_vfs_init)
  smbd_vfs_init: vfs_init_custom failed for greyhole
[2022/02/05 21:32:56.029874,  0] ../../source3/smbd/service.c:638(make_connection_snum)
  make_connection_snum: vfs_init failed for service Bluray
[2022/02/05 21:32:56.035682,  0] ../../lib/util/modules.c:49(load_module)
  Error loading module '/usr/lib64/samba/vfs/greyhole.so': /usr/lib64/samba/vfs/greyhole.so: file too short
[2022/02/05 21:32:56.035706,  0] ../../source3/smbd/vfs.c:185(vfs_init_custom)
  error probing vfs module 'greyhole': NT_STATUS_UNSUCCESSFUL

gboudreau commented 2 years ago

Check the /usr/lib64/samba/vfs/greyhole.so file. It should be a symlink pointing to /usr/lib64/greyhole/greyhole-samba415.so.

greyhole-samba415.so is not a file included in Greyhole packages, so I guess it was built on your machine using build_vfs.sh. Maybe it's corrupted now because you were out of space on your root drive. Try removing /usr/lib64/greyhole/greyhole-samba415.so and /usr/lib64/samba/vfs/greyhole.so, then restart the Greyhole daemon and look at the Greyhole logs. It should warn about a missing VFS module, and will either build it automatically or give you instructions on how to do so.
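One way to inspect the module before deleting anything, sketched as a hypothetical shell helper: it reports whether greyhole.so is a symlink, where it resolves, and whether the target is suspiciously small, which would be consistent with the "file too short" error. The 1024-byte threshold is an arbitrary assumption; a working VFS module should be much larger than that.

```shell
# Hypothetical helper: report whether the Samba VFS entry point is a symlink,
# where it resolves, and whether the target looks truncated ("file too short").
# The 1024-byte threshold is an arbitrary assumption; a real module is far larger.
check_vfs_module() {
  link=${1:-/usr/lib64/samba/vfs/greyhole.so}
  if [ ! -e "$link" ] && [ ! -L "$link" ]; then
    echo "missing: $link"
    return 1
  fi
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link"
    return 1
  fi
  # readlink -f follows the whole chain; it prints nothing if it cannot
  # resolve the path (e.g. a symlink loop).
  target=$(readlink -f "$link" || true)
  if [ -z "$target" ] || [ ! -f "$target" ]; then
    echo "dangling or looping symlink: $link"
    return 1
  fi
  size=$(wc -c < "$target" | tr -d ' ')
  if [ "$size" -lt 1024 ]; then
    echo "suspiciously small ($size bytes): $target"
    return 1
  fi
  echo "ok: $link -> $target ($size bytes)"
}
```

Called with no argument, it checks /usr/lib64/samba/vfs/greyhole.so. Anything other than an "ok:" line suggests removing both files as described above.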

tamorgen commented 2 years ago

I deleted the file and the corresponding symlink. I restarted the greyhole service, and I'm now getting the following:

[root@galactica log]# systemctl restart greyhole.service 
Failed to restart greyhole.service: Unit smbd.service not found.

There have been no changes in greyhole.log since 10:26 this morning (EST).

I'm running Fedora Server 35.

I ran systemctl status smb:

● smb.service - Samba SMB Daemon
     Loaded: loaded (/usr/lib/systemd/system/smb.service; enabled; vendor preset: disabled)
     Active: active (running) since Sat 2022-02-05 21:46:04 EST; 39min ago
       Docs: man:smbd(8)
             man:samba(7)
             man:smb.conf(5)
   Main PID: 154225 (smbd)
     Status: "smbd: ready to serve connections..."
      Tasks: 5 (limit: 9338)
     Memory: 1.1G
        CPU: 26.631s
     CGroup: /system.slice/smb.service
             ├─154225 /usr/sbin/smbd --foreground --no-process-group
             ├─154227 /usr/sbin/smbd --foreground --no-process-group
             ├─154228 /usr/sbin/smbd --foreground --no-process-group
             ├─154229 /usr/libexec/samba/samba-bgqd --ready-signal-fd=47 --parent-watch-fd=13 --debuglevel=0 -F
             └─154372 /usr/sbin/smbd --foreground --no-process-group

Feb 05 21:47:21 galactica.starfleet.org smbd[154372]: [2022/02/05 21:47:21.576049,  0] ../../source3/smbd/service.c:638(make_connection_snum)
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]:   make_connection_snum: vfs_init failed for service Bluray
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]: [2022/02/05 21:47:21.576558,  0] ../../lib/util/modules.c:49(load_module)
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]:   Error loading module '/usr/lib64/samba/vfs/greyhole.so': /usr/lib64/samba/vfs/greyhole.so: file too short
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]: [2022/02/05 21:47:21.576581,  0] ../../source3/smbd/vfs.c:185(vfs_init_custom)
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]:   error probing vfs module 'greyhole': NT_STATUS_UNSUCCESSFUL
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]: [2022/02/05 21:47:21.576595,  0] ../../source3/smbd/vfs.c:399(smbd_vfs_init)
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]:   smbd_vfs_init: vfs_init_custom failed for greyhole
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]: [2022/02/05 21:47:21.576607,  0] ../../source3/smbd/service.c:638(make_connection_snum)
Feb 05 21:47:21 galactica.starfleet.org smbd[154372]:   make_connection_snum: vfs_init failed for service Bluray

I also restarted the smb.service, then the greyhole service. Same result.

gboudreau commented 2 years ago

You still have the same error: Error loading module '/usr/lib64/samba/vfs/greyhole.so': /usr/lib64/samba/vfs/greyhole.so: file too short

You deleted that file before restarting Samba, so what is being loaded here?

ls -la /usr/lib64/samba/vfs/greyhole.so
ls -la /usr/lib64/greyhole/greyhole-samba415.so

tamorgen commented 2 years ago

I think that is from earlier, not the current issue. It's currently 23:15 EST, so those logs were from 90 minutes ago.

[root@galactica greyhole]# ls -la /usr/lib64/samba/vfs/greyhole.so
ls: cannot access '/usr/lib64/samba/vfs/greyhole.so': No such file or directory
[root@galactica greyhole]# ls -la /usr/lib64/greyhole/greyhole-samba415.so
ls: cannot access '/usr/lib64/greyhole/greyhole-samba415.so': No such file or directory

gboudreau commented 2 years ago

How did you install Greyhole?

Try re-installing the latest version, 0.15.12. I added support for Samba 4.15, so re-installing it should re-install the .so you need.

tamorgen commented 2 years ago

I installed it using your install script, probably 5 years ago.

What is the best way to reinstall? I tried your script, and it told me I already have the latest version installed.

gboudreau commented 2 years ago

Ask your package manager (yum?) to clear the packages cache, and then update the greyhole package.

tamorgen commented 2 years ago

Okay, that makes sense. I believe I already had the Greyhole repo configured, so it should be updating automatically.

When I try to install it, dnf says 0.15.11-1 is the latest version and it's already installed, just like your script did.

[root@galactica greyhole]# dnf clean dbcache
23 files removed
[root@galactica greyhole]# dnf install greyhole
Last metadata expiration check: 1:01:27 ago on Sat 05 Feb 2022 10:47:57 PM EST.
Package greyhole-0.15.11-1.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

gboudreau commented 2 years ago

Maybe dnf clean all?

tamorgen commented 2 years ago

Same result, just longer:

[root@galactica dnf]# dnf clean  all
183 files removed
[root@galactica dnf]# dnf install greyhole
Fedora 35 - x86_64                                                                                                                        21 MB/s |  79 MB     00:03    
Fedora 35 openh264 (From Cisco) - x86_64                                                                                                 3.7 kB/s | 2.5 kB     00:00    
Fedora Modular 35 - x86_64                                                                                                               1.9 MB/s | 3.3 MB     00:01    
Fedora 35 - x86_64 - Updates                                                                                                              21 MB/s |  25 MB     00:01    
Fedora Modular 35 - x86_64 - Updates                                                                                                     5.0 MB/s | 2.8 MB     00:00    
Greyhole Repo                                                                                                                            138 kB/s |  48 kB     00:00    
MongoDB Repository                                                                                                                       124 kB/s |  39 kB     00:00    
PlexRepo                                                                                                                                  71 kB/s |  17 kB     00:00    
RPM Fusion for Fedora 35 - Nonfree                                                                                                        12 kB/s | 239 kB     00:19    
RPM Fusion for Fedora 35 - Nonfree - Updates                                                                                             176 kB/s |  72 kB     00:00    
Package greyhole-0.15.11-1.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

gboudreau commented 2 years ago

Try again (dnf clean all); I just realized I have a Cloudflare cache, which I just cleared!

tamorgen commented 2 years ago

That did it, and fixed the browsing problem. Thanks for your help!

tamorgen commented 2 years ago

Hey @Guillaume, sorry, I may have spoken too soon.

Everything "appears" to be working fine, but I checked the logs this afternoon, and I came across a warning.

Feb 06 12:57:35 WARN daemon: Greyhole VFS module (/usr/lib64/samba/vfs/greyhole.so) seems to be missing some required libraries. If you have issues connecting to your Greyhole-enabled shares, try to compile a new VFS module for Samba by running this command: /usr/share/greyhole/build_vfs.sh current

As I said last night, I can access the shares, but I wanted to clear the warning in case it points to a real problem. I'm not sure whether it's something I should worry about. I tried following the directions, but the build fails:

[tmorgenthaler@galactica log]$ sudo /usr/share/greyhole/build_vfs.sh current
Installing build dependencies ...
Last metadata expiration check: 3:11:02 ago on Sun 06 Feb 2022 09:51:04 AM EST.
Package patch-2.7.6-15.fc35.x86_64 is already installed.
Package gcc-11.2.1-7.fc35.x86_64 is already installed.
Package python3-devel-3.10.2-1.fc35.x86_64 is already installed.
Package gnutls-devel-3.7.2-2.fc35.x86_64 is already installed.
Package make-1:4.3-6.fc35.x86_64 is already installed.
Package rpcgen-1.4-8.fc35.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
Installing Parse::Yapp::Driver perl module ...
Can't locate CPAN.pm in @INC (you may need to install the CPAN module) (@INC contains: /usr/local/lib64/perl5/5.34 /usr/local/share/perl5/5.34 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5).
BEGIN failed--compilation aborted.

gboudreau commented 2 years ago

If you can connect to your shares, you can ignore this warning. If you still want to try to compile the VFS module, run sudo dnf -y install perl-CPAN and re-run the build script.

tamorgen commented 2 years ago

Thanks, I think I'll leave well enough alone for now.

One last question, I promise.

I took the failing drive out of the drive pool with the -R flag, and of course told Greyhole it was still available. I added a replacement drive, and Greyhole automatically started moving copies to the new drive without issue. I had to rebalance the drives, since the replacement was larger. All in all, that went smoothly. I had to give the new drive a new mount point, and once the old drive was done, I was able to remove it from /etc/fstab and physically from the server.

The only issue I have now is my OCD. When I check the status of the drives, either with -s or in the GUI, the mount points are out of order (drive8, drive6, drive7, etc.). Does it really matter? Of course not.

I know I can change the mount point names for each drive in Linux, but how does Greyhole handle that? Does it go strictly by UUID or device name? In other words, if I remount the drives and change drive8 to drive1 in /etc/fstab, will that confuse Greyhole?

gboudreau commented 2 years ago

Pretty sure that greyhole -s shows the drives in the order they are defined in greyhole.conf. Simply change their order there, and it should look better in the web UI and in greyhole -s.
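For example, with storage_pool_drive lines ordered like the hypothetical excerpt below, greyhole -s would be expected to list drive6, drive7 and drive8 in that order. The paths follow the /var/hda/files/drives/driveN/gh layout seen in this thread; the min_free values are placeholders, so keep whatever your file already has and only move whole lines.

```
# Hypothetical excerpt from /etc/greyhole.conf: drives are listed in the
# order of these lines. min_free values are placeholders.
storage_pool_drive = /var/hda/files/drives/drive6/gh, min_free: 10gb
storage_pool_drive = /var/hda/files/drives/drive7/gh, min_free: 10gb
storage_pool_drive = /var/hda/files/drives/drive8/gh, min_free: 10gb
```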

tamorgen commented 2 years ago

Hey @guillaume, you were right on the greyhole.conf.

Unfortunately, I have a bigger problem, all stemming from that failed drive.

I physically removed the bad drive, and it seems to have changed the UUID on the new drive. I had to fix the mount point, but now Greyhole is not seeing that new drive. It shows it as unmounted.

Unfortunately, I tried using the --replaced flag again because of the following warning in the logs:

Feb 07 17:48:04 WARN daemon: Warning! It seems the partition UUID of /var/hda/files/drives/drive9/gh changed. This probably means this mount is currently unmounted, or that you replaced this drive and didn't use 'greyhole --replaced'. Because of that, Greyhole will NOT use this drive at this time.

Now Greyhole is also failing to start. I'm seeing these errors in the logs:

Feb 07 18:14:39 INFO daemon: Greyhole (version 0.15.12) daemon started.
Feb 07 18:14:39 INFO daemon: Checking MySQL tables...
Feb 07 18:14:39 WARN daemon: Warning! It seems the partition UUID of /var/hda/files/drives/drive9/gh changed. This probably means this mount is currently unmounted, or that you replaced this drive and didn't use 'greyhole --replaced'. Because of that, Greyhole will NOT use this drive at this time.
Feb 07 18:14:58 WARN daemon:   Greyhole VFS module (/usr/lib64/samba/vfs/greyhole.so) seems to be missing some required libraries. If you have issues connecting to your Greyhole-enabled shares, try to compile a new VFS module for Samba by running this command: /usr/share/greyhole/build_vfs.sh current
Feb 07 18:18:32 ERROR read_smb_spool: PHP Fatal Error: Uncaught Exception: SQLSTATE[70100]: <<Unknown error>>: 1927 Connection was killed in /usr/bin/greyhole:730
Stack trace:
#0 /usr/bin/greyhole(740): DB::execute('SELECT GET_LOCK...', Array)
#1 /usr/bin/greyhole(748): DB::getFirst('SELECT GET_LOCK...', Array)
#2 /usr/bin/greyhole(791): DB::getFirstValue('SELECT GET_LOCK...', Array)
#3 /usr/bin/greyhole(4392): DB::acquireLock('read_smb_spool', 5)
#4 /usr/bin/greyhole(7852): SambaSpool::parse_samba_spool()
#5 /usr/bin/greyhole(8572): ProcessSpoolCliRunner->run()
#6 {main}
  thrown; BT: greyhole[L730] 
Feb 07 18:25:26 INFO replaced: Storage pool drive /var/hda/files/drives/drive4/gh has been marked replaced. The Greyhole daemon will now be restarted to allow it to use this new drive.
Feb 07 18:25:46 ERROR read_smb_spool: PHP Fatal Error: Uncaught Exception: SQLSTATE[70100]: <<Unknown error>>: 1927 Connection was killed in /usr/bin/greyhole:730
Stack trace:
#0 /usr/bin/greyhole(740): DB::execute('SELECT GET_LOCK...', Array)
#1 /usr/bin/greyhole(748): DB::getFirst('SELECT GET_LOCK...', Array)
#2 /usr/bin/greyhole(791): DB::getFirstValue('SELECT GET_LOCK...', Array)
#3 /usr/bin/greyhole(4392): DB::acquireLock('read_smb_spool', 5)
#4 /usr/bin/greyhole(7852): SambaSpool::parse_samba_spool()
#5 /usr/bin/greyhole(8572): ProcessSpoolCliRunner->run()
#6 {main}
  thrown; BT: greyhole[L730] 
Feb 07 18:30:45 INFO fsck: Cleaning executed tasks: keeping the last 60 days of logs.
Feb 07 18:33:06 ERROR read_smb_spool: PHP Fatal Error: Uncaught Exception: SQLSTATE[70100]: <<Unknown error>>: 1927 Connection was killed in /usr/bin/greyhole:730
Stack trace:
#0 /usr/bin/greyhole(740): DB::execute('SELECT GET_LOCK...', Array)
#1 /usr/bin/greyhole(748): DB::getFirst('SELECT GET_LOCK...', Array)
#2 /usr/bin/greyhole(791): DB::getFirstValue('SELECT GET_LOCK...', Array)
#3 /usr/bin/greyhole(4392): DB::acquireLock('read_smb_spool', 5)
#4 /usr/bin/greyhole(7852): SambaSpool::parse_samba_spool()
#5 /usr/bin/greyhole(8572): ProcessSpoolCliRunner->run()
#6 {main}
  thrown; BT: greyhole[L730]

I've tried rebooting a few times, but it's not fixing it. I also tried reinstalling, to no avail.

Any idea what the heck is going on now?

Edit:

The other issue I keep having is that I can't use the systemctl restart greyhole.service command, because I think smbd.service changed to smb.service in Fedora 35.

Failed to restart greyhole.service: Unit smbd.service not found.

gboudreau commented 2 years ago

What is your log level in greyhole.conf? It should definitely be DEBUG, to get enough detail in greyhole.log.

From what I can see, the daemon is trying to acquire a lock to process the samba spool, but it either doesn't work, or times out after a while because another process is working on that for a very long time. But without DEBUG logs, it's hard to see which.

Check your queue using greyhole --view-queue; maybe your spool contains a LOT of files, and the cron job that runs every minute to process it just can't keep up...
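The setting in question is log_level in /etc/greyhole.conf. A minimal sketch of the change as a hypothetical shell helper, assuming the file has at most one uncommented "log_level = ..." line; it keeps a .bak copy of the original.

```shell
# Hypothetical helper: set log_level in a greyhole.conf-style file, keeping a
# .bak copy. Assumes at most one uncommented "log_level = ..." line.
set_greyhole_log_level() {
  conf=${1:-/etc/greyhole.conf}
  level=${2:-DEBUG}
  if grep -qE '^[[:space:]]*log_level[[:space:]]*=' "$conf"; then
    # Replace the existing value in place (commented-out lines are left alone).
    sed -i.bak -E "s/^[[:space:]]*log_level[[:space:]]*=.*/log_level = $level/" "$conf"
  else
    # No active setting: back up the file, then append one.
    cp "$conf" "$conf.bak"
    printf 'log_level = %s\n' "$level" >> "$conf"
  fi
}
```

After running it as root, restart the daemon and follow the log with greyhole -L.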

tamorgen commented 2 years ago

It was set to INFO; I just changed it to DEBUG.

It doesn't look like it's queued up for anything.

[tmorgenthaler@galactica etc]$ sudo greyhole --view-queue

Greyhole Work Queue Statistics
==============================

This table gives you the number of pending operations queued for the Greyhole daemon, per share.

               Write   Delete   Rename    Check
Bluray             0        0        0        0
DVD                0        0        0        0
Kids               0        0        0        0
Music              0        0        0        0
Other Video        0        0        0        0
TV Series          0        0        0        0
===============================================
Total              0        0        0        0
               Write   Delete   Rename    Check

The following is the number of pending operations that the Greyhole daemon still needs to parse.
Until it does, the nature of those operations is unknown.
Spooled operations that have been parsed will be listed above and disappear from the count below.

Spooled      0

gboudreau commented 2 years ago

OK, so restart the daemon, and look at the log again.

tamorgen commented 2 years ago

I did that, but no new log entries.

As I said, now the service won't start, because of the smb/smbd issue.

[tmorgenthaler@galactica etc]$ sudo systemctl restart greyhole.service
Failed to restart greyhole.service: Unit smbd.service not found.

I'm not sure if there is a way to point greyhole at the correct service.

[tmorgenthaler@galactica etc]$ sudo systemctl status smb smbd.service smb.service
[tmorgenthaler@galactica etc]$ sudo systemctl status smbd.service
Unit smbd.service could not be found.
[tmorgenthaler@galactica etc]$ sudo systemctl status smb.service
● smb.service - Samba SMB Daemon
     Loaded: loaded (/usr/lib/systemd/system/smb.service; enabled; vendor preset: disabled)
     Active: active (running) since Mon 2022-02-07 19:12:18 EST; 4min 28s ago
       Docs: man:smbd(8)
             man:samba(7)
             man:smb.conf(5)
   Main PID: 1454 (smbd)
     Status: "smbd: ready to serve connections..."
      Tasks: 4 (limit: 9337)
     Memory: 18.5M
        CPU: 102ms
     CGroup: /system.slice/smb.service
             ├─1454 /usr/sbin/smbd --foreground --no-process-group
             ├─1498 /usr/sbin/smbd --foreground --no-process-group
             ├─1499 /usr/sbin/smbd --foreground --no-process-group
             └─1514 /usr/libexec/samba/samba-bgqd --ready-signal-fd=47 --parent-watch-fd=13 --debuglevel=0 -F

Feb 07 19:12:15 galactica.starfleet.org systemd[1]: Starting Samba SMB Daemon...

tamorgen commented 2 years ago

I also tried running the bit.ly script. Not sure if it'll provide what you need.

[tmorgenthaler@galactica ~]$ curl -Ls https://bit.ly/gh-infos | sudo sh
ls: cannot access '/mnt/samba/': No such file or directory
Failed to restart greyhole.service: Unit smbd.service not found.
Here's the URL you will need to give the to person who's helping you: http://ix.io/3OZk

gboudreau commented 2 years ago

Greyhole only installs an init.d script, when installed using yum/dnf, and that script depends on smb, not smbd.
The systemd equivalent is only available on apt-based systems, and that requires smbd. I don't understand how you got the systemd script on Fedora..?

You can manually change that script (I have no idea where it is on your system, since I'm pretty sure it was not installed by a Greyhole package...) and replace smbd with smb, in the dependencies list.

I also really don't see how the daemon could be running at any point, if you can't restart the daemon manually..? Maybe you're using the wrong command..? Have you tried the init.d way to restart a service: service greyhole restart or /etc/init.d/greyhole restart
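If you do edit the unit, a sketch of the substitution as a hypothetical shell helper, assuming the unit references smbd.service in lines such as Requires= or After=; it keeps a .bak copy.

```shell
# Hypothetical helper: point a systemd unit's Samba dependency at Fedora's
# smb.service instead of smbd.service, keeping a .bak copy of the file.
fix_samba_dependency() {
  unit=$1
  # Match "smbd.service" only as a whole unit name (start of line, or right
  # after "=" or whitespace), so other names containing "smbd" are untouched.
  sed -i.bak -E 's/(^|[=[:space:]])smbd\.service/\1smb.service/g' "$unit"
}
```

systemctl show -p FragmentPath greyhole.service prints where the unit file lives. If it is under /usr/lib/systemd/system, sudo systemctl edit --full greyhole.service makes an editable copy in /etc/systemd/system instead. After a manual edit, run sudo systemctl daemon-reload before restarting.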

tamorgen commented 2 years ago

Figured out the problem. It looks like the UUID was pointing to the wrong partition, one that was referenced for a different mount point. I must have copied the wrong value from fdisk.
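A quick way to catch this kind of mismatch, sketched as a hypothetical shell helper: print the UUID that an fstab-format file assigns to a mount point, then compare it with what blkid reports for the partition you intended. Only UUID= entries are handled.

```shell
# Hypothetical helper: print the UUID that an fstab-format file assigns to a
# mount point. Only UUID= sources are handled; commented lines are skipped.
fstab_uuid_for() {
  fstab=$1
  mountpoint=$2
  awk -v mp="$mountpoint" '
    $1 !~ /^#/ && $2 == mp && $1 ~ /^UUID=/ {
      uuid = substr($1, 6)      # drop the leading "UUID="
      gsub(/"/, "", uuid)       # tolerate UUID="..." quoting
      print uuid
    }' "$fstab"
}
```

For example, compare fstab_uuid_for /etc/fstab /var/hda/files/drives/drive9 (assuming that is the mount point behind drive9/gh) with sudo blkid -s UUID -o value /dev/sdX1, where /dev/sdX1 is a placeholder for the partition. lsblk -f shows every partition's UUID and current mount point side by side.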

It seems to be running normally again.

Thanks for looking into it.


gboudreau/Greyhole issue #290