gboudreau / Greyhole

Greyhole uses Samba to create a storage pool of all your available hard drives, and allows you to create redundant copies of the files you store.
http://www.greyhole.net
GNU General Public License v3.0
259 stars · 34 forks

Accidentally Copied Lots of Small Files - Recovering from a Blown Out log and Spool. #320

Open old-square-eyes opened 10 months ago

old-square-eyes commented 10 months ago

I backed up an old disk via Samba and didn't realise it had a couple hundred thousand tiny files on it (uncompressed website backups, including dependencies).

The GH log filled up the drive, and so did the Samba spool. I cleared the log and ran the following, per instructions online for being out of inodes...

mv /var/spool/greyhole /var/spool/greyhole.bak
mkdir -p /var/spool/greyhole
chmod 777 /var/spool/greyhole
/usr/bin/greyhole --create-mem-spool
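For anyone else in this situation: before (or after) recreating the spool, you can confirm the out-of-inodes diagnosis with `df -i`, which reports inode usage per filesystem. This is general Linux tooling, not Greyhole-specific:

```shell
# Check inode usage on the filesystem holding the spool.
# IUse% at or near 100% means the filesystem is out of inodes,
# even if `df -h` still shows free disk space.
df -i /var/spool
```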

Before I started GH, I also deleted the backup's directory from every GH pool drive, and did the same with the landing zone (LZ).

However, there must have been queued tasks or a record (DB metadata?) somewhere because it's incrementing through tens of thousands of these files in the log. Not entirely unexpected.

The only issue is it's happening incredibly slowly. Is this a normal FSCK? Or is it processing some spool/DB before it gets to the normal FSCK? Why is it an order of magnitude slower than a normal FSCK?

Would love some help. Part of me feels like I should blow away the DB and do a standard FSCK otherwise this could take weeks.

Nov 09 20:54:01 INFO fsck_file: Now working on task ID 1500441: fsck_file some_file,
Nov 09 20:54:01 DEBUG fsck_file: Loading metafiles for some_file,
Nov 09 20:54:00 DEBUG fsck_file:   Got 0 metadata files.
Nov 09 20:54:00 WARN fsck_file:   WARNING! No copies of this file are available in the Greyhole storage pool: "some_file,". 
Nov 09 20:54:00 DEBUG fsck_file:   Removing metadata files for some_file,
old-square-eyes commented 10 months ago

MariaDB [greyhole]> SELECT COUNT(*) FROM tasks;
+----------+
| COUNT(*) |
+----------+
|   924131 |
+----------+

78 tasks a minute = 8 days to clear?
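That back-of-envelope estimate checks out; a quick sketch of the arithmetic, using the task count and observed rate above:

```shell
# ~924k queued tasks draining at ~78 tasks/minute:
tasks=924131
per_min=78
echo "$((tasks / per_min)) minutes"        # 11847 minutes
echo "$((tasks / per_min / 60 / 24)) days" # 8 days
```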

old-square-eyes commented 10 months ago
greyhole -C
All scheduled fsck tasks have now been deleted.
Specific files checks might have been queued for problematic files, and those (fsck_file) tasks will still be executed, once other tasks have been processed.

I guess these are the "specific files checks" since it continues.

gboudreau commented 10 months ago

Stop the daemon, then look for and delete the fsck_file tasks queued:

(Replace MyBackups with your share name, and something/else with the folder that contained those files)

select count(*) from tasks where action = 'fsck_file' and share = 'MyBackups' and full_path LIKE 'something/else/%';
DELETE from tasks where action = 'fsck_file' and share = 'MyBackups' and full_path LIKE 'something/else/%';

You will also need to delete this folder (and all sub-folders) from the Greyhole metadata folders, or you will get errors on each full fsck. Look in each of your storage pool drives, in the folders .gh_metastore and .gh_metastore_backup, and nuke the folders ShareName/something/else you can find there. If your shared drives are all together, you can run something like this:

ls -la /mnt/hdd*/gh/.gh_metastore*/ShareName/something/else/
# Then if you are sure it would delete the correct folders:
rm -rf /mnt/hdd*/gh/.gh_metastore*/ShareName/something/else/
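If you want a more cautious dry run before the `rm -rf`, a `find`-based variant lists exactly what would be removed. Same placeholders as above (`ShareName/something/else` is whatever you used in the SQL); `POOL_ROOT` is an assumption — point it at wherever your pool drives are mounted:

```shell
# List matching metastore folders first; delete only after reviewing.
# POOL_ROOT is a placeholder -- adjust to your mount layout.
POOL_ROOT="${POOL_ROOT:-/mnt}"
find "$POOL_ROOT" -type d -path '*/.gh_metastore*/ShareName/something/else' -print
# Once satisfied the list is correct, the same match can be deleted with:
# find "$POOL_ROOT" -type d -path '*/.gh_metastore*/ShareName/something/else' -exec rm -rf {} +
```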

Then you can restart the daemon.

Cheers.

old-square-eyes commented 10 months ago

Great - lifesaver. Did all that and the service starts (it lets me start it). But tailing the log, nothing is happening. It also let me start an fsck, and again nothing happens.

old-square-eyes commented 10 months ago

Actually, disregard. It's working; I just had the log level set to WARN and couldn't see the BAU logs. Thank you.
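For anyone hitting the same confusion: Greyhole's log verbosity is controlled by the `log_level` option in greyhole.conf (assuming the default `/etc/greyhole.conf` location); anything above INFO hides the routine per-task lines:

```
# /etc/greyhole.conf -- excerpt
# DEBUG shows everything; WARN hides the routine per-task log lines.
log_level = DEBUG
```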