gboudreau / Greyhole

Greyhole uses Samba to create a storage pool of all your available hard drives, and allows you to create redundant copies of the files you store.
http://www.greyhole.net
GNU General Public License v3.0

Feature Request: Make Read_Smb_Spool a separate task running concurrently #328

Open sabatech opened 3 months ago

sabatech commented 3 months ago

I find that my system pauses for very long times during some actions on my Greyhole server. I was wondering if it's possible to make Read_Smb_Spool a separately run task, so that Greyhole can continue processing files while the spool is being processed. Maybe make the SMB spool action its own daemon or something.

Last logged action: read_smb_spool on 2024-06-10 20:27:16 (1h 4m 8s ago)

sabatech commented 3 months ago

Last logged action: read_smb_spool on 2024-06-10 20:27:16 (3h 13m 3s ago)

sabatech commented 3 months ago

And here I am a few hours later: it finally finished its read_smb_spool, and it's back to processing the spool AGAIN instead of processing files. File copies are getting backed up and waiting instead of Greyhole actually managing the storage copies. I want it to keep moving data, not spend hours processing the spool while files wait to be processed.

sabatech commented 3 months ago

Last logged action: read_smb_spool on 2024-06-11 04:28:08 (29s ago)

sabatech commented 3 months ago

I think separating the spool process from the file-handling process would be an incredible performance boost on top of everything else.

sabatech commented 3 months ago

Everything seems to just wait on the SMB spool handling, and if enough spool activity has accumulated, the spool becomes the bottleneck.

sabatech commented 3 months ago

It seems like all it wants to do is process the spool rather than process the files; it's creating a huge log of work to do, but not actually doing it.

gboudreau commented 3 months ago

Greyhole tries to process the spool often because it is trying to prevent exactly what is happening to you right now: accumulating so many spooled operations that it takes a very long time just to list and order them correctly, each time it needs to do so (before moving them from files in the spool folder to rows in MySQL). You are now so far behind on spool processing, with probably millions of operations spooled, that a simple ls (I think it actually uses find) and sort takes forever.
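If you want to gauge how far behind you are, you can count the spooled operation files directly. This is just a diagnostic sketch; it assumes the default spool location /var/spool/greyhole, so adjust the path if your install differs:

```bash
# Count spooled operations waiting to be imported into MySQL.
# Assumes the default spool directory (/var/spool/greyhole); adjust if needed.
find /var/spool/greyhole -type f | wc -l
```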

Your suggestion might be beneficial in some very specific situations, but it is not that simple to implement.

To resolve your current situation, you can try lowering the value of max_queued_tasks to something lower than the default (10,000,000). This limits the number of rows inserted into MySQL: once you've reached this limit, the spool processor will NOT do anything, and the daemon will instead work on file operations until the number of queued tasks in MySQL drops below this number. Look in MySQL for the number of rows in the tasks table, and configure greyhole.conf with a number much lower than that.
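For example (a sketch only; it assumes the Greyhole database is named greyhole and that your MySQL credentials are already set up):

```bash
# Count the tasks currently queued in MySQL.
# Assumes the Greyhole database is named "greyhole"; adjust the
# database name and credentials to match your setup.
mysql greyhole -e "SELECT COUNT(*) FROM tasks;"

# Then, in greyhole.conf, set max_queued_tasks to a value well below
# that count (e.g. max_queued_tasks = 100000 -- illustrative number),
# and restart the daemon so the change takes effect.
```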

Do you know why you have so many file operations? If you do, and, for example, you are adding a lot of files into your Greyhole pool through Samba in a specific share or folder, then maybe you can just delete your complete spool folder, re-create it (greyhole --create-mem-spool), and once you're done copying files, run greyhole --fsck to handle the new/changed files.
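Roughly, that workaround might look like the following. This is a sketch only: it assumes the default spool location (/var/spool/greyhole) and a systemd service named greyhole, so double-check both before running it. Deleting the spool discards the pending operations it contained; the fsck afterwards is what catches Greyhole up on those files.

```bash
# Stop the daemon so nothing writes to the spool while we remove it.
# (Assumes a systemd service named "greyhole"; adjust for your init system.)
sudo systemctl stop greyhole

# Delete the backed-up spool and re-create it empty.
# (Assumes the default spool location /var/spool/greyhole.)
sudo rm -rf /var/spool/greyhole
sudo greyhole --create-mem-spool

# Restart the daemon, finish your large copy through Samba, then let
# fsck discover the new/changed files and queue the needed copies.
sudo systemctl start greyhole
sudo greyhole --fsck
```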