cache swap.file rebuild problem

GoogleCodeExporter commented 9 years ago


It appears an old issue is back. After an unclean shutdown the rebuild process goes haywire. Relevant cache_log entries:

2010/03/04 14:57:25| WARNING: newer swaplog entry for dirno 0, fileno 000478A7
2010/03/04 14:57:28| WARNING: newer swaplog entry for dirno 1, fileno 00022869
2010/03/04 14:57:28| Store rebuilding is 69.8% complete
2010/03/04 14:57:31| WARNING: newer swaplog entry for dirno 0, fileno 00046056
...
2010/03/04 14:57:41| WARNING: newer swaplog entry for dirno 0, fileno 00044D82
2010/03/04 14:57:41| WARNING: newer swaplog entry for dirno 0, fileno 00044D82
...
2010/03/04 14:57:43| Store rebuilding is 70.2% complete
...
2010/03/04 14:57:58| Store rebuilding is -2.1% complete
...
2010/03/04 15:00:03| Store rebuilding is -1.6% complete
...
2010/03/04 15:06:06| Store rebuilding is -1.5% complete
2010/03/04 15:06:21| Store rebuilding is -1.6% complete
...
2010/03/04 15:39:38| Store rebuilding is 150.3% complete
...
2010/03/04 15:39:53| Store rebuilding is 147.4% complete
...
2010/03/04 15:55:02| Store rebuilding is 111.0% complete
2010/03/04 15:55:18| Store rebuilding is 110.8% complete
...
2010/03/04 15:59:22| Store rebuilding is -3.4% complete
...
2010/03/04 16:27:09| WARNING: newer swaplog entry for dirno 0, fileno 00045E94
2010/03/04 16:27:12| WARNING: newer swaplog entry for dirno 1, fileno 00030498
2010/03/04 16:27:15| diskHandleWrite: FD 75: disk write error: (28) No space left on device
2010/03/04 16:27:15| write failure!
2010/03/04 16:27:15| assertion failed: disk.c:299: "1 == 0"

When squid dies with a full cache_dir partition, it is the new swap.state that has grown without bound.

This is on FreeBSD-8.0-STABLE amd64 with the latest lusca_head snapshot.

H

Original issue reported on code.google.com by hm@hm.net.br on 5 Mar 2010 at 6:47

GoogleCodeExporter commented 9 years ago


For better understanding I suggest a small change in store_rebuild.c 

    debug(20, 1) ("Store rebuilding is %4.1f%% complete\n", 100.0 * n / d);

What is missing is an indicator of which cache_dir this entry is about.

I would suggest the cache_dir name or the index number.

H

Original comment by hm@hm.net.br on 5 Mar 2010 at 9:45

GoogleCodeExporter commented 9 years ago

Try this?

Index: store_rebuild.c
===================================================================
--- store_rebuild.c     (revision 14463)
+++ store_rebuild.c     (working copy)
@@ -194,8 +194,13 @@
     if (squid_curtime - last_report < 15)
        return;
     for (sd_index = 0; sd_index < Config.cacheSwap.n_configured; sd_index++) {
+       double nn, dd;
+       SwapDir *sd = INDEXSD(sd_index);
        n += (double) RebuildProgress[sd_index].scanned;
        d += (double) RebuildProgress[sd_index].total;
+       nn = (double) RebuildProgress[sd_index].scanned;
+       dd = (double) RebuildProgress[sd_index].total;
+        debug(20, 1) ("Store rebuilding in dir %s is %4.1f%% complete\n", sd->path, 100.0 * nn / dd);
     }
     debug(20, 1) ("Store rebuilding is %4.1f%% complete\n", 100.0 * n / d);
     last_report = squid_curtime;
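
The per-directory line in the patch divides by the directory's total, which could be zero for a directory that has not reported a total yet. A minimal standalone sketch of the same report with that case guarded is below. The struct, the helper names, and the integer field types are assumptions made for illustration; they are not the actual RebuildProgress definition.

```c
#include <stdio.h>

/* Hypothetical stand-in for one RebuildProgress[] entry (field types assumed). */
struct rebuild_progress {
    int scanned;
    int total;
};

/* Returns percent complete, or -1.0 when total is not positive. */
static double
rebuild_percent(const struct rebuild_progress *p)
{
    if (p->total <= 0)
        return -1.0;
    return 100.0 * (double) p->scanned / (double) p->total;
}

/* Formats one per-directory progress line into buf and returns buf. */
static const char *
format_progress(char *buf, size_t len, const char *path,
    const struct rebuild_progress *p)
{
    double pct = rebuild_percent(p);
    if (pct < 0.0)
        snprintf(buf, len, "Store rebuilding in dir %s: total unknown", path);
    else
        snprintf(buf, len, "Store rebuilding in dir %s is %4.1f%% complete",
            path, pct);
    return buf;
}
```

Calling format_progress() once per configured directory inside the existing loop would give one line per cache_dir, in the same spirit as the patch.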

Original comment by adrian.c...@gmail.com on 20 Mar 2010 at 8:50

GoogleCodeExporter commented 9 years ago

Good starting point. I am patching my servers today, then let's see what I can catch.

I think this msg should go to stdout/stderr.

Original comment by michel.s...@gmail.com on 20 Mar 2010 at 9:56

GoogleCodeExporter commented 9 years ago

The problem seems to be when reconfigure is called during a rebuild. The tmp swaplog is closed and the normal swaplog is opened. This causes the rebuild process to never finish reading objects, as it's continuously appending items to swap.state.

check mainReconfigure() and the swapDir close/open functions.
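
The effect described above can be illustrated with a toy model. This is not Lusca code: the struct and function are hypothetical, and the one-append-per-entry rate is an assumption chosen only to show that a reader cannot reach the end of a log that grows as fast as it is read.

```c
#include <stddef.h>

/* Toy model, not Lusca code: a log with a length and a reader position. */
struct toy_log {
    size_t length;   /* entries currently in the log */
    size_t read_pos; /* entries the rebuild has consumed */
};

/*
 * Runs up to max_steps rebuild steps. Each step consumes one entry; when
 * appends_to_same_log is nonzero, each step also appends one entry to the
 * log being read (the situation suspected after a reconfigure).
 * Returns the number of steps needed to reach the end of the log, or 0 if
 * the end was not reached within max_steps.
 */
static size_t
toy_rebuild(struct toy_log *log, int appends_to_same_log, size_t max_steps)
{
    size_t step;
    for (step = 1; step <= max_steps; step++) {
        log->read_pos++;
        if (appends_to_same_log)
            log->length++;
        if (log->read_pos >= log->length)
            return step;
    }
    return 0;
}
```

In this model the read position eventually exceeds the log's initial length. That would be consistent with the percentages above 100% in the report, though the thread does not confirm that this is where those numbers come from.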

Original comment by adrian.c...@gmail.com on 23 Mar 2010 at 9:11

GoogleCodeExporter commented 9 years ago

Original comment by adrian.c...@gmail.com on 23 Mar 2010 at 9:14

Changed state: Accepted
Added labels: Priority-Critical, Version-Head
Removed labels: Priority-Medium, Version-1.0

GoogleCodeExporter commented 9 years ago

Hopefully fixed in r14477! Please test!

Original comment by adrian.c...@gmail.com on 23 Mar 2010 at 12:49

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

No wait - this one isn't fixed yet!

Original comment by adrian.c...@gmail.com on 23 Mar 2010 at 12:53

Changed state: Started

GoogleCodeExporter commented 9 years ago

Try this patch!

Original comment by adrian.c...@gmail.com on 23 Mar 2010 at 10:42

Attachments:

store_dir.c.diff

GoogleCodeExporter commented 9 years ago

Fixed in r14478. Please test!

Original comment by adrian.c...@gmail.com on 24 Mar 2010 at 4:09

GoogleCodeExporter commented 9 years ago

For completeness, would you please verify that the latest snapshot works for you and let me know here?

I'd like to mark this issue as resolved if possible. ;))

Thanks!

Original comment by adrian.c...@gmail.com on 26 Mar 2010 at 1:59

GoogleCodeExporter commented 9 years ago

You know, I'm testing svn. Last night I deployed it on some production servers since the test machine was stable. I cannot switch back to the snapshot now; it is too much work.
I am confident that it is solved now, but I would say: let's be patient and run for a week or so to see.

Original comment by hm@hm.net.br on 26 Mar 2010 at 3:26

GoogleCodeExporter commented 9 years ago

to complete ...
This patch is not in svn, right? Anyway, I do not like this solution because it is kind of "doing reconfigure but not doing it".
IMO reconfigure shouldn't execute at all while a rebuild is in progress, and my former suggestion of checking the condition in main.c is better and more straightforward. I hacked it as follows:

mainReconfigure(void)
{
    if (store_dirs_rebuilding > 1) {
        fprintf(stderr, "Squid -k reconfigure not allowed at this moment, swap rebuild still in progress\n");
        exit(0);
    }

so the user knows and understands that he has to do it later
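
A variant of the same check that reports and returns a status instead of calling exit() might look like the sketch below. It assumes the same `store_dirs_rebuilding > 1` condition as the hack above; the helper name `reconfigure_allowed` and the local stand-in for the global are hypothetical. Returning would matter if mainReconfigure() runs inside the already running daemon, which is an assumption here, since exit(0) would then stop the cache rather than just skip the reconfigure.

```c
#include <stdio.h>

/* Local stand-in for the global of the same name (semantics assumed). */
static int store_dirs_rebuilding = 1;

/* Returns 1 if a reconfigure may proceed, 0 if it should be refused. */
static int
reconfigure_allowed(void)
{
    /* Same condition as the hack above. */
    if (store_dirs_rebuilding > 1) {
        fprintf(stderr,
            "reconfigure refused: swap rebuild still in progress\n");
        return 0;
    }
    return 1;
}
```

The caller would return early when reconfigure_allowed() yields 0 and carry on with the normal reconfigure path otherwise.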

Original comment by hm@hm.net.br on 26 Mar 2010 at 3:37

GoogleCodeExporter commented 9 years ago

Well, there's no real reason that reconfigure should close/open the swaplogs during the rebuild. The only thing that it'll screw up is if the user changes where the swaplog is stored on disk, and that's risky anyway whilst the cache runs.
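
The idea of leaving the swaplogs alone during a rebuild can be sketched as a toy model. This is not the r14478 change, whose contents are not shown in this thread; all names below are hypothetical.

```c
/* Toy model, not the r14478 change: tracks which log is open for writing. */
enum toy_swaplog { TOY_LOG_TMP, TOY_LOG_NORMAL };

struct toy_swapdir {
    int rebuilding;            /* nonzero while the rebuild reads swap.state */
    enum toy_swaplog open_log; /* where new entries are appended */
};

/*
 * Reconfigure step for one directory: reopen the swap log only when no
 * rebuild is running. Returns 1 if the log was reopened, 0 if skipped.
 */
static int
toy_reconfigure_swaplog(struct toy_swapdir *sd)
{
    if (sd->rebuilding)
        return 0; /* leave the temporary log in place */
    sd->open_log = TOY_LOG_NORMAL;
    return 1;
}
```

With this shape the rest of the reconfigure still runs during a rebuild, and only the swaplog close/open is deferred.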

Anyway - I'm much more interested right now in whether my solution works. I'd rather find the -cause- of the problem instead of just committing a band-aid solution without understanding the true causes. :)

Please let me know ASAP if the SVN checkout you're using hasn't completely solved the rebuild issues and I'll continue investigating.

Original comment by adrian.c...@gmail.com on 26 Mar 2010 at 3:41

GoogleCodeExporter commented 9 years ago

Feedback from lusca-users: it works! Yay!

Original comment by adrian.c...@gmail.com on 29 Mar 2010 at 9:52

Changed state: Fixed

google-code-export / lusca-cache

cache swap.file rebuild problem #91