Unable to handle kernel paging request

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Download my "rand2mem" benchmark (attached)
2. Attempt to run BACKING_SWAP_DEVICE=/dev/sda5 ./rand2mem.sh b11
3. Attempt to run BACKING_SWAP_DEVICE=/dev/sda5 ./rand2mem_loop.sh

I find that the kernel BUG report attached as "bug_compcache_0.6" will 
often (but not always) appear in /var/log/syslog.

What version of the product are you using? On what operating system?

Ubuntu 9.10 in a 32bit Virtual Box VM, with compache-0.6.

Please provide any additional information below.

This bug seems intermittent. Sometimes it doesn't occur. I notice the bug 
when I attempt to swap off and the swap off process freezes.

Original issue reported on code.google.com by gma...@gmail.com on 6 Nov 2009 at 5:50

Blocking: #41

Attachments:

GoogleCodeExporter commented 8 years ago

This might be related to Issue 41.

Original comment by nitingupta910@gmail.com on 12 Nov 2009 at 1:44

Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 8 years ago

I tried these scripts on VirtualBox VM (cpu: 1, ram: 512m) but could not 
trigger this
oops.

 - Is your VM configured with 1 or 2 CPUs?
 - Is 'swap notify' patch applied? For now, I tested without applying this patch.
 - Whats the size of your backing swap /dev/sda5 (around 345MB?)
 - Output of /proc/{cpuinfo,meminfo} will be helpful.
 - In general, instead of small snippet, upload full kernel log if it is not too big
(>1MB).

Original comment by nitingupta910@gmail.com on 21 Nov 2009 at 6:19

GoogleCodeExporter commented 8 years ago

Yes, I have one CPU. I am using a vanilla Ubuntu 9.10 kernel which I don't 
think has
'swap notify'. My backing swap is about 345MB, as shown by 'fdisk' (attached). 
The
attached meminfo is just something I copied from /proc/meminfo after a boot, not
after the bug occurred. I have not been able to reproduce this bug today. If 
these
look like the right files, I could try to reproduce the bug again.

Original comment by gma...@gmail.com on 21 Nov 2009 at 9:40

Attachments:

info.tar.gz

GoogleCodeExporter commented 8 years ago

I found 1 problem in compcache-0.6:

in ramzswap_make_request()
{
...

        if (bio_discard(bio))   <------ wrong: should be after valid_swap_request()
                return ramzswap_discard(rzs, bio);

        if (!valid_swap_request(rzs, bio)) {
                stat_inc(rzs->stats.invalid_io);
                bio_io_error(bio);
                return 0;
        }
...
}

At least this is corrected in repository.

I am trying to repro this on both 1 and 2 CPU VMs. In the meantime, it will be 
great
if you can also repro this issue on latest repository version, on any 
configuration.

Thanks.

Original comment by nitingupta910@gmail.com on 28 Nov 2009 at 10:42

GoogleCodeExporter commented 8 years ago

I'm afraid I can't repo on the latest repository* version. I got a bunch of 
errors 
like

[  814.272627] Buffer I/O error on device ramzswap0, logical block 25024
[  814.272706] Buffer I/O error on device ramzswap0, logical block 66

But these seem to be harmless. It was intermittent so it was possible I was 
just 
lucky. I'll tell you if it occurs again.

* hg log reports the latest change as:
$ hg log
changeset:   93:1ea72b25bae0
tag:         tip
user:        Nitin Gupta <ngupta@vflare.org>
date:        Sat Nov 14 19:48:51 2009 +0530
summary:     Set physical and logical block to PAGE_SIZE

Original comment by gma...@gmail.com on 29 Nov 2009 at 7:29

GoogleCodeExporter commented 8 years ago

> [  814.272627] Buffer I/O error on device ramzswap0, logical block 25024
> [  814.272706] Buffer I/O error on device ramzswap0, logical block 66

Do you get these specifically during swapon/swapoff or randomly during usage? 
Looks
mysterious...

Original comment by nitingupta910@gmail.com on 29 Nov 2009 at 8:18

GoogleCodeExporter commented 8 years ago

Actually, I mostly get errors on logical blocks 0-9. Here is the section in the
syslog where the higher blocks caused. This was during a long running stress 
test so
I am not sure what caused it.

Nov 28 19:25:44 xp-desktop kernel: [  790.822397] Adding 100288k swap on
/dev/ramzswap0.  Priority:100 extents:1 across:100288k SS
Nov 28 19:25:44 xp-desktop kernel: [  790.926188] Adding 2096440k swap on 
/dev/sdb1.
 Priority:-1 extents:1 across:2096440k
Nov 28 19:26:08 xp-desktop kernel: [  814.272464] __ratelimit: 23 callbacks 
suppressed
Nov 28 19:26:08 xp-desktop kernel: [  814.272490] Buffer I/O error on device
ramzswap0, logical block 25050
Nov 28 19:26:08 xp-desktop kernel: [  814.272515] Buffer I/O error on device
ramzswap0, logical block 25051
Nov 28 19:26:08 xp-desktop kernel: [  814.272529] Buffer I/O error on device
ramzswap0, logical block 25052
Nov 28 19:26:08 xp-desktop kernel: [  814.272571] Buffer I/O error on device
ramzswap0, logical block 25024
Nov 28 19:26:08 xp-desktop kernel: [  814.272588] Buffer I/O error on device
ramzswap0, logical block 25025
Nov 28 19:26:08 xp-desktop kernel: [  814.272602] Buffer I/O error on device
ramzswap0, logical block 25026
Nov 28 19:26:08 xp-desktop kernel: [  814.272615] Buffer I/O error on device
ramzswap0, logical block 25027
Nov 28 19:26:08 xp-desktop kernel: [  814.272627] Buffer I/O error on device
ramzswap0, logical block 25024
Nov 28 19:26:08 xp-desktop kernel: [  814.272706] Buffer I/O error on device
ramzswap0, logical block 66
Nov 28 19:26:08 xp-desktop kernel: [  814.272721] Buffer I/O error on device
ramzswap0, logical block 67
Nov 28 19:26:09 xp-desktop kernel: [  815.315801] ramzswap: Disk size set to 
152956 kB
Nov 28 19:26:09 xp-desktop kernel: [  815.449454] Adding 152948k swap on
/dev/ramzswap0.  Priority:100 extents:1 across:152948k SS
Nov 28 19:26:09 xp-desktop kernel: [  815.551801] Adding 2096440k swap on 
/dev/sdb1.
 Priority:-1 extents:1 across:2096440k

Original comment by gma...@gmail.com on 29 Nov 2009 at 8:35

Attachments:

syslog.1.gz

GoogleCodeExporter commented 8 years ago

Ah. I think I can reproduce the original crash on the most recent compcache. 
Syslog
attached.

Original comment by gma...@gmail.com on 29 Nov 2009 at 8:39

Attachments:

syslog.gz

GoogleCodeExporter commented 8 years ago

Now I can also repro the issue using rand2mem (cpu=1, mem=256m, backing_swap=2G
partition).

Original comment by nitingupta910@gmail.com on 29 Nov 2009 at 2:54

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

This bug is happening due to a race between read and reset device.
When reset is issued, we can still have some in-flight I/Os. However,
we are not making sure that the device is quiesced before freeing
all the data structures. This is causing use-after-free bug for I/Os
that arrive while we are freeing data in parallel.

A fix for this has been checked-in but it still needs to be refined
a bit. Some more testing is needed too!

====================

Debugging this issue is a bit involved so I'm dumping all the details.
This should help us debug similar problems in future. Some steps could
be skipped but I will go extra verbose so others can also do this debugging.

---

Compiling the module with optimizations disabled (-O0) reveals location
of BUG:

Dec  9 18:27:07 localhost kernel: BUG: unable to handle kernel NULL pointer
dereference at 00000217
Dec  9 18:27:07 localhost kernel: IP: [<d0dc32d0>] rzs_test_flag+0x24/0x44
[ramzswap]      <---------
Dec  9 18:27:07 localhost kernel: *pdpt = 000000000e472001 *pde = 
000000000e47f067
*pte = 0000000000000000 
Dec  9 18:27:07 localhost kernel: Oops: 0000 [#1] SMP 
Dec  9 18:27:07 localhost kernel: last sysfs file:
/sys/devices/virtual/block/ramzswap0/removable
Dec  9 18:27:07 localhost kernel: Modules linked in: ramzswap lzo_decompress
lzo_compress sit tunnel4 sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 dm_multipath vboxvfs snd_intel8x0 snd_ac97_codec ac97_bus 
snd_seq
snd_seq_device ppdev snd_pcm parport_pc snd_timer snd parport pcnet32 soundcore
snd_page_alloc i2c_piix4 i2c_core vboxadd mii [last unloaded: ramzswap]
Dec  9 18:27:07 localhost kernel:
Dec  9 18:27:07 localhost kernel: Pid: 2649, comm: blkid Not tainted
(2.6.31.6-162.fc12.i686.PAE #1) VirtualBox
Dec  9 18:27:07 localhost kernel: EIP: 0060:[<d0dc32d0>] EFLAGS: 00010202 CPU: 0
Dec  9 18:27:07 localhost kernel: EIP is at rzs_test_flag+0x24/0x44 [ramzswap]
Dec  9 18:27:07 localhost kernel: EAX: 00000210 EBX: ce420780 ECX: 00000001 
EDX: 00000210
Dec  9 18:27:07 localhost kernel: ESI: cf880e70 EDI: 000000ff EBP: ce51bc60 
ESP: ce51bc4c
Dec  9 18:27:07 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec  9 18:27:07 localhost kernel: Process blkid (pid: 2649, ti=ce51a000 
task=ce430cc0
task.ti=ce51a000)
Dec  9 18:27:07 localhost kernel: Stack:
Dec  9 18:27:07 localhost kernel: 00000001 00000042 cd4c5600 ce420780 cf880e70
ce51bca4 d0dc5080 ce51bc84
Dec  9 18:27:07 localhost kernel: <0> c0588dbe ce51bc88 ce420780 cd4c5600 
cf58e898
00001000 00000210 c058f74f
Dec  9 18:27:07 localhost kernel: <0> 00000000 c11cb2e0 cd4c5600 00000042 
cf880e70
ce420780 ce51bcbc d0dc588c
Dec  9 18:27:07 localhost kernel: Call Trace:
Dec  9 18:27:07 localhost kernel: [<d0dc5080>] ? ramzswap_read+0x57/0x1cd 
[ramzswap]
Dec  9 18:27:07 localhost kernel: [<c0588dbe>] ? __elv_add_request+0x92/0x99
Dec  9 18:27:07 localhost kernel: [<c058f74f>] ? __make_request+0x2d2/0x345
Dec  9 18:27:07 localhost kernel: [<d0dc588c>] ? ramzswap_make_request+0xa4/0xbc
[ramzswap]
Dec  9 18:27:07 localhost kernel: [<c058d664>] ? 
generic_make_request+0x214/0x261
Dec  9 18:27:07 localhost kernel: [<c049d379>] ? mempool_alloc_slab+0x13/0x15
Dec  9 18:27:07 localhost kernel: [<c049d586>] ? mempool_alloc+0x50/0xe9
<snip>

cat /proc/modules | grep ramzswap
ramzswap 28644 2 - Live 0xd0dc3000   <---- ramzswap module base address
lzo_decompress 2248 1 ramzswap, Live 0xd0d11000
lzo_compress 1944 1 ramzswap, Live 0xd0d0b000

disassemble: objdump -S -D --adjust-vma=0xd0dc3000 ramzswap.ko > ramzswap.S

BUG was caused at EIP 0xd0dc32d0 (from BUG report).
Dump nearby code area:

static int rzs_test_flag(struct ramzswap *rzs, u32 index,
            enum rzs_pageflags flag)
{
d0dc32ac:   55                      push   %ebp
d0dc32ad:   89 e5                   mov    %esp,%ebp
d0dc32af:   56                      push   %esi
d0dc32b0:   53                      push   %ebx
d0dc32b1:   83 ec 0c                sub    $0xc,%esp
d0dc32b4:   e8 fc ff ff ff          call   d0dc32b5 <rzs_test_flag+0x9>
d0dc32b9:   89 45 f4                mov    %eax,-0xc(%ebp)      <--- arg1: rzs
d0dc32bc:   89 55 f0                mov    %edx,-0x10(%ebp)     <--- arg2: index
d0dc32bf:   89 4d ec                mov    %ecx,-0x14(%ebp)     <--- arg3: flag
    return rzs->table[index].flags & BIT(flag);
d0dc32c2:   8b 45 f4                mov    -0xc(%ebp),%eax  // EAX = rzs 
d0dc32c5:   8b 40 0c                mov    0xc(%eax),%eax   // EAX = rzs->table 
('table' is
at 12th offset of struct ramzswap)
d0dc32c8:   8b 55 f0                mov    -0x10(%ebp),%edx // EDX = index = 0x42
(decoded below)
d0dc32cb:   c1 e2 03                shl    $0x3,%edx    // EDX = index * (1 << 3) (each
table entry has size of 8b)
                                    = 0x42 * 8 = 0x210
d0dc32ce:   01 d0                   add    %edx,%eax    // EAX = EDX + EAX (rzs->table 
+
index*8 = 0x210)
d0dc32d0:   0f b6 40 07             movzbl 0x7(%eax),%eax   // EAX = *(EAX + 0x7) 
('flags'
is at 7th offset of struct table)
                                     ^ BUG here!
                                    = *(rzs->table + index*8 + 0x7) = *(rzs->table + 0x210)
d0dc32d4:   0f b6 d0                movzbl %al,%edx
d0dc32d7:   8b 45 ec                mov    -0x14(%ebp),%eax

Getting values for ramzswap_test_flag() args:
EBP=ce51bc60, ESP=0xce51bc4c (from BUG report)
-0x14(%ebp) == 0xce420780 - 0x14 == 0xce51bc4c = ESP

So, Stack dump in the BUG report gives us the args (in reverse order):
    flag = 0x1 (RZS_ZERO)
    index = 0x42
    rzs = 0xcd4c5600

From disasm above, we can see that rzs->table must be 0 to make final address
in EAX (rzs->table + index*8 + 0x7) equal to 0x217 which the BUG report shows.
This confirms our suspicion that its a race between read and reset device (which
frees this table).

---

The solution is to:
 - reset init_done to 0 *before* freeing all the data structures.
 - wait for all pending I/O on this ramzswap device to finish.
 - After this wait, if any further I/O is queued, the rzs device will simply reject
   it since init_done check will now fail in ramzswap_make_request() -- till the next
init.

Original comment by nitingupta910@gmail.com on 10 Dec 2009 at 2:46

GoogleCodeExporter commented 8 years ago

Can you please test this fix.

Original comment by nitingupta910@gmail.com on 15 Dec 2009 at 5:27

GoogleCodeExporter commented 8 years ago

I guess you mean the fix in hg? I still get Buffer I/O errors on logical blocks 
0-9, 
but I am not getting any kernel bugs.

Original comment by gma...@gmail.com on 15 Dec 2009 at 6:13

GoogleCodeExporter commented 8 years ago

yes, the fix is in hg.

> I still get Buffer I/O errors on logical blocks 0-9

do you get these specifically during swapon only?

These should be harmless and think they are due to disk readahead stuff. I 
never get
these errors on my side so can you please provide details about your setup?

Original comment by nitingupta910@gmail.com on 15 Dec 2009 at 6:18

GoogleCodeExporter commented 8 years ago

Not sure when it happens. I added logging to my script, but then it stopped 
happening, presumably it only occurs if one operation happens immediately after 
the 
other. Does this dump from dmesg help?

[  578.284149] ISO 9660 Extensions: RRIP_1991A
[  700.937862] ramzswap: Creating 1 devices ...
[  701.918331] Adding 2096440k swap on /dev/sdb1.  Priority:-1 extents:1 
across:2096440k
[  740.684180] ramzswap: Disk size set to 20059 kB
[  740.821390] Adding 20048k swap on /dev/ramzswap0.  Priority:100 extents:1 
across:20048k SS
[  740.888751] Adding 2096440k swap on /dev/sdb1.  Priority:-1 extents:1 
across:2096440k
[  762.255004] Buffer I/O error on device ramzswap0, logical block 0
[  762.255036] Buffer I/O error on device ramzswap0, logical block 1
[  762.255058] Buffer I/O error on device ramzswap0, logical block 2
[  762.255079] Buffer I/O error on device ramzswap0, logical block 3
[  762.255099] Buffer I/O error on device ramzswap0, logical block 4
[  762.255119] Buffer I/O error on device ramzswap0, logical block 5
[  762.255140] Buffer I/O error on device ramzswap0, logical block 6
[  762.255161] Buffer I/O error on device ramzswap0, logical block 7
[  762.255181] Buffer I/O error on device ramzswap0, logical block 8
[  762.255201] Buffer I/O error on device ramzswap0, logical block 9
[  770.219911] ramzswap: Disk size set to 52657 kB
[  770.341297] Adding 52648k swap on /dev/ramzswap0.  Priority:100 extents:1 
across:52648k SS
[  770.391994] Adding 2096440k swap on /dev/sdb1.  Priority:-1 extents:1 
across:2096440k

I am not sure what you wanted from my setup, but I have attached some details 
on my 
VM (Ubuntu 9.10, uniprocessor, 2.6.31-14-generic). No swapfree patches have 
been 
applied.

Original comment by gma...@gmail.com on 15 Dec 2009 at 8:53

Attachments:

setup
dmsg

GoogleCodeExporter commented 8 years ago

> presumably it only occurs if one operation happens immediately after the 
other. 

Looks like you are correct. I just committed a change that should prevent these 
I/O
errors: now we sync disk and then set it to unintialized (we were doing the 
reverse
earlier).

Can you please test it again? I will try to get similar setup on my side too. 
The
information you provided should be sufficient.

Thanks.

Original comment by nitingupta910@gmail.com on 15 Dec 2009 at 9:36

GoogleCodeExporter commented 8 years ago

I appear to still get the I/O errors. See attached dmesg.2.

Original comment by gma...@gmail.com on 15 Dec 2009 at 12:03

Attachments:

dmesg.2

GoogleCodeExporter commented 8 years ago

The original issue of kernel crash is solved. So, closing the issue.

> I appear to still get the I/O errors. See attached dmesg.2.
These errors look harmless and appear to be due to readahead during device open.

Original comment by nitingupta910@gmail.com on 12 Jan 2010 at 1:34

Changed state: Fixed

aaron6187 / compcache

Unable to handle kernel paging request #45