Aborres / compcache

Automatically exported from code.google.com/p/compcache

Random crash #82

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. compile latest source (hg)

2.
#modprobe zram num_devices=2
#echo 524288000 > /sys/block/zram0/disksize
#echo 524288000 > /sys/block/zram1/disksize
#mkswap /dev/zram0
#mkswap /dev/zram1
#swapon /dev/zram0 -p 100
#swapon /dev/zram1 -p 100

3.
Run some virtual machines and Eclipse. At some random point I get a segmentation fault.

What version of the product are you using? On what operating system?

Linux onyx.local 2.6.36-02063602-generic #201012101121 SMP Fri Dec 10 11:26:48 UTC 2010 x86_64 GNU/Linux

latest from hg

Please provide any additional information below.

Jan  3 15:04:21 onyx kernel: [19200.810041] kswapd0       D ffff880002114cc0     0    28      2 0x00000000
Jan  3 15:04:21 onyx kernel: [19200.810048]  ffff88007c03f790 0000000000000046 ffff88007c03ffd8 0000000000014cc0
Jan  3 15:04:21 onyx kernel: [19200.810053]  0000000000014cc0 ffff88007c03ffd8 0000000000014cc0 ffff88007c03ffd8
Jan  3 15:04:21 onyx kernel: [19200.810057]  0000000000014cc0 ffff88007c91df18 ffff88007c91df20 ffff88007c91db80
Jan  3 15:04:21 onyx kernel: [19200.810063] Call Trace:
Jan  3 15:04:21 onyx kernel: [19200.810076]  [<ffffffff8158bdf3>] io_schedule+0x73/0xc0
Jan  3 15:04:21 onyx kernel: [19200.810083]  [<ffffffff812a81f0>] get_request_wait+0xd0/0x170
Jan  3 15:04:21 onyx kernel: [19200.810088]  [<ffffffff812b7316>] ? cfq_find_rq_fmerge+0x66/0x70
Jan  3 15:04:21 onyx kernel: [19200.810095]  [<ffffffff810831b0>] ? autoremove_wake_function+0x0/0x40
Jan  3 15:04:21 onyx kernel: [19200.810099]  [<ffffffff812a234b>] ? elv_merge+0xfb/0x110
Jan  3 15:04:21 onyx kernel: [19200.810102]  [<ffffffff812a831c>] __make_request+0x8c/0x4b0
Jan  3 15:04:21 onyx kernel: [19200.810108]  [<ffffffff81037d59>] ? default_spin_lock_flags+0x9/0x10
Jan  3 15:04:21 onyx kernel: [19200.810113]  [<ffffffff8158ddd4>] ? _raw_spin_lock_irqsave+0x34/0x50
Jan  3 15:04:21 onyx kernel: [19200.810117]  [<ffffffff812a69ce>] generic_make_request+0x20e/0x430
Jan  3 15:04:21 onyx kernel: [19200.810123]  [<ffffffff8110642d>] ? mempool_alloc+0x6d/0x140
Jan  3 15:04:21 onyx kernel: [19200.810127]  [<ffffffff812a6c6d>] submit_bio+0x7d/0x100
Jan  3 15:04:21 onyx kernel: [19200.810134]  [<ffffffff81180183>] submit_bh+0xf3/0x140
Jan  3 15:04:21 onyx kernel: [19200.810138]  [<ffffffff81181f63>] __block_write_full_page+0x1d3/0x350
Jan  3 15:04:21 onyx kernel: [19200.810142]  [<ffffffff811811f0>] ? end_buffer_async_write+0x0/0x170
Jan  3 15:04:21 onyx kernel: [19200.810147]  [<ffffffff811811f0>] ? end_buffer_async_write+0x0/0x170
Jan  3 15:04:21 onyx kernel: [19200.810151]  [<ffffffff81182406>] block_write_full_page_endio+0x116/0x120
Jan  3 15:04:21 onyx kernel: [19200.810155]  [<ffffffff81182425>] block_write_full_page+0x15/0x20
Jan  3 15:04:21 onyx kernel: [19200.810159]  [<ffffffff811cb95a>] ext3_ordered_writepage+0x1da/0x220
Jan  3 15:04:21 onyx kernel: [19200.810164]  [<ffffffff81112e92>] pageout+0xd2/0x210
Jan  3 15:04:21 onyx kernel: [19200.810167]  [<ffffffff81114268>] shrink_page_list+0x348/0x490
Jan  3 15:04:21 onyx kernel: [19200.810171]  [<ffffffff811144d4>] shrink_inactive_list+0x124/0x2e0
Jan  3 15:04:21 onyx kernel: [19200.810175]  [<ffffffff811146e6>] shrink_list+0x56/0xa0
Jan  3 15:04:21 onyx kernel: [19200.810178]  [<ffffffff81114838>] shrink_zone+0x108/0x110
Jan  3 15:04:21 onyx kernel: [19200.810182]  [<ffffffff811087da>] ? zone_watermark_ok+0x2a/0xf0
Jan  3 15:04:21 onyx kernel: [19200.810186]  [<ffffffff81114e84>] balance_pgdat+0x3a4/0x430
Jan  3 15:04:21 onyx kernel: [19200.810189]  [<ffffffff81115032>] kswapd+0x122/0x2a0
Jan  3 15:04:21 onyx kernel: [19200.810193]  [<ffffffff8158ba39>] ? schedule+0x309/0x650
Jan  3 15:04:21 onyx kernel: [19200.810197]  [<ffffffff810831b0>] ? autoremove_wake_function+0x0/0x40
Jan  3 15:04:21 onyx kernel: [19200.810200]  [<ffffffff81114f10>] ? kswapd+0x0/0x2a0
Jan  3 15:04:21 onyx kernel: [19200.810204]  [<ffffffff81082ad7>] kthread+0x97/0xa0
Jan  3 15:04:21 onyx kernel: [19200.810209]  [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10
Jan  3 15:04:21 onyx kernel: [19200.810212]  [<ffffffff81082a40>] ? kthread+0x0/0xa0
Jan  3 15:04:21 onyx kernel: [19200.810216]  [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10

Jan  3 15:04:21 onyx kernel: [19200.810287] kjournald     D ffff880002014cc0     0 11213      2 0x00000000
Jan  3 15:04:21 onyx kernel: [19200.810292]  ffff8800370c1ac0 0000000000000046 ffff8800370c1fd8 0000000000014cc0
Jan  3 15:04:21 onyx kernel: [19200.810297]  0000000000014cc0 ffff8800370c1fd8 0000000000014cc0 ffff8800370c1fd8
Jan  3 15:04:21 onyx kernel: [19200.810301]  0000000000014cc0 ffff880070053158 ffff880070053160 ffff880070052dc0
Jan  3 15:04:21 onyx kernel: [19200.810306] Call Trace:
Jan  3 15:04:21 onyx kernel: [19200.810310]  [<ffffffff8158bdf3>] io_schedule+0x73/0xc0
Jan  3 15:04:21 onyx kernel: [19200.810315]  [<ffffffff812a81f0>] get_request_wait+0xd0/0x170
Jan  3 15:04:21 onyx kernel: [19200.810318]  [<ffffffff812b7316>] ? cfq_find_rq_fmerge+0x66/0x70
Jan  3 15:04:21 onyx kernel: [19200.810323]  [<ffffffff810831b0>] ? autoremove_wake_function+0x0/0x40
Jan  3 15:04:21 onyx kernel: [19200.810326]  [<ffffffff812a234b>] ? elv_merge+0xfb/0x110
Jan  3 15:04:21 onyx kernel: [19200.810330]  [<ffffffff812a831c>] __make_request+0x8c/0x4b0
Jan  3 15:04:21 onyx kernel: [19200.810334]  [<ffffffff811cdf79>] ? ext3_get_block+0x79/0x120
Jan  3 15:04:21 onyx kernel: [19200.810338]  [<ffffffff812a69ce>] generic_make_request+0x20e/0x430
Jan  3 15:04:21 onyx kernel: [19200.810342]  [<ffffffff8110642d>] ? mempool_alloc+0x6d/0x140
Jan  3 15:04:21 onyx kernel: [19200.810346]  [<ffffffff812a6c6d>] submit_bio+0x7d/0x100
Jan  3 15:04:21 onyx kernel: [19200.810350]  [<ffffffff81180183>] submit_bh+0xf3/0x140
Jan  3 15:04:21 onyx kernel: [19200.810355]  [<ffffffff8122499f>] journal_commit_transaction+0x8ff/0xe20
Jan  3 15:04:21 onyx kernel: [19200.810360]  [<ffffffff8100989b>] ? __switch_to+0xbb/0x2e0
Jan  3 15:04:21 onyx kernel: [19200.810366]  [<ffffffff81070dab>] ? lock_timer_base+0x3b/0x70
Jan  3 15:04:21 onyx kernel: [19200.810369]  [<ffffffff810725c1>] ? try_to_del_timer_sync+0x51/0xe0
Jan  3 15:04:21 onyx kernel: [19200.810375]  [<ffffffff81227dea>] kjournald+0xea/0x260
Jan  3 15:04:21 onyx kernel: [19200.810379]  [<ffffffff810831b0>] ? autoremove_wake_function+0x0/0x40
Jan  3 15:04:21 onyx kernel: [19200.810383]  [<ffffffff81227d00>] ? kjournald+0x0/0x260
Jan  3 15:04:21 onyx kernel: [19200.810386]  [<ffffffff81082ad7>] kthread+0x97/0xa0
Jan  3 15:04:21 onyx kernel: [19200.810391]  [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10
Jan  3 15:04:21 onyx kernel: [19200.810394]  [<ffffffff81082a40>] ? kthread+0x0/0xa0
Jan  3 15:04:21 onyx kernel: [19200.810397]  [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10

thanks

Original issue reported on code.google.com by superbiji on 11 Jan 2011 at 8:30

GoogleCodeExporter commented 8 years ago
The attached code demonstrates data corruption in the zram device.

After a few cycles you will get output such as:
cycle 8:
536184  tmpmnt
62398464
tmpmnt/tmpfile zram0mnt/f differ: char 4097, line 64

Warning: this code may lead to OOM conditions and therefore crash your system.
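
The attached script is not reproduced in this thread. As a rough illustration of the kind of copy-and-compare cycle that produces a `cmp` report like the one above, here is a minimal sketch. The function names `copy_and_verify` and `run_cycles` are hypothetical, the destination file name `f` is taken from the output above, and the sketch does not create the memory pressure that the real test presumably relied on, so on its own it may only exercise the page cache.

```shell
# Hypothetical helper, not the attached script: copy a file into a
# directory (e.g. a filesystem mounted on /dev/zram0) and compare the copy
# byte by byte. cmp prints the first differing byte and returns non-zero
# if the files differ.
copy_and_verify() {
    src=$1
    dst_dir=$2
    cp "$src" "$dst_dir/f" || return 2
    cmp "$src" "$dst_dir/f"
}

# Repeat the check a fixed number of times, stopping at the first mismatch.
run_cycles() {
    src=$1
    dst_dir=$2
    cycles=$3
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i:"
        copy_and_verify "$src" "$dst_dir" || return 1
        i=$(( i + 1 ))
    done
}
```

For example, `run_cycles tmpmnt/tmpfile zram0mnt 10` would run ten cycles against the directories named in the output above, assuming both already exist and `zram0mnt` is mounted on the zram device.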

Original comment by fadb24bb...@drewag.de on 21 Jan 2011 at 5:34

Attachments:

GoogleCodeExporter commented 8 years ago
I can now reproduce the issue using your script. Thanks. - Nitin

Original comment by nitingupta910@gmail.com on 22 Jan 2011 at 1:17

GoogleCodeExporter commented 8 years ago
I just synced the code in the repository with the mainline version (as in 2.6.37). It does not contain many of the scalability improvements present in the version you tested, but this mainline version is more stable (I hope!).

So, can you please pull again and give it a try?

Original comment by nitingupta910@gmail.com on 22 Jan 2011 at 4:13

GoogleCodeExporter commented 8 years ago
Great! No more random crashes while running Chrome, Thunderbird, Windows 7, Windows XP and Eclipse.

Off topic: after a few minutes, disk activity seems to increase and makes the system unresponsive, maybe because of the low disk cache.

free -m
             total       used       free     shared    buffers     cached
Mem:          1998       1982         15          0          0         89
-/+ buffers/cache:       1892        106
Swap:         1953       1326        626

swapon -s
Filename      Type        Size    Used    Priority
/dev/zram0    partition   999996  684132  100
/dev/zram1    partition   999996  685780  100

Original comment by superbiji on 23 Jan 2011 at 7:02

GoogleCodeExporter commented 8 years ago
Your total zram disksize (zram0 + zram1) is nearly equal to the amount of RAM, i.e. 2G. Even with a nice compression ratio of 50%, full zram devices would keep about 1G of compressed data in RAM, which seriously reduces the space available for filesystem cache (aka pagecache). Can you try reducing the zram sizes to, say, 256M each and see how it goes?
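
To make the arithmetic concrete, here is a small sketch that estimates how much RAM the compressed swap data occupies, using the usage figures from the `swapon -s` output earlier in this thread. The 50% ratio is the assumed figure from the paragraph above, not a measured value, and the function names are made up for illustration. The second function only prints the commands for resizing a device; it assumes the `reset` sysfs attribute is available in the zram version in use.

```shell
# Estimate RAM consumed by zram, given swap in use (KiB) and an assumed
# compression ratio in percent (compressed size / original size).
zram_ram_estimate_kb() {
    used_kb=$1
    ratio_pct=$2
    echo $(( used_kb * ratio_pct / 100 ))
}

# Print (do not run) the commands to give one zram device a new disksize.
# Assumption: /sys/block/<dev>/reset exists in the zram version in use.
zram_resize_cmds() {
    dev=$1
    bytes=$2
    echo "swapoff /dev/$dev"
    echo "echo 1 > /sys/block/$dev/reset"
    echo "echo $bytes > /sys/block/$dev/disksize"
    echo "mkswap /dev/$dev"
    echo "swapon /dev/$dev -p 100"
}

# Usage figures from `swapon -s`: zram0 + zram1, in KiB.
used_total_kb=$(( 684132 + 685780 ))
echo "swap in use:       ${used_total_kb} KiB"
echo "est. RAM for zram: $(zram_ram_estimate_kb "$used_total_kb" 50) KiB"

# 256M per device, expressed in bytes for the disksize attribute.
zram_resize_cmds zram0 $(( 256 * 1024 * 1024 ))
```

With the reported usage this works out to roughly 670 MB of a 1998 MB machine held by compressed swap, which leaves little room for pagecache alongside the running applications.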

Ideally, to strike a balance between swapcache and pagecache, we need support for pagecache compression too. Currently, there are two parallel efforts (zcache and kztmem). Hopefully, these efforts will merge into a single codebase and be accepted in mainline.

Marking as fixed.

Original comment by nitingupta910@gmail.com on 23 Jan 2011 at 11:57