We also have some KVM guests on an Ubuntu 12.04 host, and that one works fine: the read and write hit rates are high.
kvms: 0 2515083264 flashcache conf:
    ssd dev (/dev/sdc), disk dev (/dev/vg0/kvm) cache mode(WRITE_BACK)
    capacity(114026M), associativity(512), data block size(4K) metadata block size(4096b)
    skip sequential thresh(512K)
    total blocks(29190656), cached blocks(20572713), cache percent(70)
    dirty blocks(33318), dirty percent(0)
    nr_queued(0)
kvms: 0 2515083264 flashcache stats: reads(9206002712), writes(4831478048) read hits(7991443995), read hit percent(86) write hits(3211363145) write hit percent(66) dirty write hits(1607865974) dirty write hit percent(33) replacement(268233182), write replacement(284900996) write invalidates(369567314), read invalidates(22945861) pending enqueues(32908547), pending inval(30687086) metadata dirties(2058104090), metadata cleans(2058110448) metadata batch(3222910314) metadata ssd writes(893304088) cleanings(2058107859) fallow cleanings(58166990) no room(3060419) front merge(1569469517) back merge(322274621) disk reads(1249800943), disk writes(3225295944) ssd reads(10048630266) ssd writes(5070401573) uncached reads(738581956), uncached writes(1167402012), uncached IO requeue(142) disk read errors(0), disk write errors(0) ssd read errors(0) ssd write errors(0) uncached sequential reads(268882222), uncached sequential writes(795524924) pid_adds(0), pid_dels(0), pid_drops(0) pid_expiry(0)
dev.flashcache.sdc+kvm.do_sync = 0 dev.flashcache.sdc+kvm.stop_sync = 0 dev.flashcache.sdc+kvm.dirty_thresh_pct = 30 dev.flashcache.sdc+kvm.max_clean_ios_total = 4 dev.flashcache.sdc+kvm.max_clean_ios_set = 2 dev.flashcache.sdc+kvm.do_pid_expiry = 0 dev.flashcache.sdc+kvm.max_pids = 100 dev.flashcache.sdc+kvm.pid_expiry_secs = 60 dev.flashcache.sdc+kvm.reclaim_policy = 1 dev.flashcache.sdc+kvm.zero_stats = 0 dev.flashcache.sdc+kvm.fast_remove = 0 dev.flashcache.sdc+kvm.cache_all = 1 dev.flashcache.sdc+kvm.fallow_clean_speed = 2 dev.flashcache.sdc+kvm.fallow_delay = 900 dev.flashcache.sdc+kvm.skip_seq_thresh_kb = 512
From memory, this won't work the way you want it to. You need to create a block device (I use LVM) then use flashcache on the block device - trying to cache a filesystem like this doesn't fly.
Once you've sorted that:
a. In your grub config, add "elevator=noop"; this should double your throughput.
b. In your QEMU config, add this to the driver section: cache="writeback" threads="native" (see the sketch below).
You should expect to get around 75% of native SSD speed inside your KVM instance.
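For what it's worth, here is a hedged sketch of what (b) can look like on the raw QEMU command line; the image path and memory size are placeholders, and in libvirt the same knobs are the cache and io attributes of the <driver> element. Note that aio=native generally requires O_DIRECT (cache=none or directsync), so the threaded AIO backend is used here instead:

# Illustrative only: host-side writeback caching for a qcow2 disk.
qemu-system-x86_64 -m 2048 -enable-kvm \
  -drive file=/var/lib/libvirt/images/guest.qcow2,if=virtio,format=qcow2,cache=writeback,aio=threads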
@garethbult, we are running OpenStack, so we use qcow2 instead of LVM. Testing with fio, against both a file and a raw device, shows high throughput and IOPS.
Most of the guest OSes are Windows, so we can't set "elevator=noop" in them. It's dangerous to set cache="writeback" in the QEMU config.
We have another server with the same hardware config, running Ubuntu 12.04, and it works fine. The difference is that most of its guest OSes are Linux.
Can you email me the "Size Hist" at the end of the dmsetup table output for the case where the performance is bad?
Nearly all your reads and writes are uncached. One reason for that might be that incoming IOs are being split into pieces smaller than 4KB; "Size Hist" will tell us that. Also, about 20% of your writes are being detected as sequential and therefore uncached, but this is probably OK and what you want.
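(If it helps, that histogram can be pulled straight out of the table output; the cache name below is just an example:)

# Print only the IO size histogram for a flashcache device named "fc_md1";
# substitute your own cache name from `dmsetup ls`.
dmsetup table fc_md1 | grep -o 'Size Hist.*'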
Mmm, you're using Windows and complaining performance is poor, the mind boggles ... ;)
If you don't set writeback, performance is likely to be far worse than you (I) might expect - your choice. If you want to use QCOW2 (I do), I'd recommend using NBD: export your QCOW2 image with qemu-nbd, connect to it with nbd-client, then attach your flashcache to the NBD device. This works fairly well, although from a management perspective it's a little tricky. I initially wrote a bunch of shell scripts to manage this for mass production; it sort of worked OK, but it's still not ideal.
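A rough sketch of that chain (device names, port, and cache name are my own examples, not from this thread):

# Export the qcow2 image over NBD, attach it locally, then layer a
# writeback flashcache on top of the resulting block device.
modprobe nbd
qemu-nbd -p 10810 /srv/images/guest.qcow2 &             # serve the image on TCP port 10810
nbd-client localhost 10810 /dev/nbd0                    # attach the export as /dev/nbd0
flashcache_create -p back fc_guest /dev/sdc /dev/nbd0   # SSD /dev/sdc now caches /dev/nbd0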
Hopefully there will be a product available soon to do all this properly for KVM, ideally with network RAID10, compression, sparse storage, LFU cache etc etc .. :)
[if you look in detail at writeback, then at how the Linux page cache works, my impression / experience is that there is very little [if any] "additional" risk to using writeback]
Here is the data. How can I tune the system?
system with bad performance:
fc_md1: 0 3902860928 flashcache conf:
    ssd dev (/dev/sdc), disk dev (/dev/md1) cache mode(WRITE_BACK)
    capacity(114026M), associativity(512), data block size(4K) metadata block size(4096b)
    skip sequential thresh(512K)
    total blocks(29190656), cached blocks(2953995), cache percent(10)
    dirty blocks(2203), dirty percent(0)
    nr_queued(0)
Size Hist: 512:134485435 1024:13527429 1536:8480106 2048:3336358 2560:6835273 3072:12769809 3584:127742570 4096:118668580

 512   134485435   31.58%
1024    13527429    3.18%
1536     8480106    1.99%
2048     3336358    0.78%
2560     6835273    1.61%
3072    12769809    3.00%
3584   127742570   30.00%
4096   118668580   27.87%
system with good performance:
kvms: 0 2515083264 flashcache conf:
    ssd dev (/dev/sdc), disk dev (/dev/CentOS/kvm) cache mode(WRITE_BACK)
    capacity(114026M), associativity(512), data block size(4K) metadata block size(4096b)
    skip sequential thresh(512K)
    total blocks(29190656), cached blocks(19771545), cache percent(67)
    dirty blocks(29220), dirty percent(0)
    nr_queued(0)
Size Hist: 512:234035511 1024:75332725 1536:74596710 2048:71343806 2560:73538973 3072:71947783 3584:223519365 4096:13261865088

 512    234035511    1.66%
1024     75332725    0.53%
1536     74596710    0.53%
2048     71343806    0.51%
2560     73538973    0.52%
3072     71947783    0.51%
3584    223519365    1.59%
4096  13261865088   94.15%
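(For reference, the percentages above can be recomputed from any Size Hist line with something like this; the cache name is an example:)

# Turn the flashcache "Size Hist" counters into per-size percentages.
dmsetup table fc_md1 | grep -o 'Size Hist:.*' | \
  awk '{ for (i = 3; i <= NF; i++) { split($i, kv, ":"); n[kv[1]] = kv[2]; tot += kv[2] }
         for (s in n) printf "%5d %12d %6.2f%%\n", s, n[s], 100 * n[s] / tot }' | sort -n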
Flashcache will only cache IOs that are exactly 4KB.
If you look at the system with the bad performance, notice that only 28% of all IOs are 4KB. 72% of all IOs are smaller, so all of these are uncached.
On the system with good performance, on the other hand, nearly all IOs are 4KB.
There could be any number of reasons why IOs coming into flashcache are broken up like this. The first thing I'd check is whether the start of the filesystem is aligned on a 4KB boundary.
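One way to check that (a sketch; the device and partition numbers are examples) is parted's align-check, which reports whether a partition start satisfies the disk's minimal or optimal alignment:

# Check partition 1 on /dev/sda against the optimal and minimal alignment
# reported by the disk (on a 4K-sector disk this implies 4KB alignment).
parted /dev/sda align-check opt 1
parted /dev/sda align-check min 1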
Most [99.9+%] NBD requests are multiples of 4KB, which I guess is why it works well with FlashCache ... :)
I found many documents about disk partition alignment, but I have no clue how to handle it.
How can I check whether my partitions are aligned correctly, and how can I realign them if they're wrong?
I use a kickstart file to partition the CentOS system:
/usr/sbin/parted -s -- /dev/$drive1 mklabel gpt
/usr/sbin/parted -s -- /dev/$drive2 mklabel gpt
/usr/sbin/parted -s -- /dev/$drive1 unit MB mkpart primary 1 5
/usr/sbin/parted -s -- /dev/$drive2 unit MB mkpart primary 1 5
/usr/sbin/parted -s -- /dev/$drive1 set 1 bios_grub on
/usr/sbin/parted -s -- /dev/$drive2 set 1 bios_grub on
/usr/sbin/parted -s -- /dev/$drive1 unit MB mkpart primary 5 2000
/usr/sbin/parted -s -- /dev/$drive2 unit MB mkpart primary 5 2000
/usr/sbin/parted -s -- /dev/$drive1 set 2 boot on
/usr/sbin/parted -s -- /dev/$drive2 set 2 boot on
/usr/sbin/parted -s -- /dev/$drive1 unit MB mkpart primary 2000 -0
/usr/sbin/parted -s -- /dev/$drive2 unit MB mkpart primary 2000 -0
/usr/sbin/parted -s -- /dev/$drive1 set 3 raid on
/usr/sbin/parted -s -- /dev/$drive2 set 3 raid on
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) p
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sda: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start     End          Size         File system  Name     Flags
 1      2048s     10239s       8192s                     primary
 2      10240s    3905535s     3895296s     ext4         primary  boot
 3      3905536s  3907028991s  3903123456s               primary  raid
Is this disk aligned to a 4KB boundary?
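(A quick arithmetic check, mine rather than the thread's: with 512-byte logical sectors, a partition start is 4KB-aligned when its start sector is divisible by 8, and 2048, 10240 and 3905536 all are. The same check can be scripted from sysfs:)

# 8 sectors * 512B = 4096B, so a start sector divisible by 8 sits on a 4KB boundary.
for p in /sys/block/sda/sda[0-9]*; do
  start=$(cat "$p/start")
  if [ $(( start % 8 )) -eq 0 ]; then
    echo "$(basename "$p"): start sector $start is 4KB-aligned"
  else
    echo "$(basename "$p"): start sector $start is NOT 4KB-aligned"
  fi
done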
Then I have an md array on /dev/sda3 and /dev/sdb3:
Personalities : [raid1]
md0 : active raid1 sda2[0] sdb2[1]
      1947584 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda3[0] sdb3[1]
      1951430464 blocks super 1.1 [2/2] [UU]
      bitmap: 4/15 pages [16KB], 65536KB chunk
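(One more thing worth checking in this stack, my suggestion rather than the thread's: with 1.x metadata the array data begins at a "Data Offset" inside each member partition, so that offset should also be a multiple of 8 sectors for md1 to stay 4KB-aligned:)

# Show where the RAID1 data actually starts inside each member partition;
# "Data Offset" should be a multiple of 8 sectors (8 * 512B = 4KB).
mdadm --examine /dev/sda3 /dev/sdb3 | grep -E '/dev/sd|Data Offset'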
I have a flashcache device on md1:
vg0-test: 0 209715200 linear 253:0 2525366272
vg0-home: 0 62914560 linear 253:0 29362176
vg0-home: 62914560 146800640 linear 253:0 2378565632
fc_md1: 0 3902860928 flashcache conf:
    ssd dev (/dev/sdc), disk dev (/dev/md1) cache mode(WRITE_BACK)
    capacity(114026M), associativity(512), data block size(4K) metadata block size(4096b)
    skip sequential thresh(512K)
    total blocks(29190656), cached blocks(2953082), cache percent(10)
    dirty blocks(1378), dirty percent(0)
    nr_queued(0)
Size Hist: 512:139363574 1024:14066635 1536:8798657 2048:3482262 2560:7094672 3072:13270678 3584:132310668 4096:122190926
Then I create an LVM volume group and logical volumes:
--- Physical volume ---
PV Name                /dev/mapper/fc_md1
VG Name                vg0
PV Size                1.82 TiB / not usable 29.81 MiB
Allocatable            yes
PE Size                32.00 MiB
Total PE               59552
Free PE                17818
Allocated PE           41734
PV UUID                ftvUj9-1SeY-69so-jqyI-hoAN-A0hE-nIGYet

--- Logical volume ---
LV Path                /dev/vg0/swap
LV Name                swap
VG Name                vg0
LV UUID                fmuFtp-NiV5-KNNC-G7i1-L2hf-gUAL-nQIgri
LV Write Access        read/write
LV Creation host, time install.bitcomm.cn, 2013-06-21 19:36:00 +0800
LV Status              available
LV Size                2.00 GiB
Current LE             64
Segments               1
Allocation             inherit
Read ahead sectors     auto
currently set to       256
Block device           253:1

--- Logical volume ---
LV Path                /dev/vg0/var
LV Name                var
VG Name                vg0
LV UUID                r3pX05-6OVa-VNf1-WdlD-TqNY-Pwwk-IOTfOo
LV Write Access        read/write
LV Creation host, time install.bitcomm.cn, 2013-06-21 19:36:00 +0800
LV Status              available
LV Size                5.00 GiB
Current LE             160
Segments               1
Allocation             inherit
Read ahead sectors     auto
currently set to       256
Block device           253:2

--- Logical volume ---
LV Path                /dev/vg0/tmp
LV Name                tmp
VG Name                vg0
LV UUID                Qij9ja-04Lg-VUxI-rgZ2-Mdn1-OLlW-oa3URb
LV Write Access        read/write
LV Creation host, time install.bitcomm.cn, 2013-06-21 19:36:04 +0800
LV Status              available
LV Size                2.00 GiB
Current LE             64
Segments               1
Allocation             inherit
Read ahead sectors     auto
currently set to       256
Block device           253:3

--- Logical volume ---
LV Path                /dev/vg0/usr
LV Name                usr
VG Name                vg0
LV UUID                FdEIlF-bm8j-rPQW-hPOI-wY6r-me1Y-SfcUYa
LV Write Access        read/write
LV Creation host, time install.bitcomm.cn, 2013-06-21 19:36:07 +0800
LV Status              available
LV Size                5.00 GiB
Current LE             160
Segments               1
Allocation             inherit
Read ahead sectors     auto
currently set to       256
Block device           253:4

--- Logical volume ---
LV Path                /dev/vg0/home
LV Name                home
VG Name                vg0
LV UUID                7VT1rV-wl8X-Mg7v-2Jib-hWDI-2pW8-vqX0Kn
LV Write Access        read/write
LV Creation host, time install.bitcomm.cn, 2013-06-21 19:36:11 +0800
LV Status              available
LV Size                100.00 GiB
Current LE             3200
Segments               2
Allocation             inherit
Read ahead sectors     auto
currently set to       256
Block device           253:5

--- Logical volume ---
LV Path                /dev/vg0/nova
LV Name                nova
VG Name                vg0
LV UUID                NBwuAI-xyjN-liCI-cNe0-llBI-pUJc-Ou3Ptr
LV Write Access        read/write
LV Creation host, time h172-16-0-9, 2013-07-03 16:20:05 +0800
LV Status              available
LV Size                1.06 TiB
Current LE             34886
Segments               1
Allocation             inherit
Read ahead sectors     auto
And I put all the KVM qcow2 files on /dev/vg0/nova.
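(If this stack is ever rebuilt, the LVM layer's alignment can also be pinned down explicitly when creating the PV and VG on the cache device. The commands below are illustrative, not from the thread, and pvcreate is destructive:)

# Force a 4KB-aligned data area on the PV and keep the extent size a
# multiple of 4KB (the existing 32MiB PE size already is).
pvcreate --dataalignment 4k /dev/mapper/fc_md1
vgcreate -s 32M vg0 /dev/mapper/fc_md1
lvcreate -L 1T -n nova vg0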
I ran parted's align-check command on CentOS; it gives me no message:
If I run it on Ubuntu, it gives me a "1 aligned" message:
1 aligned
The partition table type on the Ubuntu system is msdos instead of gpt.
GNU Parted 2.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) p
Model: ATA WDC WD2003FYYS-0 (scsi)
Disk /dev/sda: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End          Size         Type     File system  Flags
 1      2048s    499711s      497664s      primary               raid
 2      499712s  3907028991s  3906529280s  primary               raid
If you are running the MSDOS filesystem on this, it is likely that the filesystem is issuing most of its IOs as < 4KB. If that is the case, flashcache cannot really do anything about it :(
@mohans, thanks.
I created a qcow2 image, partitioned it with Linux parted first, and then installed Windows on it. It works, and the cache hit rate is OK now.
What is the parted command to align the partitions in a qcow2 image to 4K?
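(parted itself has no qcow2-specific option; a common approach, sketched here with made-up paths, is to attach the image through qemu-nbd and create the partitions on explicit 1MiB boundaries before installing the guest:)

# Attach the qcow2 image as a block device, create a 1MiB-aligned partition,
# verify the alignment, then detach before installing Windows into it.
modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 /var/lib/nova/instances/disk.qcow2
parted -s /dev/nbd0 mklabel msdos
parted -s -a optimal /dev/nbd0 mkpart primary ntfs 1MiB 100%
parted /dev/nbd0 align-check opt 1      # should report "1 aligned"
qemu-nbd --disconnect /dev/nbd0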
I created a flashcache on md1 and an SSD; the load and IO wait on the host are high.
The status shows that the read and write hit rates are low.
The OS is CentOS 6.4.
If I run fio on the same production system under high load, it gives me good performance.
cat /etc/redhat-release
CentOS release 6.4 (Final)
uname -a
Linux h172-16-0-9 2.6.32-358.14.1.el6.x86_64 #1 SMP Sat Jul 20 19:01:27 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/flashcache/flashcache_version
Flashcache Version : flashcache-2.0
dmsetup table
fc_md1: 0 3902860928 flashcache conf:
    ssd dev (/dev/sdc), disk dev (/dev/md1) cache mode(WRITE_BACK)
    capacity(114026M), associativity(512), data block size(4K) metadata block size(4096b)
    skip sequential thresh(512K)
    total blocks(29190656), cached blocks(2815563), cache percent(9)
    dirty blocks(619861), dirty percent(2)
    nr_queued(0)
dmsetup status
fc_md1: 0 3902860928 flashcache stats: reads(88096439), writes(272709908) read hits(2527409), read hit percent(2) write hits(28634512) write hit percent(10) dirty write hits(24087157) dirty write hit percent(8) replacement(2388), write replacement(85007) write invalidates(1916721), read invalidates(331602) pending enqueues(659098), pending inval(598604) metadata dirties(8930851), metadata cleans(8315979) metadata batch(11116725) metadata ssd writes(6130101) cleanings(8315975) fallow cleanings(347315) no room(21) front merge(6843701) back merge(451844) disk reads(85624857), disk writes(248067847) ssd reads(10843323) ssd writes(39773632) uncached reads(84999380), uncached writes(239751926), uncached IO requeue(5137) disk read errors(0), disk write errors(0) ssd read errors(0) ssd write errors(0) uncached sequential reads(419488), uncached sequential writes(63376799) pid_adds(0), pid_dels(0), pid_drops(0) pid_expiry(0)
sysctl -a|grep flashcache <<<
dev.flashcache.sdc+md1.io_latency_hist = 0 dev.flashcache.sdc+md1.do_sync = 0 dev.flashcache.sdc+md1.stop_sync = 0 dev.flashcache.sdc+md1.dirty_thresh_pct = 30 dev.flashcache.sdc+md1.max_clean_ios_total = 4 dev.flashcache.sdc+md1.max_clean_ios_set = 2 dev.flashcache.sdc+md1.do_pid_expiry = 0 dev.flashcache.sdc+md1.max_pids = 100 dev.flashcache.sdc+md1.pid_expiry_secs = 60 dev.flashcache.sdc+md1.reclaim_policy = 1 dev.flashcache.sdc+md1.zero_stats = 0 dev.flashcache.sdc+md1.fast_remove = 0 dev.flashcache.sdc+md1.cache_all = 1 dev.flashcache.sdc+md1.fallow_clean_speed = 2 dev.flashcache.sdc+md1.fallow_delay = 900 dev.flashcache.sdc+md1.skip_seq_thresh_kb = 512
iostat -xkN 5
Device:   rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdc         0.00     0.00   22.20    18.80    88.80    75.20      8.00      0.01    0.30   0.12    0.48
sdb       153.60    68.60   24.00    73.20   349.20   313.70     13.64      0.64    6.73   4.63   45.02
sda         0.00    74.60    0.00    67.00     0.00   313.70      9.36      4.06   60.90  14.93  100.00
md1         0.00     0.00  177.60   123.00   349.20   266.20      4.09      0.00    0.00   0.00    0.00
I run fio:
fio -filename=/home/kvm/test.iso -iodepth=64 -ioengine=libaio -direct=1 -rw=randread -bs=4k -size=5G -numjobs=64 -runtime=20 -group_reporting -name=test-rand-read
test-rand-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64 ... test-rand-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64 fio-2.0.13 Starting 64 processes Jobs: 64 (f=64): [rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr] [2.6% done] [151.8M/0K/0K /s] [38.9K/0 /0 iops] [eta 13m:57s] test-rand-read: (groupid=0, jobs=64): err= 0: pid=8411: Sun Jul 28 11:09:46 2013 read : io=8764.3MB, bw=421936KB/s, iops=105484 , runt= 21270msec slat (usec): min=3 , max=133025 , avg=567.80, stdev=4786.41 clat (usec): min=0 , max=4819.2K, avg=38066.10, stdev=82945.67 lat (usec): min=4 , max=4819.2K, avg=38634.18, stdev=83316.49 clat percentiles (usec): | 1.00th=[ 270], 5.00th=[ 370], 10.00th=[ 438], 20.00th=[ 548], | 30.00th=[ 684], 40.00th=[ 1320], 50.00th=[37632], 60.00th=[40192], | 70.00th=[42752], 80.00th=[46848], 90.00th=[87552], 95.00th=[121344], | 99.00th=[166912], 99.50th=[197632], 99.90th=[1286144], 99.95th=[1531904], | 99.99th=[2768896] bw (KB/s) : min= 507, max=14088, per=1.61%, avg=6808.31, stdev=4171.33 lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02% lat (usec) : 100=0.02%, 250=0.57%, 500=14.67%, 750=17.86%, 1000=5.08% lat (msec) : 2=2.91%, 4=0.22%, 10=0.20%, 20=0.16%, 50=40.89% lat (msec) : 100=10.60%, 250=6.50%, 500=0.05%, 750=0.02%, 1000=0.02% lat (msec) : 2000=0.18%, >=2000=0.03% cpu : usr=0.40%, sys=2.23%, ctx=32528, majf=0, minf=5851 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.8% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued : total=r=2243647/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs): READ: io=8764.3MB, aggrb=421936KB/s, minb=421936KB/s, maxb=421936KB/s, mint=21270msec, maxt=21270msec
Disk stats (read/write): dm-6: ios=776769/1100, merge=0/0, ticks=7857780/789095, in_queue=9088783, util=100.00%, aggrios=776777/2930, aggrmerge=0/0, aggrticks=8264371/833165, aggrin_queue=9104612, aggrutil=100.00% dm-0: ios=776777/2930, merge=0/0, ticks=8264371/833165, in_queue=9104612, util=100.00%, aggrios=388410/3080, aggrmerge=1719/45, aggrticks=1574723/1890, aggrin_queue=1576267, aggrutil=94.36% md1: ios=7631/2568, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=3299/1339, aggrmerge=511/1192, aggrticks=575079/114131, aggrin_queue=689327, aggrutil=99.83% sdb: ios=3973/1387, merge=688/1132, ticks=264205/30142, in_queue=294624, util=58.97% sda: ios=2626/1291, merge=334/1253, ticks=885953/198120, in_queue=1084030, util=99.83% sdc: ios=769190/3593, merge=3439/90, ticks=3149447/3781, in_queue=3152535, util=94.36%
fio -filename=/home/kvm/test.iso -iodepth=64 -ioengine=libaio -direct=1 -rw=randwrite -bs=4k -size=5G -numjobs=64 -runtime=20 -group_reporting -name=test-rand-write
test-rand-write: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64 ... test-rand-write: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64 fio-2.0.13 Starting 64 processes Jobs: 3 (f=3): [_w__w____w__] [1.8% done] [0K/34393K/0K /s] [0 /8598 /0 iops] [eta 20m:19s] s] test-rand-write: (groupid=0, jobs=64): err= 0: pid=8944: Sun Jul 28 11:10:55 2013 write: io=2106.9MB, bw=102891KB/s, iops=25722 , runt= 20968msec slat (usec): min=8 , max=144353 , avg=1262.01, stdev=7815.16 clat (usec): min=1 , max=9287.5K, avg=152480.13, stdev=205294.09 lat (usec): min=57 , max=9287.5K, avg=153742.87, stdev=205287.22 clat percentiles (usec): | 1.00th=[ 37], 5.00th=[ 68], 10.00th=[ 3824], 20.00th=[66048], | 30.00th=[77312], 40.00th=[81408], 50.00th=[96768], 60.00th=[119296], | 70.00th=[146432], 80.00th=[181248], 90.00th=[378880], 95.00th=[552960], | 99.00th=[675840], 99.50th=[757760], 99.90th=[1564672], 99.95th=[3555328], | 99.99th=[6520832] bw (KB/s) : min= 7, max= 4536, per=1.62%, avg=1668.32, stdev=719.30 lat (usec) : 2=0.01%, 4=0.01%, 20=0.01%, 50=2.85%, 100=3.67% lat (usec) : 250=1.06%, 500=0.66%, 750=0.24%, 1000=0.17% lat (msec) : 2=0.52%, 4=0.92%, 10=1.91%, 20=1.60%, 50=4.01% lat (msec) : 100=33.23%, 250=33.03%, 500=9.79%, 750=5.79%, 1000=0.25% lat (msec) : 2000=0.21%, >=2000=0.08% cpu : usr=0.27%, sys=1.64%, ctx=532549, majf=0, minf=1757 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0% issued : total=r=0/w=539357/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs): WRITE: io=2106.9MB, aggrb=102891KB/s, minb=102891KB/s, maxb=102891KB/s, mint=20968msec, maxt=20968msec
Disk stats (read/write): dm-6: ios=621/541050, merge=0/0, ticks=34875/11477976, in_queue=11597827, util=100.00%, aggrios=621/556542, aggrmerge=0/0, aggrticks=34873/12636219, aggrin_queue=12754158, aggrutil=100.00% dm-0: ios=621/556542, merge=0/0, ticks=34873/12636219, in_queue=12754158, util=100.00%, aggrios=781/345480, aggrmerge=233/1828, aggrticks=223/869199, aggrin_queue=869039, aggrutil=89.63% md1: ios=1069/9700, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=98/3299, aggrmerge=436/6352, aggrticks=2997/198338, aggrin_queue=201479, aggrutil=93.94% sdb: ios=162/3355, merge=609/6303, ticks=1439/128383, in_queue=129639, util=50.57% sda: ios=35/3244, merge=263/6402, ticks=4556/268293, in_queue=273319, util=93.94% sdc: ios=494/681261, merge=466/3656, ticks=447/1738398, in_queue=1738078, util=89.63%
And when I run fio, iostat shows:
iostat -xkN 5
Device:   rrqm/s   wrqm/s     r/s       w/s    rkB/s      wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sdc        40.40    95.40   32.80  31193.60   292.80  125156.00      8.03     55.64    1.79   0.03  92.28
sdb        11.60   147.00    1.20    204.80    25.60    1247.00     12.36     12.61   61.09   2.22  45.80
sda         0.00   150.00    0.00    191.20     0.00    1198.90     12.54     22.94   95.66   4.16  79.62
md1         0.00     0.00   12.80    356.80    25.60    1251.70      6.91      0.00    0.00   0.00   0.00