QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

FS errors/corruption in Qubes R4.1 on low gnttab_max_frames / gnttab_max_maptrack_frames #5674

Open arno01 opened 4 years ago

arno01 commented 4 years ago

Qubes OS version: R4.1 (Xen 4.13, Linux 4.19.94-1)

Affected component(s) or functionality

domU (VM) filesystems (probably LVM too?)

Brief summary

When running many VMs, some VMs may randomly get I/O errors on their disks (backed by dom0's LVs) when the Xen parameters gnttab_max_frames / gnttab_max_maptrack_frames are set too low.

Observations

[arno@dom0 ~]$ for i in $(xl list | tail -n+2 | awk '{print $2}'); do xen-diag gnttab_query_size $i; done; echo $(( $(for i in $(xl list | tail -n+2 | awk '{print $2}'); do xen-diag gnttab_query_size $i; done | awk '{print $2}' | cut -d\= -f2 | sed 's/,//g' | xargs | tr ' ' '+') ));
domid=0: nr_frames=1, max_nr_frames=1024
domid=1: nr_frames=18, max_nr_frames=1024
domid=2: nr_frames=2, max_nr_frames=1024
domid=3: nr_frames=19, max_nr_frames=1024
domid=4: nr_frames=53, max_nr_frames=1024
domid=6: nr_frames=58, max_nr_frames=1024
151
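
For readability, the same query can be written as a multi-line loop (a sketch equivalent to the one-liner above, run in dom0):

total=0
for domid in $(xl list | tail -n +2 | awk '{print $2}'); do
    line=$(xen-diag gnttab_query_size "$domid")
    echo "$line"
    # nr_frames is the second field, e.g. "nr_frames=18,"
    frames=$(echo "$line" | awk '{print $2}' | cut -d= -f2 | tr -d ,)
    total=$(( total + frames ))
done
echo "total nr_frames in use: $total"   # 151 in the run above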

Logs

sys-net domU

Feb 20 14:24:12 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9500856
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9500856
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9394232
Feb 20 14:24:13 sys-net kernel: EXT4-fs warning (device xvda3): ext4_end_bio:317: I/O error 10 writing to inode 130442 (offset 0 size 0 starting block 1174280)
Feb 20 14:24:13 sys-net kernel: Buffer I/O error on device xvda3, logical block 1122311
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9500856
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 514168
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9500856
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 514248
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9070560
Feb 20 14:24:13 sys-net kernel: Aborting journal on device xvda3-8.
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9066496
Feb 20 14:24:13 sys-net kernel: print_req_error: I/O error, dev xvda, sector 9066496
Feb 20 14:24:13 sys-net kernel: Buffer I/O error on dev xvda3, logical block 1081344, lost sync page write
Feb 20 14:24:13 sys-net kernel: JBD2: Error -5 detected when updating journal superblock for xvda3-8.
Feb 20 14:24:13 sys-net kernel: Buffer I/O error on dev xvda3, logical block 0, lost sync page write
Feb 20 14:24:13 sys-net kernel: EXT4-fs (xvda3): I/O error while writing superblock
Feb 20 14:24:13 sys-net kernel: EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
Feb 20 14:24:13 sys-net kernel: EXT4-fs (xvda3): Remounting filesystem read-only
Feb 20 14:24:13 sys-net kernel: Buffer I/O error on dev xvda3, logical block 0, lost sync page write
Feb 20 14:24:13 sys-net kernel: EXT4-fs (xvda3): I/O error while writing superblock
Feb 20 14:24:12 sys-net qubes.UpdatesProxy+-fedora-30-custom[1393]: /usr/lib/qubes/qubes-rpc-multiplexer: line 48: /etc/qubes-rpc/qubes.UpdatesProxy: Input/output error

dom0

One second before sys-net got the I/O errors:

Feb 20 14:24:10 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages
Feb 20 14:24:12 dom0 kernel: xen:grant_table: maptrack limit reached, can't map all guest pages

Related, non-duplicate issues

https://github.com/QubesOS/qubes-issues/issues/5519#issuecomment-586800991

I don't think I have observed the same issue with Qubes R4.0 (Xen 4.8).

JarrahG commented 4 years ago

Have you tried bumping max_maptrack_frames along with max_grant_frames? Both the documentation and your output seem to indicate that they should be used together.

https://xenbits.xen.org/docs/unstable/man/xl.conf.5.html
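
For example, per that man page, the corresponding xl.conf keys would look something like this (illustrative values, not a recommendation; note these set per-domain defaults applied at guest creation, as opposed to the hypervisor-wide gnttab_max_frames / gnttab_max_maptrack_frames boot options):

# /etc/xen/xl.conf
max_grant_frames=256
max_maptrack_frames=1024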

0spinboson commented 4 years ago

(The default value seems to be 1024, but I don't think there's a way to query it. In any case, 'xl set-parameters gnttab_max_maptrack_frames value' should allow you to change it on the fly; no idea by how much it should be increased, though.)

By the way, why do you think this is related to low RAM? I ran into it once or twice myself, and it could be that it happened while qmemman wasn't doing its job, but I never made a connection between the two.

arno01 commented 4 years ago

Yeah, that looks like the default value according to the Xen 4.13 code:

static unsigned int __read_mostly opt_max_maptrack_frames = 1024;

I will try gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 and let you know.

arno01 commented 4 years ago

By the way, why do you think this is related to low RAM? I ran into it once or twice myself, and it could be that it happened while qmemman wasn't doing its job, but I never made a connection between the two.

I only hit this issue when I ran multiple Firefox tabs and then opened up additional VMs, so the system went really low on RAM, causing the FS issues, e.g.:

[root@dom0 console]# pwd
/var/log/xen/console
[root@dom0 console]# grep -a 'fs error' *.log
guest-personal.log:[2020-02-06 13:10:31] [12765.241788] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal
guest-personal.log:[2020-02-06 13:10:40] [12773.566105] EXT4-fs error (device xvdb): __ext4_get_inode_loc:4707: inode #139220: block 524829: comm Cache2 I/O: unable to read itable block
guest-personal.log:[2020-02-06 13:10:40] [12773.620072] EXT4-fs error (device xvdb): __ext4_get_inode_loc:4707: inode #135454: block 524593: comm Cache2 I/O: unable to read itable block
guest-personal.log:[2020-02-06 13:10:40] [12773.685436] EXT4-fs error (device xvdb): __ext4_get_inode_loc:4707: inode #149187: block 525452: comm Cache2 I/O: unable to read itable block
guest-personal.log:[2020-02-10 14:23:44] [17478.602396] EXT4-fs error (device xvda3) in ext4_do_update_inode:5371: Journal has aborted
guest-personal.log:[2020-02-10 14:23:44] [17478.612226] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-personal.log:[2020-02-10 14:23:44] [17478.615765] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-personal.log:[2020-02-10 14:23:44] [17478.616035] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-personal.log:[2020-02-10 14:23:44] [17478.617295] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-personal.log:[2020-02-10 14:23:44] [17478.617297] EXT4-fs error (device xvda3) in ext4_da_write_end:3199: IO failure
guest-personal.log:[2020-02-10 14:23:44] [17478.622619] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-sys-net.log:[2020-02-06 13:10:58] [14517.226106] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-sys-net.log:[2020-02-06 13:10:58] [14517.392314] EXT4-fs error (device xvda3): __ext4_get_inode_loc:4707: inode #522810: block 2097281: comm systemd-coredum: unable to read itable block
guest-sys-net.log:[2020-02-06 13:10:58] [14517.521361] EXT4-fs error (device xvda3): __ext4_get_inode_loc:4707: inode #522810: block 2097281: comm systemd-coredum: unable to read itable block
guest-sys-net.log:[2020-02-06 13:10:58] [14517.542491] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal
guest-sys-net.log:[2020-02-20 14:24:12] [16722.727569] EXT4-fs error (device xvda3): ext4_journal_check_start:61: Detected aborted journal
guest-sys-net.log:[2020-02-20 14:24:12] [16723.474371] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal
guest-sys-net.log:[2020-02-21 09:30:51] [  364.637677] EXT4-fs error (device xvdb): ext4_find_entry:1454: inode #8193: comm systemd-coredum: reading directory lblock 0
guest-work.log:[2020-01-28 03:48:22] [35586.127725] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal
guest-work.log:[2020-02-06 13:10:31] [14456.395061] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal
guest-work.log:[2020-02-21 09:31:11] [  313.948032] EXT4-fs error (device xvdb): ext4_journal_check_start:61: Detected aborted journal

arno01 commented 4 years ago

I can confirm that running a lot of VMs with the following values is NOT causing any of the I/O or FS errors I previously observed.

Xen arguments I am using from now on: gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096
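
For anyone wanting to persist the same settings: one way (a sketch, using the usual /etc/default/grub approach; the grub.cfg path may differ on EFI installs, e.g. /boot/efi/EFI/qubes/grub.cfg) is:

cat >> /etc/default/grub
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
ctrl-d
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot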

Feeling a great relief now! Sigh, I went through so much pain with Qubes 4.1 lately. It was worth it!

Here are some logs:

[arno@dom0 ~]$ date; for i in $(xl list | tail -n+2 | awk '{print $2}'); do xen-diag gnttab_query_size $i; done; echo $(( $(for i in $(xl list | tail -n+2 | awk '{print $2}'); do xen-diag gnttab_query_size $i; done | awk '{print $2}' | cut -d\= -f2 | sed 's/,//g' | xargs | tr ' ' '+') )); xl info free_memory; xl info xen_commandline; free -mh
Fri 21 Feb 2020 01:49:48 PM CET
domid=0: nr_frames=1, max_nr_frames=2048
domid=1: nr_frames=18, max_nr_frames=2048
domid=2: nr_frames=2, max_nr_frames=2048
domid=3: nr_frames=19, max_nr_frames=2048
domid=4: nr_frames=68, max_nr_frames=2048
domid=5: nr_frames=31, max_nr_frames=2048
domid=6: nr_frames=20, max_nr_frames=2048
domid=7: nr_frames=46, max_nr_frames=2048
domid=8: nr_frames=38, max_nr_frames=2048
domid=9: nr_frames=22, max_nr_frames=2048
domid=10: nr_frames=23, max_nr_frames=2048
domid=11: nr_frames=29, max_nr_frames=2048
domid=12: nr_frames=23, max_nr_frames=2048

340  <-- total nr_frames used

322 <-- free xen memory << qmemman did a great job here :-)

placeholder console=none dom0_mem=min:1024M dom0_mem=max:4096M iommu=no-igfx ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096 no-real-mode edd=off

              total        used        free      shared  buff/cache   available
Mem:          2.5Gi       1.2Gi       779Mi        12Mi       523Mi       1.2Gi
Swap:         8.0Gi          0B       8.0Gi

[image: screenshot of per-VM memory usage]

2751+1067+2632+474+488+2526+1282+502+384+997+526+1591 = 15220 MiB (my laptop has 16GiB RAM) <-- qmemman did a really great job here!

brendanhoar commented 4 years ago

Great news!

I would, however, not let the PV storage drivers off the hook in this case.

Assuming, as I have read, that current Xen/Linux PV storage uses substantially more grant tables to support multiple queues, the PV drivers should presumably have some sort of back-off algorithm if and when access to these objects becomes constrained, so as not to cause storage corruption.

Data is kind of important.

B

0spinboson commented 4 years ago

This isn't caused by low RAM (only). I have a system with 48GB RAM total, and a VM with 4.5GB constantly assigned (which only runs Chromium with 7 tabs open, Thunderbird, and a just-installed Google File Stream installation) just ran into this issue. Increasing the maptrack value while the system was running seemed to do nothing (so I wonder if xl set-parameters gnttab_max_maptrack_frames actually does anything). I haven't tried to up the boot-time parameter yet for more or less the same reason: the apparent lack of accurate/up-to-date documentation.

guest-NVV.log

I am seeing similar issues in a second VM though, so I'm guessing it does have something to do with this.

arno01 commented 4 years ago

@0spinboson I can see in your guest logs there are 24 lines of the xen:grant_table: maptrack limit reached, can't map all guest pages message. That is exactly the message I was observing when I had a low gnttab_max_maptrack_frames.

Are you sure you have set it correctly? I remember the first time I was testing it, I made a mistake and forgot the _max part :-) Could you drop the output of the xl info xen_commandline command here?

And I am not sure about the xl set-parameters as I haven't really tried it.

0spinboson commented 4 years ago

I hadn't set a custom value up to now, so I assume (but have no way to verify, as there is no way to query it) that it was set to 1024 before. I have now set it to 3072. That said, changing the value using xl set-parameters (even to a 'stupidly high' value like 100k) did not allow me to restart the VM and have the VM work again. I had to reboot for that.

qubesos-bot commented 4 years ago

Automated announcement from builder-github

The package python3-xen-4.13.0-1.fc31 has been pushed to the r4.1 testing repository for dom0. To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

qubesos-bot commented 4 years ago

Automated announcement from builder-github

The package xen_4.13.0-1 has been pushed to the r4.1 testing repository for the Debian template. To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing stretch-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

0spinboson commented 4 years ago

baddish news: with max_maptrack_frames set to 3k, max_frames to 2k, and after 4.5 days uptime with lots of document open/close activity in 1 VM specifically, I just got both I/O errors in multiple VMs, plus a dom0 Xorg crash (had to log back in). After this, I could not query the query size using 'xen-diag gnttab_query_size ID' -- this resulted in an xc error of some kind.

Also, qvm-start-gui --all didn't re-display running windows, nor did graceful qvm-shutdown work, nor (in dom0) could I get the window manager to use any display setting other than display mirroring (of my two 4k monitors). I have now upped gnttab_max_frames to 3k, max_maptrack_frames to 4k, though I suspect the latter is still too low, so I'll probably have to increase it further.

(the xorg crash may be partly due to AMD-Vi io_page_faults in my dmesg log related to my vega56 -- not sure. the errors didn't seem harmful, but it may have a stacking effect?)

andrewdavidwong commented 4 years ago

baddish news: with max_maptrack_frames set to 3k, max_frames to 2k, and after 4.5 days uptime with lots of document open/close activity in 1 VM specifically, I just got both I/O errors in multiple VMs, plus a dom0 Xorg crash (had to log back in). After this, I could not query the query size using 'xen-diag gnttab_query_size ID' -- this resulted in an xc error of some kind.

Also, qvm-start-gui --all didn't re-display running windows, nor did graceful qvm-shutdown work, nor (in dom0) could I get the window manager to use any display setting other than display mirroring (of my two 4k monitors). I have now upped gnttab_max_frames to 3k, max_maptrack_frames to 4k, though I suspect the latter is still too low, so I'll probably have to increase it further.

(the xorg crash may be partly due to AMD-Vi io_page_faults in my dmesg log related to my vega56 -- not sure. the errors didn't seem harmful, but it may have a stacking effect?)

Are you experiencing these problems with the new packages that are currently in testing (see above)? Are you suggesting that they do not actually fix the problem and that this issue should be reopened?

0spinboson commented 4 years ago

Yeah, the values Marek picked are probably still too low to constitute a long-term fix, even if my own setup (with two 4K monitors) is probably a bit unusual currently. I'm not sure whether simply increasing it further makes for a good solution either, as I don't know whether the maptrack frames are assigned once during a boot session (and thus will inevitably run out as uptime increases) or are reusable; but since I wasn't doing all that much in absolute terms at any single point in time, I'd say this should probably not be closed yet. I can try quadrupling the maptrack frames value, maybe that'll do it, but I wish I better understood the underlying stuff.

marmarek commented 4 years ago

All of those things (maptrack frames, gnttab frames) should be reusable. If it were possible to run out of them simply by having the system up long enough, that would be a pretty bad bug.

What we need, I think, is not just to wait for the issue to happen, but to actively observe resource utilization. For gnttab frames, see https://github.com/QubesOS/qubes-issues/issues/5674#issuecomment-589643563. I'm not sure how to query maptrack frames utilization.
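
For grant frames, a rough sketch of one way to do that from dom0 is to periodically log the per-domain counts:

# append a timestamped snapshot of per-domain grant table usage every 30 seconds
while true; do
    date
    for domid in $(xl list | tail -n +2 | awk '{print $2}'); do
        xen-diag gnttab_query_size "$domid"
    done
    sleep 30
done >> ~/gnttab-usage.log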

0spinboson commented 4 years ago

I'm not sure how to query maptrack frames utilization.

yeah, as far as I can tell you can't query it. I've never yet seen the gnttab_query_size exceed ~200, although as I've said, I was unable to query the VM I had been working in after X etc. crashed, so it may have been a bit higher than that.

If it would be possible to run out of it by simply having system up long enough, that would be pretty bad bug.

I'd hope so, except none of what I was doing was all that resource-intensive. It mainly just involved opening and closing lots of PDF viewers and Excel sheets, some PDF printing, etc.

marmarek commented 4 years ago

I don't exclude the possibility of a bug in our GUI code that would leak grant table entries, but your numbers don't suggest it yet.

brendanhoar commented 4 years ago

What we need, I think, is not just to wait for the issue to happen, but to actively observe resource utilization. For gnttab frames, see #5674 (comment). I'm not sure how to query maptrack frames utilization.

@marmarek -

I'm not running 4.1 yet, but I ran across this ( https://www.suse.com/support/kb/doc/?id=000018854 ), which indicates you can use

xl debug-keys g

...which should push both grant-table and maptrack frame info to the hypervisor log/dmesg under later Xen versions.

The xl dmesg ring buffer in 4.8 is tiny (possibly also in 4.12) and too small to hold everything it sends out, but I could get the full debug data on the 4.8 grant tables from here:

tail -f /var/log/xen/console/hypervisor.log|more

before executing the previous command.

Also, obviously you don't want to randomly send other debug key characters to the Xen debugger. Substituting "h" for "g" will get you the list of commands, some of which do things like reboot...

qubesos-bot commented 4 years ago

Automated announcement from builder-github

The package xen_4.13.0-2+deb9u1 has been pushed to the r4.1 stable repository for the Debian template. To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

qubesos-bot commented 4 years ago

Automated announcement from builder-github

The package python3-xen-4.13.0-2.fc31 has been pushed to the r4.1 stable repository for dom0. To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

brendanhoar commented 4 years ago

I'm not running 4.1 yet but I ran across this ( https://www.suse.com/support/kb/doc/?id=000018854 ) which indicates you can use

xl debug-keys g

OK ... the ring buffer is 16k in both 4.8 and 4.13, which is not even enough room to see the full output of the g debug command with xl dmesg when just a couple of domU VMs are running under 4.13 (which gives more info than 4.8... and 4.8 can easily fill the buffer with just a few VMs).

HOWEVER one can add conring_size=256k to the Xen command line (e.g.

cat >> /etc/default/grub
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT conring_size=256k"
ctrl-d

) to make the ring buffer 16 times larger, if you don’t have a serial console configured for Xen and you are monitoring grant tables and don’t need to see the state during/after a crash.
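
(Presumably the change only takes effect after the GRUB config is regenerated and dom0 rebooted; on a Fedora-based dom0 that is typically something like:

grub2-mkconfig -o /boot/grub2/grub.cfg

with the output path adjusted for EFI installs.)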

This leads to:

(XEN) System RAM: 32651MB (33434868kB)
(XEN) Domain heap initialised
(XEN) Allocated console ring of 256 KiB.

And then...

xl debug-keys g && xl dmesg|more

...becomes much more useful for tracking grant tables / maptrack info (esp. under 4.13).

iamahuman commented 3 years ago

Setting the grant frame limit to an arbitrary value might not be sufficient. The user can always open more (i.e. an unbounded number of) overlapping windows, each of which owns a gntdev-backed pixmap.

It may be easier to solve if the agent knows the current stacking of windows.

DemiMarie commented 3 years ago

Does this actually cause filesystem corruption? I/O errors I can understand, but this should not cause corrupted data.

iamahuman commented 3 years ago

@DemiMarie From what I've observed, only to the extent that a forced shutdown could cause, and nothing more.

fsck would usually report only a few trivial inconsistencies, and data before the last sync() would be mostly left intact.

iamahuman commented 3 years ago

Windows PV Bus Driver from version 9.0.0 (commit 36fae6d3) onward cannot handle grant tables larger than (at most) 29 frames and would fail with STATUS_INSUFFICIENT_RESOURCES.

__FdoAllocateBuffer should not hard-code 2 << 20 as the maximum number of bytes that can be allocated through FdoAllocateHole, which is in turn used by GnttabAcquire.

However, I'm not entirely sure about the "fall into a cached MTRR" part in the commit message. Isn't MmAllocateContiguousNodeMemory supposed to allocate write-back cached memory anyway, unless PAGE_NOCACHE or PAGE_WRITECOMBINE is specified?

Maybe the driver can use the unpopulated-alloc approach introduced in Linux kernel 5.9, but I'm not sure how I/O reserved space can be allocated in NT, or even how NVDIMM/DAX stuff is done there.

marmarek commented 3 years ago

Windows PV Bus Driver

I suppose you wanted to ask this on the win-pv-devel ML, not here.

0spinboson commented 3 years ago

I'm not running 4.1 yet but I ran across this ( https://www.suse.com/support/kb/doc/?id=000018854 ) which indicates you can

cat >> /etc/default/grub
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT conring_size=256k"
ctrl-d

Thanks for the sleuthing. A note for people like me who like to do silly things like run a dual-4K-monitor setup (on a system with 48GB RAM and ~10 VMs running at the same time): even 512k is still way too small to capture the whole buffer so you can see the headers; I have for now changed it to 1512k, so I'll hopefully be able to extract the desired info.