djs55 opened this issue 10 years ago
I simplified this in my head to:
I think that's a reasonable summary. /cc @jonludlam who may also be interested.
One thing I don't quite understand is how suspend/resume works on the same host. Is it expected to reestablish all frontend ring connections just as it does on a live relocation?
Or, to put it another way, is suspend/resume just an instance of a "crash"?
I believe suspend/resume is just like live relocation. Unfortunately, suspend is slightly more than a crash: you have to unmap the shared info page, otherwise libxenguest gives you an obscure "p2m race" error, IIRC. @jonludlam can explain more (he tried to suspend a crashed domain and hit this problem).
From the xenstore server's point of view, as long as we're careful to journal things, the rings will still be sitting there when we come back. However, I don't think it's possible to suspend/resume or even reboot the xenstore server domain, because all the grant references will have the wrong domid. For the Mirage stubdom case, a "crash" would have to be handled by an in-domain reboot.
Of course this is all moot since this code never crashes.
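To make "careful to journal things" a bit more concrete, here is a minimal Python sketch (not the real xenstored code) that persists per-ring connection records so a restarted server can find the rings again. The `RingConnection` fields and the `ConnectionJournal` class are hypothetical, loosely modelled on the domid / ring frame / event channel triple a toolstack hands over when introducing a domain; actually re-mapping the page and re-binding the event channel is left out.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RingConnection:
    # Hypothetical record: what a restarted server would need in order to
    # find a guest's shared-memory ring again.
    domid: int
    ring_mfn: int      # frame number of the shared ring page
    evtchn_port: int   # event channel port used for notifications


class ConnectionJournal:
    """Persist ring connection records across server crashes (sketch)."""

    def __init__(self, path):
        self.path = path

    def save(self, conns):
        # Write a temp file in the same directory, fsync it, then atomically
        # rename it over the journal, so a crash mid-save leaves the old copy.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump([asdict(c) for c in conns], f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def load(self):
        # No journal yet means a fresh start with no rings to re-attach.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [RingConnection(**r) for r in json.load(f)]
```

Because of the write-to-temp-then-`os.replace` pattern, a reader after a crash sees either the old or the new record list, never a mix of the two.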
That libxenguest thing probably deserves some investigation, since the same race would exist if the domain crashed out (or maybe not, since it wouldn't have a suspend image in this case). It's probably only relevant for PV guests. Roll on PVHVM :)
For extreme reliability we want to recover after an arbitrary crash.
The general approach will be to persist all relevant data to an Irminsule database configured to use the 'git' format on a "persistent" memory area. Using 'git' will give us some history to help debug a crash. The "persistent" memory area will be either a tmpfs (for userspace) or a fixed range of memory addresses (for Mirage). It needs to survive a process crash but not a host reboot, since xenstore is cleared on reboot.
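To illustrate the "history" part, here is a Python sketch of a git-like, content-addressed commit chain kept in a directory. It is not Irmin's API or git's on-disk format: each commit here is just a JSON object holding its parent's hash and a full snapshot of the store, whereas git and Irmin share unchanged subtrees between commits. The class name and the choice of root directory (e.g. somewhere under /dev/shm) are assumptions for illustration.

```python
import hashlib
import json
import os


def _atomic_write(path, text):
    # Write a temporary file and rename it into place, so a crash never
    # leaves a half-written object or HEAD behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


class HistoryStore:
    """Git-like, content-addressed history of store snapshots (sketch)."""

    def __init__(self, root):
        # `root` is assumed to be on a tmpfs: it survives a process crash
        # but is wiped on host reboot, like xenstore itself.
        self.root = root
        self.objects = os.path.join(root, "objects")
        os.makedirs(self.objects, exist_ok=True)

    def head(self):
        try:
            with open(os.path.join(self.root, "HEAD")) as f:
                return f.read() or None
        except FileNotFoundError:
            return None

    def _load(self, h):
        with open(os.path.join(self.objects, h)) as f:
            return json.load(f)

    def tree(self):
        """Latest committed snapshot: what a restarted server would reload."""
        h = self.head()
        return {} if h is None else self._load(h)["tree"]

    def commit(self, tree):
        body = json.dumps({"parent": self.head(), "tree": tree}, sort_keys=True)
        h = hashlib.sha1(body.encode()).hexdigest()
        _atomic_write(os.path.join(self.objects, h), body)
        # Move HEAD only after the commit object is durable.
        _atomic_write(os.path.join(self.root, "HEAD"), h)
        return h

    def log(self):
        """Yield (hash, tree) pairs from newest to oldest, e.g. to debug a crash."""
        h = self.head()
        while h is not None:
            obj = self._load(h)
            yield h, obj["tree"]
            h = obj["parent"]
```

After a crash, the restarted server would call `tree()` to reload its state, while `log()` lets someone walk back through the commits leading up to the crash.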
Connections over a unix domain socket will be closed on crash, and client processes will be expected to reconnect. Connections over inter-domain shared memory rings will be persistent.
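On the client side, "expected to reconnect" could be as simple as a retry loop. Here is a minimal Python sketch; the function name, retry count and backoff are assumptions rather than anything xenstore prescribes. Since xenstore watches and transactions belong to a connection, a reconnecting client would also have to re-register its watches and restart any transaction it had open.

```python
import socket
import time


def connect_with_retry(path, attempts=5, delay=0.1):
    """Connect to a Unix domain socket, retrying while the server restarts.

    Between a crash and the restart, the socket path may be missing
    (FileNotFoundError) or present but not accepting (ConnectionRefusedError),
    so retry with exponential backoff before giving up.
    """
    last_error = None
    for i in range(attempts):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return s
        except (FileNotFoundError, ConnectionRefusedError) as e:
            s.close()
            last_error = e
            if i < attempts - 1:
                time.sleep(delay * (2 ** i))
    raise ConnectionError(f"could not connect to {path}") from last_error
```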
We will not store intermediate transaction state; instead, we will artificially abort any uncommitted transaction after a crash.
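A minimal Python sketch of that policy: committed writes go to an append-only journal (which would live on the "persistent" area), while open transactions exist only in memory. A restarted store replays the journal and knows nothing about older transaction ids, so ending one returns `EAGAIN`, which xenstore clients conventionally treat as "retry the transaction". Conflict detection and reads inside a transaction are left out, and the `Store` class and its method names are made up for illustration.

```python
import json
import os
import random

OK, EAGAIN = "OK", "EAGAIN"


class Store:
    """Journaled committed state; transactions held only in memory (sketch)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self.transactions = {}  # tid -> pending writes; never persisted
        # Start tids at a random base so a restarted store is unlikely to
        # hand out an id that a client still holds from before the crash.
        self.next_tid = random.randrange(1, 2**31)
        self._replay()

    def _replay(self):
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "rb") as f:
            raw = f.read()
        # Each commit is one newline-terminated JSON record. Bytes after the
        # last newline are a torn record from a crash mid-append; that commit
        # never returned OK, so drop it and truncate the journal.
        good = raw.rfind(b"\n") + 1
        for line in raw[:good].splitlines():
            self.data.update(json.loads(line))
        os.truncate(self.journal_path, good)

    def _commit(self, writes):
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(writes) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data.update(writes)

    def read(self, path):
        return self.data.get(path)

    def write(self, path, value, tid=None):
        if tid is None:
            self._commit({path: value})
        else:
            # Raises KeyError for an unknown (e.g. pre-crash) tid.
            self.transactions[tid][path] = value

    def transaction_start(self):
        tid = self.next_tid
        self.next_tid += 1
        self.transactions[tid] = {}
        return tid

    def transaction_end(self, tid):
        writes = self.transactions.pop(tid, None)
        if writes is None:
            # Unknown tid: the transaction was implicitly aborted by a crash.
            return EAGAIN
        self._commit(writes)
        return OK
```

Writing each transaction as a single journal record is what keeps a commit all-or-nothing: a crash mid-append leaves at most one torn trailing record, which replay discards.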