Copying my comments motivating this approach from Discord for posterity:
we can make a slight modification to the HLSL conversion shader to never consider data from row (i+1) when converting row i
(this can only happen for the alpha channel of the rightmost pixel on a row anyway)
so this would have the effect of making the height of the output texture the same as the input, with the width being inputWidth * 3 / 4 rounded up to a whole pixel, i.e. ceil(inputWidth * 3 / 4)
and we could perform a single CopySubResource per damage rect as we have before
and this should not have any implications for the guest; they're still seeing the packed data they expected to see
we only need to do the 1 CopySubResource per row of damage rect thing if we allow the alpha channel of the last pixel on row i to encode the red channel of the first pixel on row (i+1)
by not allowing that we waste like 2 KiB per frame, but win the ability to use damage rects so seems worth it
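for reference, a rough sketch of the arithmetic in Python (this is not the HLSL shader; the function names are made up for illustration, and it assumes 4-byte input pixels packed to 3 bytes each, with the output width rounded up to whole 4-byte pixels):

```python
def packed_width(input_width: int) -> int:
    """Width of the packed texture in 4-byte output pixels, rounded up."""
    return (input_width * 3 + 3) // 4


def pack_row(row: bytes) -> bytes:
    """Pack one row of 4-byte pixels into 3 bytes per pixel.

    The tail of the row is zero-padded to a whole output pixel, so
    row i never borrows bytes from row i + 1.
    """
    assert len(row) % 4 == 0, "expected whole 4-byte pixels"
    width = len(row) // 4
    packed = bytearray()
    for x in range(width):
        packed += row[4 * x : 4 * x + 3]  # keep 3 colour bytes, drop alpha
    # Pad with zeros instead of reading the first pixel of the next row.
    packed += bytes(packed_width(width) * 4 - len(packed))
    return bytes(packed)


def wasted_bytes(input_width: int, height: int) -> int:
    """Padding bytes per frame caused by keeping rows independent."""
    per_row = packed_width(input_width) * 4 - input_width * 3
    return per_row * height
```

the padding is at most 3 bytes per row, and zero when the width is a multiple of 4 (e.g. 1920). for a 1366x768 frame, wasted_bytes(1366, 768) comes to 1536 bytes, so about 1.5 KiB.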