crc-org / vfkit

Apache License 2.0
virtio: Enable full disk caching #76

Closed. cgwalters closed this 7 months ago

cgwalters commented 8 months ago

We're seeing highly reproducible disk corruption in podman machine with the default configuration, and this fixes it for me.

This looks like the same thing as https://github.com/lima-vm/lima/commit/488c95c41c67ae0c3afa8f35173a6e1ef59d29ef

openshift-ci[bot] commented 8 months ago

Hi @cgwalters. Thanks for your PR.

I'm waiting for a crc-org member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

cgwalters commented 7 months ago

We had an in-person chat on this, and I can definitely say that I can't reproduce any corruption with this change. I tried playing with some I/O stressing etc. and things seemed fine. There's a lot of discussion related to this in https://github.com/utmapp/UTM/issues/4840 btw.

gbraad commented 7 months ago

Thanks. We also spoke with Sergio Lopez and got confirmation that this is most likely caused by caching/not flushing in time. We have had several reports from users that this works for them.

openshift-ci[bot] commented 7 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gbraad

Once this PR has been reviewed and has the lgtm label, please assign baude for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/crc-org/vfkit/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

gbraad commented 7 months ago

/ok-to-test
/lgtm

cfergeau commented 7 months ago

Was wondering about performance impact, but tests in https://github.com/lima-vm/lima/pull/2026#issuecomment-1834136049 say there is no impact.

cfergeau commented 7 months ago

https://github.com/lima-vm/lima/issues/1957 and https://github.com/utmapp/UTM/issues/4840 contain a lot of useful information.

jorhett commented 7 months ago

This is a very important fix. When can we expect a release?

cfergeau commented 7 months ago

> This is a very important fix. When can we expect a release?

I'm aiming to cut a release this week. In the meantime, I've already added the patch to this brew recipe: https://github.com/cfergeau/homebrew-crc/blob/main/vfkit.rb

Are you also hitting this bug?

jorhett commented 7 months ago

> Are you also hitting this bug?

No, I just recently discovered from a comment on another bug that applehv support was now available in Podman. I was poking around to see why it wasn't announced or visible in the docs (other than a mention of it as a valid provider) and stumbled on this bug. I figured I should wait for this to be in a release before recommending that our engineers give it a try. If it's already in the brew recipe, that may suffice.

What's your feeling about stability versus Qemu? Would you turn a few hundred engineers loose on this?

cfergeau commented 7 months ago

> What's your feeling about stability versus Qemu? Would you turn a few hundred engineers loose on this?

podman's applehv support is still being worked on, which is why it's not on by default ;) There's also https://github.com/containers/gvisor-tap-vsock/pull/309, which is being fixed and which causes failures in some cases on podman machine start.

gbraad commented 7 months ago

I wanted to add a comment in answer to:

> just recently discovered from a comment on another bug that applehv support was now available in Podman

but saw Christophe already did. Some people are still fixing race conditions with applehv, as the implementation differs slightly from CRC's. We hope to resolve this soon and converge; just note that this also needs a gvproxy release, which is more closely coordinated with Podman's release schedule.