go-debos / debos

Debian OS builder
Apache License 2.0

Native arm64 builds: loop device woes #522

Open · jakob-tsd opened this issue 6 days ago

jakob-tsd commented 6 days ago

We run debos on arm64 with --disable-fakemachine on our RK3588-based SBC (https://embedded.cherry.de/product/jaguar-sbc-rk3588/). If relevant, the recipes and build script are here: https://git.embedded.cherry.de/debos-recipes.git/tree/

We run 4 builds concurrently and it works quite well, thank you!

However, every 20 builds or so, we get a loop device related failure. Examples:

2024/11/06 14:55:04 apt | Failed to stat /dev/loop0: No such file or directory
2024/11/06 14:55:04 Action `recipe` failed at stage Run, error: exit status 1

2024/10/18 12:00:28 ==== image-partition ====
2024/10/18 12:00:28 parted | Error: Partition(s) on /dev/loop0 are being used.
2024/10/18 12:00:29 Action `image-partition` failed at stage Run, error: exit status 1
2024/10/18 12:00:29 Warning: Failed to get unmount /: device or resource busy
2024/10/18 12:00:29 Unmount failure can cause images being incomplete!

Have you seen something like this already?

jakob-tsd commented 6 days ago

Oh. I think we need this: https://github.com/freddierice/go-losetup/commit/d9566aa43a612d7b37ae0eca9847d830f9d60a93

And also, if Attach() fails, we should retry like losetup does: https://github.com/util-linux/util-linux/blob/4c4b248c68149089c8be2f830214bb2be693307e/sys-utils/losetup.c#L662

obbardc commented 6 days ago

> Oh. I think we need this: freddierice/go-losetup@d9566aa

That should be solved by #523, right?

> And also, if Attach() fails, we should retry like losetup does: https://github.com/util-linux/util-linux/blob/4c4b248c68149089c8be2f830214bb2be693307e/sys-utils/losetup.c#L662

We do something similar for closing the loop device; perhaps you could use that as inspiration? https://github.com/go-debos/debos/blob/main/actions/image_partition_action.go#L668

I am happy to take fixes around this if it helps your use case, even though --disable-fakemachine really isn't a use case that debos recommends.

jakob-tsd commented 5 days ago

Yes, #523 will pull in the go-losetup fix.

The reason I am using --disable-fakemachine is that I don't have KVM support on the builder right now.