IBM / core-dump-handler

Save core dumps from a Kubernetes Service or RedHat OpenShift to an S3 protocol compatible object store
https://ibm.github.io/core-dump-handler/
MIT License
136 stars 40 forks source link

arm64 builds are missing (and impossible to build) #88

Closed razielgn closed 2 years ago

razielgn commented 2 years ago

I tried to build the image on arm64, but ubi7 does not provide such arch (and won't). Also the crictl binary would need to be pulled for arm64. From what I understood, the two builds are due to libc issues. Would it be a viable solution to build binaries with musl, in order to have just one static binary for both?

No9 commented 2 years ago

Hey @razielgn It would be great to support arm64 and you are correct the builds are due to libc issues. I have also had mysterious issues with musl in previous projects and so it's not something that I would like to pull into this project. Also the builds are currently ran on quay.io which doesn't support arm at the moment.

Do you have a reference project or article that demonstrates multiarch on a public build service and publishing it to a repo so that I can get an idea of what is involved?

razielgn commented 2 years ago

cargo-cross can be used to easily cross-compile for $arch-unknown-linux-musl. I don't see another way to make an arm64 build useful for EKS users and Graviton instances since, as it's mentioned in the docs, they are on RHEL7-ish and we don't have a base image for that.

The usual painpoint in compiling for musl is openssl, but by updating rust-s3 and setting features to use rustls, it'll go fine:

diff --git a/core-dump-agent/Cargo.toml b/core-dump-agent/Cargo.toml
index 07452f5..7bd6276 100644
--- a/core-dump-agent/Cargo.toml
+++ b/core-dump-agent/Cargo.toml
@@ -11,7 +11,7 @@ anyhow = "1.0.40"
 dotenv  = "0.15.0"
 env_logger = "0.9.0"
 log = "0.4.14"
-rust-s3 = { version = "0.27.0"}
+rust-s3 = { version = "0.31.0", default-features = false, features = ["tokio-rustls-tls"] }
 advisory-lock = "0.3.0"
 tokio-cron-scheduler = "0.4.0"
 tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
diff --git a/core-dump-agent/src/main.rs b/core-dump-agent/src/main.rs
index 2df2a36..5a1f3ba 100644
--- a/core-dump-agent/src/main.rs
+++ b/core-dump-agent/src/main.rs
@@ -378,7 +378,8 @@ fn get_bucket() -> Result<Bucket, anyhow::Error> {
         bucket: s3_bucket_name,
         location_supported: false,
     };
-    Bucket::new_with_path_style(&s3.bucket, s3.region, s3.credentials)
+
+    Ok(Bucket::new(&s3.bucket, s3.region, s3.credentials)?.with_path_style())
 }

 async fn run_polling_agent(core_location: &str) {

You can try with these steps:

cargo install cross
cross build --target x86_64-unknown-linux-musl
cross build --target aarch64-unknown-linux-musl
No9 commented 2 years ago

Thanks for this - If I can parameterise features = ["tokio-rustls-tls"] } my initial thought is to release the musl builds as a separate image and provide docs on usage. Basically create a new Dockerfile for this instance so we don't have to worry about the ubi7 problem.

From what I can see Ok(Bucket::new(&s3.bucket, s3.region, s3.credentials)?.with_path_style()) is an API change in S3 so we will need to bump that anyway.

razielgn commented 2 years ago

Thanks for this - If I can parameterise features = ["tokio-rustls-tls"] } my initial thought is to release the musl builds as a separate image and provide docs on usage. Basically create a new Dockerfile for this instance so we don't have to worry about the ubi7 problem.

Yes I think it's totally doable. I think overall switching to rustls would be a good move, it's been vetted and considered production ready, might save some megabytes in the final image.

From what I can see Ok(Bucket::new(&s3.bucket, s3.region, s3.credentials)?.with_path_style()) is an API change in S3 so we will need to bump that anyway.

Yep.

razielgn commented 2 years ago

Let me know if I can take up some tasks to make this happen.

No9 commented 2 years ago

OK So I'm happy to take a PR where rust-s3 = { version = "0.31.0", default-features = false, features = ["tokio-rustls-tls"] } Is a platform specific dependency defined for x86_64-unknown-linux-musl and aarch64-unknown-linux-musl as outlined here https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#platform-specific-dependencies

The code change obviously doesn't need to be versioned as the main project should also be bumped to rust-s3 = { version = "0.31.0"

I think a new musl.Dockerfile to define the image that uses a suitable base image (ubi8?) and the crossbuild command you suggested

cross build --target x86_64-unknown-linux-musl
cross build --target aarch64-unknown-linux-musl

I am still not 100% on how we can run the new Dockerfile to generate the final aarch64 image in a CI solution though. My initial thoughts here are to use a GH Action https://github.com/docker/build-push-action/blob/master/docs/advanced/multi-platform.md along side this action to push to quay.io https://github.com/marketplace/actions/push-to-registry#:~:text=Push%2Dto%2Dregistry%20is%20a,podman%20to%20perform%20the%20push.

If we are in agreement on this I will setup the secrets so we can validate

No9 commented 2 years ago

@razielgn don't know if you've started looking at this yet but I'm going to start this weekend if I get anywhere the branch will be arm-build

No9 commented 2 years ago

OK I have built the musl images here https://hub.docker.com/repository/registry-1.docker.io/number9/core-dump-handler-musl/tags?page=1&ordering=last_updated with arm64 and amd64 support. I used the multiplatform GH Action and created a separate Dockerfile Quay.io support for arm isn't clear at the moment so I will investigate further but any feedback on the arm64 image in the docker repo would be greatly appreciated.

No9 commented 2 years ago

https://quay.io/repository/icdh/core-dump-handler-musl?tab=info Also committed to the quay repo This will be my preferred option if the work ok

No9 commented 2 years ago

Musl builds are now available as part of the 8.5.0 release https://github.com/IBM/core-dump-handler/blob/main/charts/core-dump-handler/README.md#aws-graviton-and-musl-support If there are any issues then please open a fresh issue.

razielgn commented 2 years ago

Hey, sorry I was on a break. Great job!

If I understand correctly, the arm64 image has a separate tag from the amd64 one, right? That would force setups with etherogeneous architectures to have different daemonsets, since the image tag needs to be specified separately. Image repositories have the ability of pushing different archs separately and then merge them using a manifest. In short, as an example, image:2.5.0-amd64 and image:2.5.0-arm64 can be later merged into image:2.5.0, so that the image puller will pick the right arch for the host. You can see an example here: https://github.com/razielgn/github-exporter/blob/master/.github/workflows/build.yml#L114-L139

No9 commented 2 years ago

Hey @razielgn Hope you enjoyed the break. I decided to continue working as there were a number of PRs piling up and I had a feeling this might take a couple of iterations.

You're correct, the images shouldn't have a separate tag and there should be multiple archs available under the same version just as you describe. i.e. quay.io/icdh/core-dump-handler-musl:v8.5.0 should work for amd64 and arm64 However when I look at the manifest it would appear as though the action is clobbering the amd64 with the arm64

docker manifest inspect --verbose quay.io/icdh/core-dump-handler-musl:v8.5.0

{
    "Ref": "quay.io/icdh/core-dump-handler-musl:v8.5.0",
    "Descriptor": {
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "digest": "sha256:7462b10edf3d2c2f2f1d393f232e41e3b6cc57c67b1fbab7c2b2ed71db49c982",
        "size": 593,
        "platform": {
            "architecture": "arm64",
            "os": "linux"
        }
    },
    "SchemaV2Manifest": {
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": {
            "mediaType": "application/vnd.docker.container.image.v1+json",
            "size": 2943,
            "digest": "sha256:94880fe0da529f55aaadfc050c74835c9390e94ebc51944d3a60f755ef6f860d"
        },
        "layers": [
            {
                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                "size": 2792166,
                "digest": "sha256:0c49222373e843e1761c0207ad1fc9969dedfd044e8fd420de9abea5ebbff450"
            },
            {
                "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                "size": 63752419,
                "digest": "sha256:c4a43020b281da9c96fc26c4dadc1ca41a5b49b7e46271741aa17d89b74c48d1"
            }
        ]
    }
}

This is likely a bug in the action documentation as I took the config direct from the example. https://github.com/redhat-actions/push-to-registry/blob/main/.github/workflows/manifest-build-push.yaml

Looking at your example I like the way it roles the archs up explicitly I'll try and merge the two approaches.

No9 commented 2 years ago

So @razielgn I have released 8.5.1 here based on the manifest creation approach that you provided https://github.com/IBM/core-dump-handler/blob/main/.github/workflows/buildmusl.yaml#L63-L68 The manifest seems correct and quay recognises that the arch specific images are children of the main v8.5.1 image so I think should resolve the issue you raised in the previous comment. If it does then we can close this.

razielgn commented 2 years ago

@No9 fantastic, thank you so much! I'm going to test it later this week.