confluentinc / confluent-kafka-go

Confluent's Apache Kafka Golang client
Apache License 2.0

Build error with golang:1.20-alpine3.17 platform=linux/arm64 using confluent-kafka-go v2.1.0 #981

Closed everesio closed 5 months ago

everesio commented 1 year ago

Description

ARM64 build using golang:1.20-alpine3.17 fails. AMD64 using confluent-kafka-go v2.1.0 build succeeds. ARM64 and AMD64 with v2.0.2 are also successful.

go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .
[+] Building 164.6s (11/11) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                   0.1s
 => => transferring dockerfile: 352B                                                                                                                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/library/golang:1.20-alpine3.17                                                                                                                                                                              0.9s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                                                                                                          0.0s
 => [1/6] FROM docker.io/library/golang:1.20-alpine3.17@sha256:08e9c086194875334d606765bd60aa064abd3c215abfbcf5737619110d48d114                                                                                                                        0.0s
 => [internal] load build context                                                                                                                                                                                                                      0.4s
 => => transferring context: 104.94MB                                                                                                                                                                                                                  0.3s
 => CACHED [2/6] RUN echo arm64                                                                                                                                                                                                                        0.0s
 => [3/6] RUN apk add alpine-sdk ca-certificates                                                                                                                                                                                                      27.5s
 => [4/6] WORKDIR /code                                                                                                                                                                                                                                0.1s
 => [5/6] ADD . /code                                                                                                                                                                                                                                  0.3s
 => ERROR [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .                                                                                                      135.7s
------
 > [6/6] RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=arm64 go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .:
#0 135.6 # main
#0 135.6 /usr/local/go/pkg/tool/linux_arm64/link: running gcc failed: exit status 1
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_close':
#0 135.6 (.text+0xb4): undefined reference to `sasl_dispose'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_recv':
#0 135.6 (.text+0x1a0): undefined reference to `sasl_client_step'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1c8): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x35c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x38c): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x3ac): undefined reference to `sasl_getprop'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_client_new':
#0 135.6 (.text+0xf74): undefined reference to `sasl_client_new'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xfd4): undefined reference to `sasl_client_start'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0xff4): undefined reference to `sasl_errdetail'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x110c): undefined reference to `sasl_listmech'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x1180): undefined reference to `sasl_errstring'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /code/vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a(rdkafka_sasl_cyrus.o): in function `rd_kafka_sasl_cyrus_global_init':
#0 135.6 (.text+0x16dc): undefined reference to `sasl_client_init'
#0 135.6 /usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: (.text+0x170c): undefined reference to `sasl_errstring'
#0 135.6 collect2: error: ld returned 1 exit status
#0 135.6
------
Dockerfile:12
--------------------
  10 |     ADD . "/code"
  11 |
  12 | >>> RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .
  13 |
--------------------

How to reproduce

  1. Use consumer example https://github.com/confluentinc/confluent-kafka-go/tree/master/examples/consumer_example
  2. go.mod
module main

go 1.20

require github.com/confluentinc/confluent-kafka-go/v2 v2.1.0
  3. Dockerfile

FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk add alpine-sdk ca-certificates

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 GO111MODULE=on GOOS=linux GOARCH=$TARGETARCH go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .


  4. Failed build

go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=arm64 .

  5. Successful build

go mod tidy && go mod vendor
docker buildx build --build-arg TARGETARCH=amd64 .

  6. arm64 and amd64 are successful after go.mod dependency is downgraded

require github.com/confluentinc/confluent-kafka-go/v2 v2.0.2



Checklist
=========
Please provide the following information:

 - [x] confluent-kafka-go and librdkafka version (`LibraryVersion()`):
 confluent-kafka-go v2.1.0

flaxinger commented 1 year ago

this needs a bit more attention. wasted too much time on this. 🥲

saranonearth commented 1 year ago

Just try making the following changes


FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update && apk add bash ca-certificates git gcc g++ libc-dev librdkafka-dev pkgconf

WORKDIR "/code"
ADD . "/code"

RUN go build -tags musl -o main .

AndriyKalashnykov commented 1 year ago

Just try making the following changes


FROM --platform=linux/$TARGETARCH  golang:1.20-alpine3.17 as builder

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update && apk add bash ca-certificates git gcc g++ libc-dev librdkafka-dev pkgconf

WORKDIR "/code"
ADD . "/code"

RUN go build -tags musl -o main .

This approach doesn't work with librdkafka-dev v2.3.0, but was working with v2.2.0

kimgr commented 11 months ago

The root cause appears to be that librdkafka now requires Cyrus SASL, but the confluent-kafka-go wrappers don't spell out a link dependency to it.
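One way to see this kind of missing link dependency is to list the undefined sasl_* symbols in the bundled static archive. The following is only a minimal sketch, assuming binutils nm is installed and can read objects for the target architecture. The helper name undefined_sasl_symbols is made up for illustration; it filters nm output read from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of confluent-kafka-go or librdkafka).
# Reads `nm` output on stdin and prints each undefined symbol whose
# name starts with "sasl_", once, in sorted order.
# Undefined symbols appear in nm output as lines of the form "U name".
undefined_sasl_symbols() {
    awk '$1 == "U" && $2 ~ /^sasl_/ { print $2 }' | sort -u
}
```

Usage, assuming the vendored path from the linker errors in the original report:

nm vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a | undefined_sasl_symbols

If this prints names such as sasl_dispose or sasl_client_init, the archive expects libsasl2 to be supplied at link time.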

All the workarounds above seem to avoid solving this problem by instead installing a system librdkafka-dev which requires -tags dynamic per https://github.com/confluentinc/confluent-kafka-go/#librdkafka (not sure why earlier posted workaround examples work without it; we saw linker errors still).

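To fix what I understand to be the root cause, we can:

  • Ensure cyrus-sasl-dev (for Alpine, see librdkafka sasl docs for other platforms) is installed in the build and run environment
  • Tell cgo to explicitly link libsasl2.so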

I adapted the repro case from the original report for go1.21 + alpine3.18 with the requisite flags:

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This works on my arm64/M1 Mac for TARGETARCH of both arm64 and amd64.

kimgr commented 11 months ago

As for fixing the root-cause bug: I'm not sure why there's now a hard link dependency on libsasl2.so. But I see that the Darwin cgo LDFLAGS have -lsasl2 as part of the distribution: https://github.com/confluentinc/confluent-kafka-go/blob/master/kafka/build_darwin_arm64.go#L9. There are probably reasons why this can't work on Linux in general, but it might be a thread to start pulling on.

AndriyKalashnykov commented 11 months ago

The root cause appears to be that librdkafka now requires Cyrus SASL, but the confluent-kafka-go wrappers don't spell out a link dependency to it.

All the workarounds above seem to avoid solving this problem by instead installing a system librdkafka-dev which requires -tags dynamic per https://github.com/confluentinc/confluent-kafka-go/#librdkafka (not sure why earlier posted workaround examples work without it; we saw linker errors still).

To fix what I understand to be the root cause, we can:

  • Ensure cyrus-sasl-dev (for Alpine, see librdkafka sasl docs for other platforms) is installed in the build and run environment
  • Tell cgo to explicitly link libsasl2.so

I adapted the repro case from the original report for go1.21 + alpine3.18 with the requisite flags:

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This works on my arm64/M1 Mac for TARGETARCH for both arm64 and amd64.

Kim, this is very helpful! Thanks for the research. One may wonder why my minimal example is not part of the Confluent CI/CD pipeline, since it can catch breaking changes like this one.

kimgr commented 11 months ago

It turns out the docs at https://github.com/confluentinc/librdkafka/wiki/Using-SASL-with-librdkafka#4-install-sasl-modules-on-client-host say:

Note: librdkafka must be built with SASL support (which is enabled by default if libsasl2-dev is installed at buildtime)

So I think what happened is that @emasab who built librdkafka for 2.3.0 happens to have Cyrus SASL/libsasl2 installed in their environment, and thereby confluent-kafka-go got an indirect dependency on the Cyrus SASL distribution.

I don't know anything about SASL, but it looks like librdkafka has minimal built-in support, so presumably earlier releases happened to build without the Cyrus dependency and only got the base support.

kimgr commented 11 months ago

Followup: we actually ran into a problem with the proposed workaround -- CGO_LDFLAGS are injected before the cgo LDFLAGS, and gcc -l switches are sensitive to order (beautifully described here: https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking).
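To make the ordering rule concrete: with a traditional single-pass linker, the library that provides the symbols (-lsasl2) has to appear after the input that needs them (the librdkafka archive). Here is a minimal sketch of a check for that, run over a link command line you have captured. The function name provider_after_consumer is made up for illustration, and the check deliberately ignores --start-group/--end-group handling.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: provider_after_consumer CONSUMER_SUBSTRING PROVIDER_FLAG ARGS...
# Succeeds (exit 0) if PROVIDER_FLAG occurs somewhere after an argument
# containing CONSUMER_SUBSTRING, which is the order a single-pass linker
# needs in order to resolve the consumer's undefined symbols.
provider_after_consumer() {
    consumer=$1
    provider=$2
    shift 2
    seen_consumer=0
    for arg in "$@"; do
        # Remember once we have passed the input that needs the symbols.
        case $arg in
            *"$consumer"*)
                seen_consumer=1
                ;;
        esac
        if [ "$arg" = "$provider" ] && [ "$seen_consumer" -eq 1 ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `provider_after_consumer librdkafka -lsasl2 gcc -lsasl2 main.o librdkafka_musl_linux_arm64.a` fails, because -lsasl2 comes before the archive. That matches the situation described above, where CGO_LDFLAGS end up ahead of the cgo LDFLAGS.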

There's a supremely hacky way to work around this too, using a dangling -Wl,--start-group before -lsasl2:

CGO_LDFLAGS="-Wl,--start-group -lsasl2"

GCC complains with

bin/ld: missing --end-group; added as last command line option

but essentially fixes the unclosed group for you.

kimgr commented 11 months ago

And as a final workaround tip: you can use a more modern linker which doesn't have the input order requirements: lld or mold.

Here's a Dockerfile to use mold

FROM --platform=linux/$TARGETARCH golang:1.21.4-alpine3.18

ARG TARGETARCH
RUN echo $TARGETARCH

RUN apk update
RUN apk add \
    gcc \
    # use mold for convenient extra linker inputs
    mold \
    musl-dev \
    # explicitly install SASL package
    cyrus-sasl-dev

WORKDIR "/code"
ADD . "/code"

RUN CGO_ENABLED=1 \
    GO111MODULE=on \
    GOOS=linux \
    GOARCH=$TARGETARCH \
    # explicitly link to libsasl2 installed as part of cyrus-sasl-dev
    CGO_LDFLAGS="-fuse-ld=mold -lsasl2" \
    go build -mod=vendor -o consumer_example -tags musl -ldflags "-w -s" .

This gets rid of the warning from gcc/ld about the unclosed group.

sagikazarmark commented 9 months ago

@kimgr appreciate the detailed workarounds.

Unfortunately, the last one does not work for me.

It fails with the following error:

10.59 /usr/local/go/pkg/tool/linux_arm64/link: running aarch64-alpine-linux-musl-clang failed: exit status 1
10.59 mold: fatal: library not found: sasl2

I have cyrus-sasl-dev installed.

(An extra piece of information: I use xx to cross-compile which may be an issue here)

151 might also be related

Based on your earlier comment, however, this might be an issue with the bundled libs, so I'm thinking about building them myself, making sure cyrus-sasl-dev is not present.

If that is the problem, then I believe there should be a patch release fixing the libraries.

kimgr commented 9 months ago

@sagikazarmark

I have cyrus-sasl-dev installed.

You mentioned xx. I'm not familiar with it, but I'm assuming you've installed cyrus-sasl-dev using xx-apk in the build context?

https://github.com/tonistiigi/xx?tab=readme-ov-file#xx-apk-xx-apt-xx-apt-get---installing-packages-for-target-architecture

I wonder if a cross linker needs to be used too, or if you can somehow tell mold where to look for libraries for the target architecture.

Sorry, I don't have any clue, really.

emasab commented 7 months ago

Thank you all for raising awareness on this issue.

So I think what happened is that @emasab who built librdkafka for 2.3.0 happens to have Cyrus SASL/libsasl2 installed in their environment, and thereby confluent-kafka-go got an indirect dependency on the Cyrus SASL distribution.

That didn't happen because we configure and build these static binaries in a Semaphore pipeline, not on our laptops. Then we import those binaries locally to push them to confluent-kafka-go.

I believe the issue is here in the release pipeline:

It should be:

                        if attr in a.info and \
                           a.info[attr] == m.attributes[origattr]:

because that check is meant to exclude the files that have the attribute extra=gssapi. Since it's not excluding them, either the version with libsasl2 or the one without it could be taken, depending on the order.

That explains why the issue is present in 2.1.0 and 2.3.0 but not in 2.2.0 and 2.0.2. Going to create a PR to fix it before our upcoming 2.4.0 release.

emasab commented 7 months ago

Then we import those binaries locally to push them to confluent-kafka-go.

There's room for security improvements here. We have to make this step run on CI too.

emasab commented 7 months ago

v2.1.1-linux-arm64-musl isn't affected either. But it's better to use the workaround and take the latest fixes in 2.3.0 at the moment.

emasab commented 7 months ago

Confirmed that the only affected ones are these ones, by looking for rdkafka_sasl_cyrus.o in archive files.
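For anyone who wants to repeat that kind of check on their own vendored archives, here is a minimal sketch. It is not the exact commands used above. It assumes binutils ar is installed, and the function names has_cyrus_sasl_object and check_archive are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Reads archive member names on stdin (one per line, as printed by
# `ar t`) and succeeds if rdkafka_sasl_cyrus.o is among them.
has_cyrus_sasl_object() {
    grep -qx 'rdkafka_sasl_cyrus\.o'
}

# Reports whether a single static archive contains the Cyrus SASL object.
check_archive() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "$archive: not found" >&2
        return 2
    fi
    if ar t "$archive" | has_cyrus_sasl_object; then
        echo "$archive: contains rdkafka_sasl_cyrus.o"
    else
        echo "$archive: no rdkafka_sasl_cyrus.o"
    fi
}
```

Usage, assuming a vendored module tree like the one in the original report:

check_archive vendor/github.com/confluentinc/confluent-kafka-go/v2/kafka/librdkafka_vendor/librdkafka_musl_linux_arm64.a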

emasab commented 7 months ago

Raised this PR. And confirmed that the produced binaries don't include rdkafka_sasl_cyrus.o, except for darwin where it's expected to have it.

milindl commented 5 months ago

Closing this as it's fixed in 2.4.0