Open joshuabaird opened 1 month ago
@benjaminhuo @wenchajun Can someone please review this?
same error on fluent/fluent-operator/fluentd:v2.8.0
I think that in general each new image should have a new tag (does not apply to floating tags, like "latest").
I would agree. Folks rely on versioned tags for stability and they should be immutable. If these images are going to be rebuilt for whatever reason, perhaps an internal "patch" version should be added (e.g., v1.15.3.x).
This might be related to the CI changes we made recently, cc @sarathchandra24
https://github.com/fluent/fluent-operator/pull/1183 https://github.com/fluent/fluent-operator/pull/1079
I also remember there is a PR for a similar issue from @sarathchandra24 https://github.com/fluent/fluent-operator/pull/1093
Would you help to take a look? @sarathchandra24
Thanks
I've built the fluentd v1.17.0 image
@benjaminhuo It looks like the 1.17.0 image has the same bug for x86_64 images. Is this expected?
Sorry for the late response everyone, I realized the problem after running it locally.
The root cause is defaultBinPath at main.go#L22: for amd64 the path is "/usr/bin/fluentd", while for arm64 it is "/usr/local/bundle/bin/fluentd".
Creating a PR for logic to choose path based on arch.
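For readers following along, here is a minimal sketch of what choosing the path by architecture could look like. It is not the code from the eventual PR; the function name binPathForArch is hypothetical, and the two paths are simply the ones quoted in the comment above.

```go
package main

import (
	"fmt"
	"runtime"
)

// binPathForArch is a hypothetical helper (not from the operator source).
// It returns the fluentd binary path reported in this thread for the
// given architecture, defaulting to the amd64 location.
func binPathForArch(arch string) string {
	switch arch {
	case "arm64":
		return "/usr/local/bundle/bin/fluentd"
	default:
		return "/usr/bin/fluentd"
	}
}

func main() {
	// runtime.GOARCH is the architecture the binary was compiled for.
	fmt.Printf("arch=%s path=%s\n", runtime.GOARCH, binPathForArch(runtime.GOARCH))
}
```

Note that this only helps if the path really is a function of the architecture; if the two images were built from different fluentd versions, the path may depend on the image contents instead.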
Thank you very much @sarathchandra24
Both 1.15.3 and 1.17.0 are updated. Would you try again? @joshuabaird
@benjaminhuo @sarathchandra24 The bug is still present in kubesphere/fluentd:1.17.0@sha256:bc06e880c224e76e659bf59250e5302ad159ee6b5474a2c5ee45f3a0969644c5:
fluentd-1 fluentd level=error msg="start Fluentd error" error="fork/exec /usr/local/bundle/bin/fluentd: no such file or directory"
fluentd-1 fluentd level=info msg=backoff delay=4s
It looks like the v1.15.3 image is still broken as well.
@joshuabaird Can you tell me what OS you are using?
Also, I think there is something wrong with the builds or build system.
You see the message
level=info msg="Current architecture" arch=amd64
Also for
docker run sarathchandra24/fluentd-arm:local-arm1
You see the message
level=info msg="Current architecture" arch=arm64
But this is not the case while running
docker run kubesphere/fluentd:1.17.0@sha256:bc06e880c224e76e659bf59250e5302ad159ee6b5474a2c5ee45f3a0969644c5
@joshuabaird Can you please run
docker run ghcr.io/fluent/fluent-operator/fluentd:v1.17@sha256:095572fbf94ee3bbd01c0597b7b8a113c647e64ad2c53457c9c561432207f99d
and
docker run ghcr.io/fluent/fluent-operator/fluentd:v1.17@sha256:baac1724e2277baf50817d2612f06f0bf3b9050a77e1f7b78d351386b84541b7
to check whether the GitHub images are working?
After inspecting images on GitHub
running: docker run ghcr.io/fluent/fluent-operator/fluentd:v1.17@sha256:095572fbf94ee3bbd01c0597b7b8a113c647e64ad2c53457c9c561432207f99d
We can see the message level=info msg="Current architecture" arch=amd64
running: docker run ghcr.io/fluent/fluent-operator/fluentd:v1.17@sha256:baac1724e2277baf50817d2612f06f0bf3b9050a77e1f7b78d351386b84541b7
We can see the message level=info msg="Current architecture" arch=arm64
@sarathchandra24 Yeah - I'm not seeing the log statements on the images in Dockerhub. The images on Github do appear to be working as expected (I see the log statements).
We may have a CI problem with copying from Github to Dockerhub. I'll take a look at the CI runs and see if I can spot anything.
@benjaminhuo Also, just noticed that the fluentbit images aren't available in Github (ghcr.io) -- so we probably need to manually run the CI job that pushes them.
It also looks like the v1.17.0 linux/amd64 image on GHCR is actually 1.15.3:
❯ docker run --platform linux/amd64 ghcr.io/fluent/fluent-operator/fluentd:v1.17.0
level=info msg="Current architecture" arch=amd64
level=info msg="Fluentd started"
2024-06-05 16:03:02 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-06-05 16:03:02 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-06-05 16:03:02 +0000 [info]: gem 'fluentd' version '1.15.3'
...
2024-06-05 16:03:02 +0000 [info]: starting fluentd-1.15.3 pid=13 ruby="3.2.4"
2024-06-05 16:03:02 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2024-06-05 16:03:02 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
The linux/arm64 image however is actually v1.17.0:
❯ docker run --platform linux/arm64 ghcr.io/fluent/fluent-operator/fluentd:v1.17.0
Unable to find image 'ghcr.io/fluent/fluent-operator/fluentd:v1.17.0' locally
v1.17.0: Pulling from fluent/fluent-operator/fluentd
Digest: sha256:4651f4340241b53534c5b481422082d9e785e4f9e86cd2d027a51f61e521fe2e
Status: Downloaded newer image for ghcr.io/fluent/fluent-operator/fluentd:v1.17.0
level=info msg="Current architecture" arch=arm64
level=info msg="Fluentd started"
2024-06-05 16:04:25 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-06-05 16:04:25 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2024-06-05 16:04:25 +0000 [info]: gem 'fluentd' version '1.17.0'
...
2024-06-05 16:04:25 +0000 [info]: starting fluentd-1.17.0 pid=14 ruby="3.3.2"
2024-06-05 16:04:25 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2024-06-05 16:04:25 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-06-05 16:04:25 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.*" type="null"
2024-06-05 16:04:25 +0000 [info]: #0 starting fluentd worker pid=23 ppid=14 worker=0
2024-06-05 16:04:25 +0000 [info]: #0 fluentd worker is now running worker=0
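The mismatch above was spotted by reading the startup log by hand. A check like that could be automated; the following is a rough sketch, assuming the log line format shown in the output above (gem 'fluentd' version '...'). The function names are hypothetical and this is not part of the project's CI.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// gemVersionRe matches the version line fluentd prints at startup,
// as seen in the logs quoted in this thread.
var gemVersionRe = regexp.MustCompile(`gem 'fluentd' version '([^']+)'`)

// fluentdVersionFromLog extracts the fluentd gem version from log output.
func fluentdVersionFromLog(log string) (string, bool) {
	m := gemVersionRe.FindStringSubmatch(log)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// tagMatchesLog reports whether an image tag such as "v1.17.0" agrees
// with the fluentd version found in the log output.
func tagMatchesLog(tag, log string) bool {
	v, ok := fluentdVersionFromLog(log)
	return ok && strings.TrimPrefix(tag, "v") == v
}

func main() {
	log := "2024-06-05 16:03:02 +0000 [info]: gem 'fluentd' version '1.15.3'"
	fmt.Println("tag matches log:", tagMatchesLog("v1.17.0", log))
}
```

Run against the output of each platform variant of a tag, a check along these lines would flag the amd64 image shown above, where the tag says v1.17.0 but the log reports 1.15.3.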
@joshuabaird I've added you as the maintainer, and you can trigger the image build here:
@benjaminhuo @sarathchandra24 Is the intention to build and maintain both v1.15.3 and v1.17.0 fluentd images? Even if you pass 1.17.0 to the workflow, the Dockerfile still installs v1.15.3 here:
So, if the intention is to build/maintain both v1.15.3 and 1.17.0, the Dockerfile will need to be modified.
@joshuabaird You're right, the version is hardcoded in the Dockerfile for fluentd. We need to change that to use the new version of fluentd.
@benjaminhuo But do we want to continue to support v1.15.3 or just modify the Dockerfiles to use 1.17.0?
We already have a 1.15.3 image built that meets some people's requirements. I think we can move on to the latest version of fluentd; anyone who needs the older version can switch the image back to it.
@benjaminhuo https://github.com/fluent/fluent-operator/pull/1198
@joshuabaird The new fluentd image for 1.17 has been rebuilt after your PR, would you give it a try?
Things are looking good. I'm going to open a PR to update fluentbit and then we'll rebuild the fluentbit images so they get pushed to GHCR.
@benjaminhuo Any idea why the fluentd:v2.8.0 and fluent-bit:v2.8.0 tags exist?
This is confusing, because it's the operator tag, not the fluentd/fluent-bit tag. This is causing dependency update apps (like Dependabot/Renovate) to try and update these images.
Should we delete them?
@joshuabaird I can delete them; they were created by the wrong CI workflow.
The 2.8.0 images have all been deleted.
@benjaminhuo Great, thank you!
We use Fluent-operator version 2.7.0, which uses Fluentd v1.15.3. Unfortunately, we now get the same error with the old image as well, which previously worked fine:
level=info msg="backoff timer done" actual=16.013265218s expected=16s
level=error msg="start Fluentd error" error="fork/exec /usr/local/bundle/bin/fluentd: no such file or directory"
level=info msg=backoff delay=32s
What can I do to get Fluentd to start?
@vajgi90 It looks like the amd64 image on Dockerhub for v1.15.3 has the bug. We'll try to get this fixed. Until then, you have two options:
ghcr.io/fluent/fluent-operator/fluentd:v1.15.3
ghcr.io/fluent/fluent-operator/fluentd:v1.17.0
Great, thank you so much for the quick response!
Describe the issue
It seems a recent image was pushed to the kubesphere/fluentd:1.15.3 tag (docker.io/kubesphere/fluentd@sha256:bc06e880c224e76e659bf59250e5302ad159ee6b5474a2c5ee45f3a0969644c5) which breaks fluentd:
Pinning to a previous SHA fixes the issue: kubesphere/fluentd:v1.15.3@sha256:794311919658aee8eb9829836cd6c3437dffd9c7112556d5dc2f01ca3fcb826b.
To Reproduce
Repull the kubesphere/fluentd:1.15.3 latest SHA.
Expected behavior
Fluentd should start.
Your Environment
How did you install fluent operator?
No response
Additional context
No response