aws / amazon-cloudwatch-agent

CloudWatch Agent enables you to collect and export host-level metrics and logs on instances running Linux or Windows server.
MIT License

Incorrect upper limit on golang version in Amazon Linux 2 rpm spec for v1.247352.0 #524

Closed: lorengordon closed this issue 1 year ago

lorengordon commented 2 years ago

Describe the bug

I was building an rpm for the CloudWatch Agent using the spec from the Amazon Linux 2 srpm, and found that the spec file sets an upper limit on golang, requiring < 1.16.0 to build the rpm. That seems wrong, since this repo declares go 1.18 in go.mod.

Steps to reproduce

# yumdownloader --destdir . --source amazon-cloudwatch-agent
# rpm2cpio amazon-cloudwatch-agent-1.247352.0-1.amzn2.src.rpm | cpio -idv
# grep BuildRequires amazon-cloudwatch-agent.spec
BuildRequires: golang >= 1.13.0, golang < 1.16.0
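To see concretely why a newer toolchain is rejected, here is a minimal Go sketch that evaluates a comma-separated BuildRequires value like the one above against a candidate golang version. It is a simplification that assumes plain numeric X.Y.Z versions: real rpm uses its own rpmvercmp ordering and also compares epoch and release. The helpers parseVersion, compare and satisfies are hypothetical, not part of rpm or the agent.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.17.12" into [1 17 12]. Missing components count
// as 0, so "1.18" equals "1.18.0". Simplification: no epoch, release or
// pre-release handling, unlike rpm's real version comparison.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimSpace(v), ".")
	if len(parts) > 3 {
		return out, fmt.Errorf("too many components in %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// compare returns -1, 0 or 1 as a is lower than, equal to or higher than b.
func compare(a, b [3]int) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// satisfies reports whether version meets every "golang OP X" clause in a
// comma-separated BuildRequires value (hypothetical helper).
func satisfies(version, buildRequires string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for _, clause := range strings.Split(buildRequires, ",") {
		f := strings.Fields(clause)
		if len(f) != 3 || f[0] != "golang" {
			return false, fmt.Errorf("unsupported clause %q", clause)
		}
		want, err := parseVersion(f[2])
		if err != nil {
			return false, err
		}
		c := compare(v, want)
		var ok bool
		switch f[1] {
		case ">=":
			ok = c >= 0
		case ">":
			ok = c > 0
		case "<=":
			ok = c <= 0
		case "<":
			ok = c < 0
		case "=":
			ok = c == 0
		default:
			return false, fmt.Errorf("unsupported operator %q", f[1])
		}
		if !ok {
			return false, nil // one failing clause rejects the version
		}
	}
	return true, nil
}

func main() {
	// BuildRequires value from the v1.247352.0 spec quoted above.
	spec := "golang >= 1.13.0, golang < 1.16.0"
	for _, v := range []string{"1.15.15", "1.16.0", "1.17.12"} {
		ok, err := satisfies(v, spec)
		fmt.Println(v, ok, err)
	}
}
```

Running it reports 1.15.15 as acceptable and both 1.16.0 and 1.17.12 as rejected by the < 1.16.0 bound, which is the same outcome as the EPEL golang 1.17 build failure discussed in this thread.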

What did you expect to see?

I expected to be able to build the rpm with a newer version of golang. When I built v1.247350.0 previously, it worked fine, and the prior srpm does not specify an upper limit in BuildRequires. So this limit is new to the spec for v1.247352.0.

SaxyPandaBear commented 2 years ago

I don't want to get into the weeds of how the sausage gets made, but to give some insight: at the time we cut 1.247352.0 for release, the CloudWatch agent could not be built on anything newer than Go 1.16, and without the upper bound the build would have attempted to use 1.17 (thus blocking the release). We've since upgraded our dependencies and moved to Go 1.18 to stay on a supported version of Go.

lorengordon commented 2 years ago

Strange. I am pretty sure I built it previously using 1.16 and it was fine. I'm building on/for RHEL, so I just installed golang from EPEL. It was 1.16 until recently. It is now 1.17.

Edit: No, I checked the build logs. As recently as 5 July, EPEL had golang 1.17 and v1.247350.0 was still building fine.

SaxyPandaBear commented 2 years ago

Building on 1.16 would be fine. The problem was that outdated dependencies the agent uses under the hood would not build on Go 1.17, so we were stuck juggling releasing v352 against ripping everything out and making it all work on Go >= 1.17. Apologies for the inconvenience; I just wanted to shed some light on why the upper limit was imposed. IIRC we removed it afterwards for 1.247353.0, since we had jumped to Go 1.18 by then.

lorengordon commented 2 years ago

Alright, I can wait for 1.247353.0 to show up in Amazon Linux 2 and retry. Any idea when the new version will get published as an rpm to the Amazon Linux 2 repos?

SaxyPandaBear commented 2 years ago

I think mid-August is when you should expect a new version to be out. Just so you don't get blindsided when that happens: that RPM will actually be one version further, v1.247354.0 (no go.mod changes in that one).

lorengordon commented 2 years ago

So I saw the tags land for 1.247354 (and then 1.247355), but I don't see a GitHub Release for either. Is that normal? I also noticed a new version of the Amazon Linux 2 docker image landed a few days ago, so I figured I'd give the build another try, but it still appears to be pulling 1.247352.0?

SaxyPandaBear commented 2 years ago

I think we've just been slacking on cutting proper GitHub releases. I'm a little surprised about v352 being the "new" version for the Docker image, though. Where did you pull it from? We definitely should have published a v354 image. Peeking behind the curtain a little, part of our release is updating the container insights repo, then publishing the image and running validation on it, so I'm fairly certain it should be out.

lorengordon commented 2 years ago

We spin up the Amazon Linux 2 container, then use yumdownloader to grab the source rpm for each of the tools we want to rebuild. We are not pinning the version, so it should just be grabbing the latest. Here's the error from rpmbuild:

error: Failed build dependencies:
    golang < 1.16.0 is needed by amazon-cloudwatch-agent-1.247352.0-1.el7.x86_64
SaxyPandaBear commented 2 years ago
docker pull amazon/cloudwatch-agent:1.247354.0b25198101
Error response from daemon: manifest for amazon/cloudwatch-agent:1.247354.0b25198101 not found: manifest unknown: manifest unknown

Huh. Well I guess that's worth looking into. I'm not sure how I didn't see an error during release.

SaxyPandaBear commented 2 years ago

Oh are you pulling from the yum repository? That's like a whole separate thing.

lorengordon commented 2 years ago

Yeah, we're rebuilding several packages, so that seemed like the easiest and most standard way to get everything. We just trigger builds whenever the Amazon Linux 2 image is updated, instead of having custom build logic per package...

SaxyPandaBear commented 2 years ago

Yeah that makes sense. I think we're missing the last two releases in Amazon Linux 2. I'll follow up on that.

As for the Docker pull, I had extra characters at the end of the tag, which explains that:

docker pull amazon/cloudwatch-agent:1.247354.0b251981
1.247354.0b251981: Pulling from amazon/cloudwatch-agent
d875800c7401: Pull complete 
265f36118970: Pull complete 
a91d9a823c97: Pull complete 
Digest: sha256:33f0072c93d614b5dd32f044549f3d764d05a42f068e852e94bdd849098852c7
Status: Downloaded newer image for amazon/cloudwatch-agent:1.247354.0b251981
docker.io/amazon/cloudwatch-agent:1.247354.0b251981

The newest image for v354 does exist in Docker Hub / Public ECR.

Now, as for the v355 tag you noticed: we tag it on GitHub and then start the release process, so you should expect v355 to be released to S3/Docker in the next few weeks. Publishing to Amazon Linux's YUM repo, however, is not as simple and not entirely controlled by us.

As a possible workaround, are you able to pull an older version of Golang so that the image build works for 352?

lorengordon commented 2 years ago

Yess-ish... Right now we're just using yum install golang to install from epel, and the yum repo does not provide older versions. So we'd have to change that install mechanism to something that lets us specify the version... What I've done before to make that super easy is just a multi-stage docker build and copy over /usr/local/go and /go...

But of course that might cause other problems, for other packages... Ick. Might end up needing per-package logic after all. 😭

SaxyPandaBear commented 2 years ago

Sorry for the trouble this is causing you. v354 should be staged for the next release of Amazon Linux 2 (it missed the last cutoff by about 2 days, unfortunately), so I think early September is when this should be resolved for you.

lorengordon commented 2 years ago

Thanks for all the follow-up and communication! Really appreciate it!

SaxyPandaBear commented 2 years ago

Just so you know I haven't forgotten :) I just checked and it hasn't rolled out yet: sudo yum install amazon-cloudwatch-agent still installs v352 right now. I'll check in on it again next week. Sorry for the delay; AL2 updates aren't controlled by our team.

lorengordon commented 2 years ago

Haha just yesterday I reran the build to see if it would pass yet (it didn't). Thanks for the confirmation!

SaxyPandaBear commented 1 year ago

I have been swamped so I forgot to check until now. It's definitely updated.

{
  "status": "running",
  "starttime": "2022-09-20T19:51:26+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247354.0b251981"
}
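If you want to script the same check (for example, a pipeline step that waits for the new RPM before rebuilding), here is a minimal Go sketch that decodes a status document like the one above and reads its version field. The agentStatus struct simply mirrors the keys shown here and parseStatus is a hypothetical helper, not the agent's own type; how the JSON is obtained on the host is outside this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentStatus mirrors the keys in the status JSON shown above
// (an assumption based on that output, not the agent's own type).
type agentStatus struct {
	Status           string `json:"status"`
	StartTime        string `json:"starttime"`
	ConfigStatus     string `json:"configstatus"`
	CWOCStatus       string `json:"cwoc_status"`
	CWOCStartTime    string `json:"cwoc_starttime"`
	CWOCConfigStatus string `json:"cwoc_configstatus"`
	Version          string `json:"version"`
}

// parseStatus decodes the status JSON (hypothetical helper).
func parseStatus(raw []byte) (agentStatus, error) {
	var s agentStatus
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	// The status document quoted in this comment.
	raw := []byte(`{
  "status": "running",
  "starttime": "2022-09-20T19:51:26+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247354.0b251981"
}`)
	s, err := parseStatus(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("agent %s is %s\n", s.Version, s.Status)
}
```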
lorengordon commented 1 year ago

Welp, yes, that part is good now. But instead of success, the error has just changed 😭:

#0 0.061 ~/rpmbuild/SOURCES ~/rpmbuild
#0 0.094 amazon-cloudwatch-agent.spec
#23 0.317 amazon-cloudwatch-agent.tar.gz
#23 0.317 62666 blocks
#23 0.319 ~/rpmbuild
#23 0.384 error: Failed build dependencies:
#23 0.384   golang >= 1.18.3 is needed by amazon-cloudwatch-agent-1.247354.0b251981-1.el7.x86_64

Unfortunately, EPEL 7 has an older version of golang, 1.17.12:

#19 37.11 ---> Package golang.x86_64 0:1.17.12-1.el7 will be installed

So I guess I really am going to have to update this process to track a specific golang version...

SaxyPandaBear commented 1 year ago

Sigh... sorry about that. We jumped to Go 1.18 to catch up with Telegraf, so there's not much we can do on that front.

lorengordon commented 1 year ago

Ok, well now we're in new territory. I've gone ahead and installed a specific version of golang directly and made sure it is in the PATH:

#25 [builder  8/21] RUN go version
#0 0.078 go version go1.19.1 linux/amd64

Beautiful.

But still:

#29 0.510 error: Failed build dependencies:
#29 0.510   golang >= 1.18.3 is needed by amazon-cloudwatch-agent-1.247354.0b251981-1.el7.x86_64

So I assume the spec file still uses BuildRequires, which actually looks for an installed rpm with the right name and version instead of just using the go binary available in the PATH? (My install just grabbed the golang docker image, copied the bits into my image, and updated PATH and GOPATH, which should be enough imo.)
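That assumption matches how rpmbuild works: BuildRequires is checked against packages recorded in the rpm database, so a toolchain copied into PATH is invisible to it. For contrast, here is a minimal Go sketch of what a PATH-based check would conclude, parsing go version output like the go1.19.1 line above and comparing it to the 1.18.3 minimum. The helpers goVersionFromOutput and atLeast are hypothetical, and the comparison is a simplified numeric one that rejects pre-release tags such as rc versions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// goVersionFromOutput extracts "1.19.1" from `go version` output such as
// "go version go1.19.1 linux/amd64" (hypothetical helper).
func goVersionFromOutput(out string) (string, error) {
	f := strings.Fields(out)
	if len(f) < 3 || f[0] != "go" || f[1] != "version" || !strings.HasPrefix(f[2], "go") {
		return "", fmt.Errorf("unexpected go version output: %q", out)
	}
	return strings.TrimPrefix(f[2], "go"), nil
}

// atLeast reports whether dotted version v >= min, comparing numerically
// component by component; missing components count as 0. Pre-release
// tags such as "1.20rc1" are not handled and produce an error.
func atLeast(v, min string) (bool, error) {
	a, b := strings.Split(v, "."), strings.Split(min, ".")
	for i := 0; i < len(a) || i < len(b); i++ {
		x, y := 0, 0
		var err error
		if i < len(a) {
			if x, err = strconv.Atoi(a[i]); err != nil {
				return false, err
			}
		}
		if i < len(b) {
			if y, err = strconv.Atoi(b[i]); err != nil {
				return false, err
			}
		}
		if x != y {
			return x > y, nil
		}
	}
	return true, nil
}

func main() {
	// Output line taken from the build log above.
	v, err := goVersionFromOutput("go version go1.19.1 linux/amd64")
	if err != nil {
		panic(err)
	}
	ok, err := atLeast(v, "1.18.3")
	if err != nil {
		panic(err)
	}
	fmt.Printf("toolchain on PATH: %s, meets >= 1.18.3: %v\n", v, ok)
}
```

The toolchain on PATH clearly satisfies the requirement; the failure comes only from rpmbuild not finding a golang package in the rpm database.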

lorengordon commented 1 year ago

Maybe it is hammer time. Yolo right?

sed -i "/BuildRequires: golang/d" "$SPEC"
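The sed one-liner deletes every line that contains the text "BuildRequires: golang" anywhere on it and leaves all other lines untouched. If the spec patching lives in a Go tool rather than a shell step, a minimal sketch of the same filter follows; dropGolangBuildRequires and the sample spec text are hypothetical. For comparison, rpmbuild also accepts --nodeps, which skips build-dependency verification entirely and is blunter than removing only the golang lines.

```go
package main

import (
	"fmt"
	"strings"
)

// dropGolangBuildRequires mimics sed -i "/BuildRequires: golang/d":
// any line containing the substring "BuildRequires: golang" is removed,
// every other line (including its newline) is kept as-is.
func dropGolangBuildRequires(spec string) string {
	var b strings.Builder
	for _, line := range strings.SplitAfter(spec, "\n") {
		if strings.Contains(line, "BuildRequires: golang") {
			continue
		}
		b.WriteString(line)
	}
	return b.String()
}

func main() {
	// Hypothetical spec fragment for illustration only.
	spec := "Name: amazon-cloudwatch-agent\nBuildRequires: golang >= 1.18.3\n%build\n"
	fmt.Print(dropGolangBuildRequires(spec))
}
```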
lorengordon commented 1 year ago

Ok, I think I'm unblocked on this now, so I'll go ahead and close it. Not sure if you want to take an action item to improve the spec file so the rpm can be built without an rpm-packaged golang install. Appreciate all your support and communication, thanks!