contribsys / faktory

Language-agnostic persistent background job server
https://contribsys.com/faktory/

Faktory Ent 1.5.2 on ECS Fargate is failing to start #369

Closed motymichaely closed 3 years ago

motymichaely commented 3 years ago

faktory-ent 1.5.2

standard_init_linux.go:219: exec user process caused: exec format error

Are you using an old version?

No.

Have you checked the changelogs to see if your issue has been fixed in a later version?

Yes.

https://github.com/contribsys/faktory/blob/master/Changes.md
https://github.com/contribsys/faktory/blob/master/Pro-Changes.md
https://github.com/contribsys/faktory/blob/master/Ent-Changes.md

This might be related to https://github.com/contribsys/faktory/commit/c5f5c006bbe195118f3a50fc029e5602779635f3 and the move to multi-architecture containers.

mperham commented 3 years ago

Thanks for the report. It looks like the fent Docker build was using the previous push command to push images to the registry whereas cross-platform images require buildx --push. I've re-pushed the images. Fetch 1.5.2 again and see if it works better now.

OhadArzouan commented 3 years ago

Hey Mike, jumping on this thread after testing the latest patch to 1.5.2 (I'm Moty's colleague :) ). The initial issue seems to be resolved now, but I get the following error when the container starts on a Linux machine:

/faktory: line 1: ����: not found
/faktory: line 2: syntax error: unexpected ")"

Have you experienced this issue before? Thanks in advance!

mperham commented 3 years ago

My guess is that the ARM image has an x86 binary for /faktory. I've rebuilt it from scratch. Try again.
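An architecture mismatch like the one guessed at here can be checked without running the binary, because every ELF executable records its target machine in its header. The following is a minimal sketch, not part of the Faktory build: the function names and the `faktory` path in the usage note are hypothetical, and only the four machine codes listed are recognized.

```python
import struct

# e_machine codes from the ELF specification, mapped to Go-style GOARCH names.
ELF_MACHINES = {3: "386", 40: "arm", 62: "amd64", 183: "arm64"}


def elf_architecture(header: bytes) -> str:
    """Return the architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def binary_architecture(path: str) -> str:
    """Read the ELF header of the file at `path` and report its architecture."""
    with open(path, "rb") as f:
        return elf_architecture(f.read(20))
```

Calling `binary_architecture("faktory")` on the freshly built binary before `docker build` would show whether an amd64 binary is about to be copied into an arm64 image, or the other way around.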

OhadArzouan commented 3 years ago

Sorry to bug you again, Mike. Have you used buildx? I see the initial error again: exec user process caused: exec format error 😬

mperham commented 3 years ago

Please bug me. This needs to be fixed.

Here's the exact build steps I'm running:

GOOS=linux GOARCH=arm64 go build -o faktory cmd/daemon/main.go
docker buildx build --platform linux/arm64 --tag contribsys/$(NAME):$(VERSION) --build-arg=FAKTORY_VERSION=$(VERSION) --load .
docker tag contribsys/$(NAME):$(VERSION) host.docker.internal:9999/contribsys/$(NAME):$(VERSION)
# Need to specialize the Docker builder to support insecure registries as we push over SSH
# docker buildx create --config=./buildx_registry.toml --name focused_saha
docker buildx --builder focused_saha build --platform linux/arm64 -t host.docker.internal:9999/contribsys/$(NAME):$(VERSION) --push .

The core issue is that I have no way of running the ARM images so I can't test them. I might have to spin up an ARM instance and manually work thru this.

mperham commented 3 years ago

I thought you were using ARM machines? Are you using x86?

OhadArzouan commented 3 years ago

Sorry for the confusion! Yes, we are using x86....

motymichaely commented 3 years ago

@mperham - Any updates on this?

mperham commented 3 years ago

It’s my highest priority issue right now. I will be working on it all day tomorrow.

OhadArzouan commented 3 years ago

Maybe this will be useful: we tested a few flows here. The public faktory image seems to support both arm and amd platforms (tested by pulling the public image with either --platform linux/arm64 or --platform linux/amd64 and then running docker image inspect), but for some reason the faktory-ent images don't support both platforms. I tried pulling it just now for arm with docker pull docker.contribsys.com/contribsys/faktory-ent:1.5.2 --platform linux/arm64, but when I ran docker image inspect I saw that the architecture I received was arm64.

Thinking back on everything that happened yesterday (though I can't really prove it atm :) ), I believe the Docker image we tried initially was arm (which failed to execute). After you rebuilt, we may have pulled the amd image (which had the syntax error issue). After the third build, we pulled the arm image regardless of which platform we passed in the platform param.

Hope that makes sense and helps.
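The check described above (pull with a given --platform, then look at docker image inspect) can be scripted. This is a minimal sketch, assuming the JSON array that `docker image inspect` prints, where each entry carries an `Architecture` field; the helper names and the machine-name table are illustrative assumptions, not anything from Faktory.

```python
import json
import platform
from typing import Optional

# Common `uname -m` style machine names mapped to Docker's architecture names.
MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def image_architecture(inspect_json: str) -> str:
    """Return the Architecture field of the first image in the inspect output."""
    images = json.loads(inspect_json)
    return images[0]["Architecture"]


def matches_host(inspect_json: str, machine: Optional[str] = None) -> bool:
    """Report whether the inspected image targets the given (or local) machine."""
    machine = (machine or platform.machine()).lower()
    host_arch = MACHINE_TO_DOCKER_ARCH.get(machine, machine)
    return image_architecture(inspect_json) == host_arch
```

Feeding it the output of `docker image inspect docker.contribsys.com/contribsys/faktory-ent:1.5.2` on the target server would flag an arm64 image on an x86 host before the container fails with exec format error.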

mperham commented 3 years ago

Ok, it helps a lot to know that the public contribsys/faktory image is working correctly. I've pushed a new image for faktory-ent, let me know if it works for you.

mperham commented 3 years ago

❯ docker manifest inspect docker.contribsys.com/contribsys/faktory-ent:1.5.2
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1571,
         "digest": "sha256:e5112a8f184075e75cf7827789e966e591b8bde8020e10521a8b207966afd6ab",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1571,
         "digest": "sha256:8793d39ca1243f8585ee4cf260840a4f7c56d6a740ea05e17febb47165730d37",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

OhadArzouan commented 3 years ago

I was able to pull the amd64 image, and it looks like it's running successfully on our server 🎉 One little thing, though: even though I pulled version 1.5.2, both the log and the UI print version number 1.6.0.

Faktory Enterprise 1.6.0
© 2021 Contributed Systems LLC.
.
.
.

Thanks for taking care of it!
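The manifest list posted earlier in the thread can also be checked programmatically, which makes it possible to catch a tag that is missing a platform before anyone pulls it. The sketch below assumes the JSON shape shown in that `docker manifest inspect` output; the function names and the default list of wanted platforms are assumptions for illustration.

```python
import json
from typing import Dict, List, Sequence


def manifest_platforms(manifest_json: str) -> Dict[str, str]:
    """Map "os/architecture" to the digest of each entry in a manifest list."""
    manifest = json.loads(manifest_json)
    return {
        "{}/{}".format(m["platform"]["os"], m["platform"]["architecture"]): m["digest"]
        for m in manifest.get("manifests", [])
    }


def missing_platforms(
    manifest_json: str,
    wanted: Sequence[str] = ("linux/amd64", "linux/arm64"),
) -> List[str]:
    """Return the wanted platforms that the manifest list does not provide."""
    available = manifest_platforms(manifest_json)
    return [p for p in wanted if p not in available]
```

For the manifest shown in this thread, `missing_platforms` would return an empty list. A document without a `manifests` array, which is what a tag pushed as a single-platform image is expected to produce, would report every wanted platform as missing.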

mperham commented 3 years ago

Yeah, that’s a bug, just ignore it. I bumped the master version before realizing I’d have to fix this packaging issue.

motymichaely commented 3 years ago

@mperham Thanks a bunch for the quick turnaround on this.

OhadArzouan commented 3 years ago

Quick question: we get many Bad connection errors. This isn't really new, but I was wondering whether you're familiar with this issue:

Bad connection: read tcp XX.X.X.XXX:7419->XX.X.X.XXX:25831: read: connection reset by peer

Thanks

mperham commented 3 years ago

Many people have seen this error. It's usually an issue with Kubernetes or Docker networking. I don't recall the exact cause or solution.