elixir-cloud-aai / proWES

Proxy service for injecting middleware into GA4GH WES requests
Apache License 2.0

fix: broker forces connection closure #89

Closed JaeAeich closed 10 months ago

JaeAeich commented 10 months ago

Describe the bug While looking at the docker compose logs, I noticed the error below repeated throughout.

prowes-rabbitmq-1       | 2024-01-10 11:09:53.312203+00:00 [error] <0.1519.0> Error on AMQP connection <0.1519.0> (172.18.0.6:33766 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312203+00:00 [error] <0.1519.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312226+00:00 [error] <0.1440.0> Error on AMQP connection <0.1440.0> (172.18.0.6:33694 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312226+00:00 [error] <0.1440.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312253+00:00 [error] <0.1635.0> Error on AMQP connection <0.1635.0> (172.18.0.5:51722 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312253+00:00 [error] <0.1635.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312182+00:00 [error] <0.1643.0> Error on AMQP connection <0.1643.0> (172.18.0.5:51732 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312182+00:00 [error] <0.1643.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312238+00:00 [error] <0.1532.0> Error on AMQP connection <0.1532.0> (172.18.0.6:33792 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312238+00:00 [error] <0.1532.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312209+00:00 [error] <0.1472.0> Error on AMQP connection <0.1472.0> (172.18.0.6:33738 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312209+00:00 [error] <0.1472.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312128+00:00 [error] <0.1451.0> Error on AMQP connection <0.1451.0> (172.18.0.6:33706 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312128+00:00 [error] <0.1451.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312263+00:00 [error] <0.1475.0> Error on AMQP connection <0.1475.0> (172.18.0.6:33750 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312263+00:00 [error] <0.1475.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1569.0> Error on AMQP connection <0.1569.0> (172.18.0.6:33822 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1569.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312287+00:00 [error] <0.1469.0> Error on AMQP connection <0.1469.0> (172.18.0.6:33722 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312287+00:00 [error] <0.1469.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312311+00:00 [error] <0.1660.0> Error on AMQP connection <0.1660.0> (172.18.0.5:51740 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312311+00:00 [error] <0.1660.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312314+00:00 [error] <0.1523.0> Error on AMQP connection <0.1523.0> (172.18.0.6:33780 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312314+00:00 [error] <0.1523.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1543.0> Error on AMQP connection <0.1543.0> (172.18.0.6:33796 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312308+00:00 [error] <0.1543.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312350+00:00 [error] <0.1488.0> Error on AMQP connection <0.1488.0> (172.18.0.6:33762 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312350+00:00 [error] <0.1488.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312295+00:00 [error] <0.1556.0> Error on AMQP connection <0.1556.0> (172.18.0.6:33806 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:
prowes-rabbitmq-1       | 2024-01-10 11:09:53.312295+00:00 [error] <0.1556.0>  operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
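
As an aside (not part of the original report): the errors above all follow the same two-line pattern, differing only in the Erlang process id and the client address/port. Assuming the log format shown, a small script can summarize which client containers (here 172.18.0.5 and 172.18.0.6) lost connections:

```python
import re

# Each error spans two lines; the first carries the connection details.
# Illustrative snippet only (not part of proWES): extract the client IP
# and port of every dropped AMQP connection and group by client.
LOG_LINE = re.compile(
    r"Error on AMQP connection <([\d.]+)> "
    r"\((\d+\.\d+\.\d+\.\d+):(\d+) -> "
)

sample = [
    "2024-01-10 11:09:53.312203+00:00 [error] <0.1519.0> Error on AMQP connection <0.1519.0> (172.18.0.6:33766 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:",
    "2024-01-10 11:09:53.312253+00:00 [error] <0.1635.0> Error on AMQP connection <0.1635.0> (172.18.0.5:51722 -> 172.18.0.3:5672, vhost: '/', user: 'guest', state: running), channel 0:",
]

clients = {}
for line in sample:
    m = LOG_LINE.search(line)
    if m:
        _pid, ip, port = m.groups()
        clients.setdefault(ip, []).append(port)

for ip, ports in sorted(clients.items()):
    print(f"{ip}: {len(ports)} dropped connection(s)")
```

Running this over the full log would show that both the proWES API and worker containers lost all of their broker connections at the same instant.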

To Reproduce Steps to reproduce the behavior:

  1. docker compose up --build -d
  2. docker compose logs

Additional context Below is my full compose log: dev.txt

uniqueg commented 10 months ago

Hmmm, I can't reproduce this error. The same docker compose up --build -d command on the latest state of the dev branch works perfectly fine for me. All services fire up, no errors. Maybe it was just a temporary problem?

~docker compose logs~

EDIT: Wrong logs uploaded. See comment below for real logs.

JaeAeich commented 10 months ago

latest commit on dev errors

I don't think it is causing any errors, or none that I can see (services fire up without errors), and the same errors appear in your docker compose logs. Maybe forcing the closure is part of the normal process, I don't know. I saw it while looking at #88, so I thought I should open an issue.

In case it matters (though I don't think it does), here are my versions:

Docker Compose version v2.21.0
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:07:58 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:07:58 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.26
  GitCommit:        3dd1e886e55dd695541fdcd67420c2888645a495
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
PRETTY_NAME="Ubuntu 23.10"
NAME="Ubuntu"
VERSION_ID="23.10"
VERSION="23.10 (Mantic Minotaur)"
VERSION_CODENAME=mantic
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=mantic
LOGO=ubuntu-logo
uniqueg commented 10 months ago

Oops, I had uploaded your logs :laughing:

Mine don't have these errors (trying again): docker compose logs

My Docker versions are almost exactly the same (I have containerd 1.6.25 instead of 1.6.26), and I'm on Ubuntu 20.04, but I really don't think that should matter...

Can you post the output of docker ps | grep prowes? And can you confirm that you are running docker compose up --build -d from branch dev with HEAD commit 25f0e59?

JaeAeich commented 10 months ago

Here are my proWES containers; I spun them up after destroying everything, just to be sure.

5635ee26dec7   mher/flower:0.9.7       "flower flower --bro…"   56 seconds ago   Up 53 seconds   0.0.0.0:5565->5555/tcp, :::5565->5555/tcp                                                                    prowes-flower-1
44b3d1b475ec   prowes:latest           "bash -c 'cd /app/pr…"   56 seconds ago   Up 53 seconds                                                                                                                prowes-prowes-worker-1
1aa77ec22a64   prowes:latest           "bash -c 'cd /app/pr…"   56 seconds ago   Up 54 seconds   0.0.0.0:8090->8080/tcp, :::8090->8080/tcp                                                                    prowes-prowes-1
0deecc13824b   mongo:3.2               "docker-entrypoint.s…"   2 days ago       Up 54 seconds   0.0.0.0:27027->27017/tcp, :::27027->27017/tcp                                                                prowes-mongodb-1
f32fc571dc2e   rabbitmq:3-management   "docker-entrypoint.s…"   4 days ago       Up 54 seconds   4369/tcp, 5671/tcp, 15671-15672/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:5682->5672/tcp, :::5682->5672/tcp   prowes-rabbitmq-1

And yes, I am using the latest commit on the dev branch; below is the output of git log -n 1.

25f0e59 fix: create storage dir on startup (#83)
commit 25f0e593c956fcd450f0f112c06c62ac768c77fe
Author: Alex Kanitz <alexander.kanitz@alumni.ethz.ch>
Date:   Sun Nov 12 13:53:22 2023 +0100

    fix: create storage dir on startup (#83)

docker-compose-logs-dev-25f0.txt

PS: The containers aren't stopping; see below.

CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS                                                                                                        NAMES
5635ee26dec7   mher/flower:0.9.7       "flower flower --bro…"   8 minutes ago   Up 8 minutes   0.0.0.0:5565->5555/tcp, :::5565->5555/tcp                                                                    prowes-flower-1
44b3d1b475ec   prowes:latest           "bash -c 'cd /app/pr…"   8 minutes ago   Up 8 minutes                                                                                                                prowes-prowes-worker-1
1aa77ec22a64   prowes:latest           "bash -c 'cd /app/pr…"   8 minutes ago   Up 8 minutes   0.0.0.0:8090->8080/tcp, :::8090->8080/tcp                                                                    prowes-prowes-1
0deecc13824b   mongo:3.2               "docker-entrypoint.s…"   2 days ago      Up 8 minutes   0.0.0.0:27027->27017/tcp, :::27027->27017/tcp                                                                prowes-mongodb-1
f32fc571dc2e   rabbitmq:3-management   "docker-entrypoint.s…"   4 days ago      Up 8 minutes   4369/tcp, 5671/tcp, 15671-15672/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:5682->5672/tcp, :::5682->5672/tcp   prowes-rabbitmq-1
uniqueg commented 10 months ago

Really strange...

If I interpret these errors correctly, the connections to the workers drop. So if you were to send a workflow run request, it would likely not be processed. Can you confirm that?

If, instead, everything works as expected, I guess I will close this issue for now.

JaeAeich commented 10 months ago

I am closing this. If by the above you meant that the GET endpoints are working, then yeah, they return runs! I am attaching today's logs again, in case someone wants to look at them later. logs-dev-jan14.txt

uniqueg commented 10 months ago

I meant the POST endpoint. It's the only one that relies on the async operations leveraged by RabbitMQ and the Celery workers. It is the connection between the two that caused errors in your case, so I'm pretty sure that wouldn't work.
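
For background (an aside, not stated in the thread): a connection_forced close with reason 'shutdown' is what RabbitMQ sends to connected clients when the broker itself is stopping or restarting. Celery clients can be configured to keep retrying the broker connection across such restarts. The keys below are standard Celery configuration settings, shown as a plain dict for illustration; whether and how proWES sets any of them is an assumption, not something confirmed here.

```python
# Illustrative Celery broker settings (hypothetical for proWES; the key
# names are standard Celery configuration options).
celery_broker_settings = {
    # Keep retrying the broker connection instead of giving up when
    # RabbitMQ force-closes connections during a restart.
    "broker_connection_retry": True,
    "broker_connection_retry_on_startup": True,  # Celery >= 5.3
    "broker_connection_max_retries": None,       # None = retry forever
    # Heartbeats let the client detect a dead broker connection quickly.
    "broker_heartbeat": 120,
}
```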

JaeAeich commented 10 months ago

@uniqueg

Oh yeah :man_facepalming:, I checked my POST for a run with:

curl -X POST "http://localhost:8090/ga4gh/wes/v1/runs" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "tags=" \
  -F "workflow_attachment=" \
  -F "workflow_engine_parameters=" \
  -F 'workflow_params={"input":{"class":"File","path":"https://is.muni.cz/pics/design/r/znak_MU.png"}}' \
  -F "workflow_type=CWL" \
  -F "workflow_type_version=v1.0" \
  -F "workflow_url=https://github.com/uniqueg/cwl-example-workflows/blob/master/hashsplitter-workflow.cwl"

and I seem to get an error:

{"message": "No suitable workflow engine known.", "code": "400"}

I thought this might be an issue with my system or something I am facing due to my configs. I spun up a VM and tried to do the same there. In the VM (Ubuntu 23.10), I got docker compose logs similar to yours, but they still have some errors (cat vm-logs.txt | grep ERROR). vm-logs.txt

Even though there are fewer errors, I still got the same response ("No suitable workflow engine known") for the curl command. Apart from that, something unusual I noticed: on my host machine, the Swagger UI gave a different response for GET /runs than on the VM.

 jaeaeich@JaeAeich  ~  curl -X GET "http://localhost:8090/ga4gh/wes/v1/runs" -H "accept: application/json"
{
  "next_page_token": "654e0b7fed03b261bc7751f2",
  "runs": [
    {
      "run_id": "PSMBPK",
      "state": "RUNNING"
    },
    {
      "run_id": "ASN905",
      "state": "RUNNING"
    },
    {
      "run_id": "887BSR",
      "state": "RUNNING"
    },
    {
      "run_id": "JHAG1Q",
      "state": "SYSTEM_ERROR"
    },
    {
      "run_id": "9H8ZYP",
      "state": "SYSTEM_ERROR"
    }
  ]
}
jae@jae:~/proWES$ curl -X GET "http://localhost:8090/ga4gh/wes/v1/runs" -H  "accept: application/json"
{
  "next_page_token": "65a4d132af4b5f013526b300",
  "runs": [
    {
      "run_id": "WG0KAH",
      "state": "SYSTEM_ERROR"
    },
    {
      "run_id": "EW6XYQ",
      "state": "SYSTEM_ERROR"
    }
  ]
}

These are the responses for the host and the VM, respectively. I don't understand why this was happening, so I thought it should be documented. :+1:
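
For what it's worth, the difference between the two responses boils down to the run states (and page tokens) stored in each environment's MongoDB. A quick way to compare them, using the run data copied from the two responses above (illustrative only):

```python
# Compare the run states in the two GET /runs responses (host vs. VM).
# The run data below is copied verbatim from the responses in this comment.
from collections import Counter

host_runs = [
    {"run_id": "PSMBPK", "state": "RUNNING"},
    {"run_id": "ASN905", "state": "RUNNING"},
    {"run_id": "887BSR", "state": "RUNNING"},
    {"run_id": "JHAG1Q", "state": "SYSTEM_ERROR"},
    {"run_id": "9H8ZYP", "state": "SYSTEM_ERROR"},
]
vm_runs = [
    {"run_id": "WG0KAH", "state": "SYSTEM_ERROR"},
    {"run_id": "EW6XYQ", "state": "SYSTEM_ERROR"},
]

host_states = Counter(r["state"] for r in host_runs)
vm_states = Counter(r["state"] for r in vm_runs)
print("host:", dict(host_states))  # {'RUNNING': 3, 'SYSTEM_ERROR': 2}
print("vm:  ", dict(vm_states))    # {'SYSTEM_ERROR': 2}
```

Note the host's mongodb container was created two days earlier than the rest of the stack (see the docker ps output above), so the host database plausibly holds runs from earlier sessions, which would explain the diverging lists.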