edgexfoundry / edgex-compose

EdgeX Foundry Docker Compose release compose files and tools for building EdgeX compose files
Apache License 2.0

error: bind: address already in use in make run #430

Open idiazcir opened 5 months ago

idiazcir commented 5 months ago

🐞 Bug Report

Affected Services

The issue is located in: make run, in the port bindings.

Is this a regression?

No

### Description and Minimal Reproduction

Not exactly a bug, but rather a problem that we often encounter when deploying EdgeX (`make run`). It is an error **from Docker**, tracked [here](https://github.com/moby/moby/issues/44136). However, **it can be avoided** by defining the ports for the containers differently. Hence, we think the compose files could be changed to prevent it from happening until an official fix is available.

The issue is caused by a collision on the ports used by the containers. On some occasions when we run EdgeX via `make run`, the command fails because one of the ports is reported as _already in use_. When that happens, the following error appears.

![Error](https://github.com/edgexfoundry/edgex-compose/assets/148757812/1ab0bbdb-7d31-40f4-b6c4-866747274cd8)

```sh
Error response from daemon: driver failed programming external connectivity on endpoint edgex-core-command (e14c889c16b8dd90f49de2b31e8c756dBe3ed6a2e6f30401
dd84d158c3ab83ee): Error starting userland proxy: listen tcp4 127.0.0.1:59882: bind: address already in use
make: *** [Makefile: 210: run] Error 1
```

As said, we believe the error can be traced to how Docker uses the ports defined in the Docker Compose file. There, every module has something similar to this:

```yaml
core-command:
  ports:
    - mode: ingress
      host_ip: 127.0.0.1
      target: 59882
      published: "59882"
      protocol: tcp
```

Apart from the `host_ip` restriction, this is equivalent to the short form (as in the `docker-compose-base.yml`):

```yaml
core-command:
  ports:
    - 59882:59882
```

In this case, the module that failed was `edgex-core-command`, but it happens for the rest as well. The same Docker error also appears if another process is already listening on the port. That is not our case: no process is using the port when the collision occurs. Therefore, we believe it is related to how Docker internally launches the containers and binds the ports. We have found three ways of solving this.

**Solution 1**: Only choose the internal port. Let Docker choose the external one.

```yaml
# From this:
core-command:
  ports:
    - 59882:59882

# To this:
core-command:
  ports:
    - 59882
```

This prevents the collision. However, the external port is then chosen randomly by Docker, which makes it hard to use for applications with an exposed interface. Based on [this](https://github.com/docker/compose/issues/4950#issuecomment-342882168) issue answer.

**Solution 2**: Choose both ports, but different ones.

```yaml
# From this:
core-command:
  ports:
    - 59882:59882

# To this:
core-command:
  ports:
    - 59881:59882
```

In the same way, setting the two ports to different values seems to work fine (it avoids collisions). We believe this is not a large change. To our understanding, as long as the internal ports remain the same, changing the external ones should not be too complex. Based on [this](https://stackoverflow.com/questions/70797971/docker-error-response-from-daemon-ports-are-not-available-listen-tcp-0-0-0-0?noredirect=1&lq=1#:~:text=Then%20you%20just%20bind%20another%20port) (we used a better resource, but couldn't find the link).

**Solution 3**: Remove the ports if not needed.

```yaml
# From this:
core-command:
  ports:
    - 59882:59882

# To this:
core-command:
```

This solution is the best one if the service doesn't have an exposed interface. Internally everything keeps running fine, as the containers are in the same network. The only difference is that they are no longer exposed to the outside. To our understanding, aside from testing purposes, this can be done for all modules except **consul**, **vault**, and the **ui**. The rest don't expose an external interface, so their ports can be removed.

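With Solution 1 the host port can change on every start, so anything outside the Docker network has to look it up first. A minimal sketch of that lookup, assuming the container is named `edgex-core-command` as in the error message above. `host_port` is a hypothetical helper name, and since `docker port` may print one line per address family, only the first line is used:

```sh
# Hypothetical helper: strip everything up to the last ':' from an
# address such as 0.0.0.0:49153 or [::]:49153, leaving only the port.
host_port() {
  printf '%s\n' "${1##*:}"
}

# Usage (needs a running container, so it is left commented out):
# addr=$(docker port edgex-core-command 59882/tcp | head -n 1)
# echo "core-command is reachable on host port $(host_port "$addr")"
```

The helper only parses text, so it can be checked without Docker running.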
This has been tested on our own services and works fine. We removed all the ports for services that don't expose an interface (Solution 3) and set two different ports for those that do (Solution 2). However, we still get collisions in the EdgeX core modules, as the bindings remain unchanged there.

We do not know whether this problem also happens occasionally to you or to other EdgeX users. In our case it happens sometimes, and it is quite frustrating, as we need to bring the containers down and re-run them until it works (seemingly once there are no bind collisions). As already mentioned, we have managed to reduce how often this error appears by implementing these changes in our own services. `make run` will still occasionally fail on the core module bindings.

## 🔥 Exception or Error

```
$ make run
Error response from daemon: driver failed programming external connectivity on endpoint edgex-core-command (e14c889c16b8dd90f49de2b31e8c756dBe3ed6a2e6f30401
dd84d158c3ab83ee): Error starting userland proxy: listen tcp4 127.0.0.1:59882: bind: address already in use
make: *** [Makefile: 210: run] Error 1
```

## 🌍 Your Environment

**Deployment Environment:** Ubuntu 20.04

**EdgeX Version:** 3.0

**Anything else relevant?**

While we are currently using version v3.0, version v3.1 defines the ports in the same way, so the error will persist. The fix for the original [issue](https://github.com/moby/moby/issues/44136) tracked in Docker has not been included in v25.0, although its author suggested it might be. As said, this is not the fault of EdgeX, and we do not know whether this is widespread or only happens to us. In any case, we hope this is useful.

cloudxxx8 commented 5 months ago

What is your environment, and what are the steps to recreate the issue? It looks like a simple port conflict to me. Are any Docker containers running while you execute make run? Could you please check whether any service on your system is occupying port 59882? If you purely run make run, there is nothing related to Docker.

idiazcir commented 5 months ago

The environment is Ubuntu 20.04 on amd64, with EdgeX version 3.0 and Docker 25.0.1. The problem is that we haven't found a way to reliably recreate it. It just pops up. With the same setup, a make run will work, but then after a make clean, another make run will fail. We may need to repeat make run and make clean until it works. That is why I can't provide a defined way to recreate it...

When the error appears, we check the ports and they are free. Anyway, as said in the issue, this is not an error from EdgeX; it seems to be related to Docker/Compose. We posted it in case someone else was experiencing it (all our team members had this problem). We have since implemented some of the suggested solutions, and the error no longer appears.
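One way to make the "ports are free" check repeatable at the moment the error appears is to filter the listening sockets for the port in question. A minimal sketch, assuming `ss` from iproute2 is available on the host. `has_listener` is a hypothetical helper that reads `ss -ltn` output on stdin and succeeds if any local address ends in the given port:

```sh
# Hypothetical helper: reads `ss -ltn` style lines on stdin.
# Column 4 is the local address (e.g. 127.0.0.1:59882 or [::]:59882);
# the port is whatever follows the last ':'.
has_listener() {
  awk -v p="$1" '
    { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage on the affected host (left commented out):
# if ss -ltn | has_listener 59882; then
#   echo "something is listening on 59882"
# else
#   echo "no listener on 59882"
# fi
```

Adding `-p` to `ss` also shows the owning process, although that generally requires root for sockets owned by other users.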

lenny-goodell commented 5 months ago

@idiazcir, try running make clean to stop and remove all EdgeX containers.

I have never had this issue, but I am still on Docker 24.0.2 on Ubuntu 20.04.6 LTS.

idiazcir commented 5 months ago

We normally do a make run and then a make clean. When the error occurs we even make sure to run docker stop and docker rm on all running containers (even after the make clean). Yet, it still fails on the make run.

It was also occurring in version 24 of Docker (we just upgraded to version 25 to see if it was solved).

As said, we posted it in case more people had the same issue. We assumed that was the case as it happens to everyone on our team every so often.

lenny-goodell commented 5 months ago

Clearly something on your system, outside of EdgeX, has port 59882 in use.

Please verify you have the same issue in non-secure mode. Run make run no-secty

Then, try running Core Command from the command line to eliminate Docker and see if the issue still exists.

  1. Edit the non-secure compose file (docker-compose-no-secty.yml) to comment out the Core Command service
  2. Run make run no-secty
  3. Clone edgex-go and cd to the edgex-go top folder
  4. Run make command
  5. Change to cmd/core-command/
  6. Run ./core-command -cp -d -o

Verify whether the service exits due to the port bind issue or stays running.

idiazcir commented 5 months ago

Thanks a lot for the suggestion! In case we ever encounter the problem again, we will try this. As I said, we changed the port bindings to avoid this error from appearing. Reverting the bindings to try to cause the error is not as easy either, as it only happens occasionally. We didn't find a way to force it.
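Since the failure is intermittent, one option is to let a loop run the cycle unattended and stop at the first failure, so the host can be inspected while the port is still reported as busy. A minimal sketch; `repeat_until_fail` and `cycle` are hypothetical names, and the attempt count of 20 is arbitrary:

```sh
# Run a command repeatedly until it fails or MAX attempts have passed.
# Returns 1 (and reports the attempt number) as soon as a run fails,
# 0 if every attempt succeeded.
repeat_until_fail() {
  max="$1"
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on attempt $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max attempts"
  return 0
}

# Usage from the edgex-compose checkout (left commented out):
# cycle() { make run && make clean; }
# repeat_until_fail 20 cycle
```

Because `cycle` skips `make clean` when `make run` fails, the containers and sockets are left as they were at the moment of the error.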

M0hanrajp commented 1 week ago

Hello @idiazcir

I have encountered this error a few times and managed to fix it with the following commands. You can try them out and check whether they fix the error.

  1. Stop the Redis Service: You can stop the Redis service managed by systemd with the following command:

    sudo systemctl stop redis
  2. Disable the Redis Service: To prevent Redis from starting automatically on boot, you can disable the service:

    sudo systemctl disable redis
  3. Verify the Service is Stopped: Check if the Redis service has stopped and is no longer listening on port 6379:

    sudo lsof -i :6379

    If no output is returned, the port is free.

Now you can continue using make run to bring up the application.

Thanks.