Open statickidz opened 1 month ago
Hmm, I think something is needed at the Traefik level so that it can route to the worker container.
Is this related to my environment, or were you able to make it work before?
@statickidz Yes, this worked for me some time ago. However, I haven't tried it since we upgraded Traefik to version 3, so there was probably some change.
I tested it recently and it is working for me. I used this Docker image:
I don't have any containers running on the Dokploy server.
The worker is running 6 instances.
The domain I used:
When you open it, you will see this:
If you reload after a couple of minutes, the information should change, since the request is served by a container with another private IP, so the load balancing is working fine.
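For anyone who wants to repeat this check without reloading the page by hand, here is a minimal Python sketch. It assumes the test app returns a body that differs per container (for example, one that includes the private IP); the function names are made up for illustration and are not part of Dokploy.

```python
from collections import Counter
from typing import Callable
from urllib.request import urlopen


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Return the response body, or a short error label if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace").strip()
    except Exception as exc:  # urllib raises HTTPError for 5xx, e.g. 504
        return f"ERROR: {exc}"


def tally_responses(url: str, attempts: int = 20,
                    fetch: Callable[[str], str] = fetch_body) -> Counter:
    """Request `url` repeatedly and count how often each distinct body appears."""
    return Counter(fetch(url) for _ in range(attempts))
```

Calling `tally_responses("https://your-app.example.com")` (a placeholder domain) and printing the result should show several distinct bodies if requests reach different replicas. A large share of `ERROR: ...` entries would match the Gateway Timeout described in this issue.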
@Siumauricio I see! I just created new Dokploy instances (manager and worker) on AWS to check whether it was related to OCI, but I'm getting the same result, which is quite weird. As before, all ports are open and there are no issues joining the Swarm cluster, but when the request is routed to the worker I get a Gateway Timeout. At this point I'm not sure what it could be.
Did you make a custom installation, or did you install with the official script?
For the main instance, the official script; for the workers, the commands provided by the "Add Node" button.
https://github.com/statickidz/dokploy-oci-free/blob/main/bin/dokploy-main.sh
https://github.com/statickidz/dokploy-oci-free/blob/main/bin/dokploy-worker.sh
Have you checked in the Dokploy dashboard whether the worker is associated in the cluster section?
I see you are leaving Docker Swarm on the worker. How did you then link the worker to the manager? Did you follow the steps from the Add Node button manually?
I would recommend you first try the traditional way that Dokploy provides, that is, linking the workers manually. If that works, I think the problem is in your infrastructure setup.
Is your infrastructure running on Oracle OCI? I encountered the same problem, but it runs normally if executed on the same node where Traefik is located.
> Have you checked in the Dokploy dashboard whether the worker is associated in the cluster section?
Yep, it's displayed correctly.
> I see you are leaving Docker Swarm on the worker. How did you then link the worker to the manager? Did you follow the steps from the Add Node button manually?
> I would recommend you first try the traditional way that Dokploy provides, that is, linking the workers manually. If that works, I think the problem is in your infrastructure setup.
I get the same result whether I pre-install Docker and leave the swarm beforehand (as in the script) or follow the Dokploy quick steps to install it.
For example, this is the last test on a fresh worker node with the Dokploy steps; the result is always Gateway Timeout:
@Siumauricio this is a test environment, so if you want to debug this in depth, reach out to me and I can give you access to the instances.
> Is your infrastructure running on Oracle OCI? I encountered the same problem, but it runs normally if executed on the same node where Traefik is located.
I found it on Oracle OCI. It works well if I point all the instances to the manager with this, like you say.
But I feel this is not OCI related, because I created a couple of instances on AWS to try and the result was the same https://github.com/Dokploy/dokploy/issues/592#issuecomment-2447020784
> But I feel this is not OCI related, because I created a couple of instances on AWS to try and the result was the same
I just tried it on my Azure server and the same issue occurred.
@Siumauricio Can we try Traefik's load balancing, like this:
[tcp.services]
  [tcp.services.app]
    [[tcp.services.app.weighted.services]]
      name = "appv1"
      weight = 3
    [[tcp.services.app.weighted.services]]
      name = "appv2"
      weight = 1
  [tcp.services.appv1]
    [tcp.services.appv1.loadBalancer]
      [[tcp.services.appv1.loadBalancer.servers]]
        address = "private-ip-server-1:8080"
  [tcp.services.appv2]
    [tcp.services.appv2.loadBalancer]
      [[tcp.services.appv2.loadBalancer.servers]]
        address = "private-ip-server-2:8080"
instead of pointing them directly to the service itself, like this:
services:
  animeapi-core-409c00-service-11:
    loadBalancer:
      servers:
        - url: http://animeapi-core-409c00:8000
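To make the intent of the weights concrete, here is a naive Python sketch of what a 3:1 weighting means for request distribution. It is only an illustration under the assumption of a simple repeating cycle; it is not Traefik's actual algorithm, and `weighted_cycle` is a made-up helper.

```python
from collections import Counter
from itertools import cycle, islice


def weighted_cycle(weights: dict[str, int]):
    """Return an endless iterator over service names, repeated by weight."""
    # Expand each name `weight` times, then loop over the expanded list forever.
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


# Take 8 picks with the weights from the config above.
picks = list(islice(weighted_cycle({"appv1": 3, "appv2": 1}), 8))
print(Counter(picks))  # Counter({'appv1': 6, 'appv2': 2})
```

With these weights, roughly three out of every four connections would go to the server behind `appv1`.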
To Reproduce
Create a simple Dokploy Docker Swarm configuration with 1 manager and 1 worker.
Create an app with https://github.com/Dokploy/swarm-test
Set more than 1 replica in the Swarm config.
Verify that the deployed replicas are distributed across the two instances:
Manager
Worker
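The distribution check in the last step can also be scripted. The sketch below is one possible way to do it in Python: it shells out to `docker service ps` with a `{{.Node}}` format string and counts the running tasks per node. It has to run on a manager node, and the helper names are hypothetical.

```python
import subprocess
from collections import Counter


def count_by_node(ps_output: str) -> Counter:
    """Count tasks per node from output that has one node name per line."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())


def replicas_per_node(service: str) -> Counter:
    """Ask Docker (on a manager node) where the running tasks of `service` are."""
    result = subprocess.run(
        ["docker", "service", "ps", service,
         "--filter", "desired-state=running",
         "--format", "{{.Node}}"],
        check=True, capture_output=True, text=True,
    )
    return count_by_node(result.stdout)
```

For example, `replicas_per_node("my-app")` (where `my-app` is a placeholder for the service name Dokploy generated) should list both the manager and the worker hostnames if the replicas are spread across the two instances.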
Current vs. Expected behavior
I expect all the Docker Swarm containers to work normally regardless of whether the request goes to the manager or the worker instance. However, when the request goes to the worker instance I get a Gateway Timeout; if it goes to the manager, it works.
Provide environment information
Which area(s) are affected? (Select all that apply)
Application, Docker Compose, Traefik, Docker
Additional context
To rule out a network issue between the instances, I created a rule to open all ports in the security list. By the way, I'm using this project to boot the instances: https://github.com/statickidz/dokploy-oci-free/
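As a quick sanity check of connectivity between the nodes, here is a small Python sketch that probes the TCP ports Docker Swarm uses (2377 for cluster management, 7946 for node communication). Note the limits of this check: Swarm also needs 7946/udp and 4789/udp (overlay network traffic), which a TCP connect cannot verify, and a host firewall on the instance can still block traffic even when the cloud security list is open. The function names are made up for illustration.

```python
import socket

# TCP ports used by Docker Swarm; 7946/udp and 4789/udp are not covered here.
SWARM_TCP_PORTS = (2377, 7946)


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_swarm_tcp_ports(host: str) -> dict[int, bool]:
    """Probe each Swarm TCP port on `host` and report which ones are reachable."""
    return {port: tcp_port_open(host, port) for port in SWARM_TCP_PORTS}
```

Running `check_swarm_tcp_ports("10.0.0.12")` from the worker, with the manager's private IP in place of the placeholder address, gives a per-port result. Port 2377 only listens on manager nodes, so probing a worker would be expected to report it as closed.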