fastapi / full-stack-fastapi-template

Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
MIT License

[Question] How to scale up the stack, can I use docker service scale? #264

Open · danieljfarrell opened this issue 4 years ago

danieljfarrell commented 4 years ago

I have successfully got the stack running on a swarm of 2 Linodes and deployed it to the staging domain stag.example.com.

Running the command below shows that the various services in the stack stag-example-com have been deployed to different nodes in the swarm:

> docker stack ps stag-example-com
ID                  NAME                                IMAGE                                      NODE                    DESIRED STATE       CURRENT STATE           ERROR               PORTS
zjmpa3qyvtf9        stag-example-com_celeryworker.1     celeryworker:stag                          worker1.example.com     Running             Running 7 minutes ago                       
ych5hat6x4wx        stag-example-com_backend.1          backend:stag                               leader.example.com      Running             Running 7 minutes ago                       
uic3vqy2jly1        stag-example-com_queue.1            rabbitmq:3                                 worker1.example.com     Running             Running 7 minutes ago                       
w66qpin8heyz        stag-example-com_proxy.1            traefik:v2.2                               leader.example.com      Running             Running 7 minutes ago                       
pyygqylsy6ad        stag-example-com_pgadmin.1          dpage/pgadmin4:latest                      worker1.example.com     Running             Running 7 minutes ago                       
o4o84ghdevfp        stag-example-com_frontend.1         frontend:stag                              leader.example.com      Running             Running 7 minutes ago                       
gvo5u3cdyfcj        stag-example-com_flower.1           mher/flower:latest                         worker1.example.com     Running             Running 7 minutes ago                       
y7ta4ekqedk9        stag-example-com_db.1               timescale/timescaledb-postgis:1.7.4-pg12   worker1.example.com     Running             Running 7 minutes ago   

I believe this is because of the docker-auto-labels part of the deploy.sh script, which assigns services to random nodes.

From what I can tell, the docker-auto-labels part of the deploy.sh script just adds a single label that constrains the database volume stag-example-com_app-db-data to always be on the same node as the database service stag-example-com_db; that's all. So the question is still open: how does this scale?
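For reference, the constraint in the stack file looks something like this (paraphrasing from memory, so names and mount paths may differ slightly from the template):

db:
  image: timescale/timescaledb-postgis:1.7.4-pg12
  volumes:
    - app-db-data:/var/lib/postgresql/data/pgdata
  deploy:
    placement:
      constraints:
        # docker-auto-labels makes sure some node in the swarm carries this
        # label, so the db task and its local volume always land together
        - node.labels.${STACK_NAME}.app-db-data == true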

OK, docker-auto-labels is a bit of a red herring.

Normally, when I want to scale up a service, I would:

  1. add a new Linode to the swarm
  2. run docker service scale <name>=3 (see the commands below)
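
Concretely, something like this (join token elided):

# on an existing manager, print the join command for a new worker
docker swarm join-token worker

# on the new Linode, run the join command that was printed
docker swarm join --token <token> leader.example.com:2377

# back on a manager, scale the service across the swarm
docker service scale stag-example-com_backend=3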

However, this seems not to be possible with this approach because the stag-example-com services are already distributed across the swarm.

What is the intended way of scaling up the deployed stack?

danieljfarrell commented 4 years ago

I tried scaling the staging backend (i.e. the REST API) to two replicas,

docker service scale stag-example-com_backend=2

I then started tailing the logs of this service,

docker service logs stag-example-com_backend -f

I then went to stag.example.com and entered an incorrect username and password as a way of making API calls. I could see the REST API calls being load balanced between both replicas!

stag-example-com_backend.2.muhic0ajviqa@worker1.example.com    | 10.0.4.13:35360 - "POST /api/v1/login/access-token HTTP/1.1" 400
stag-example-com_backend.1.ych5hat6x4wx@leader1.example.com    | 10.0.4.13:40466 - "POST /api/v1/login/access-token HTTP/1.1" 400
stag-example-com_backend.2.muhic0ajviqa@worker1.example.com    | 10.0.4.13:35360 - "POST /api/v1/login/access-token HTTP/1.1" 400
stag-example-com_backend.1.ych5hat6x4wx@leader1.example.com    | 10.0.4.13:40466 - "POST /api/v1/login/access-token HTTP/1.1" 400
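
If I understand the docs correctly, this is swarm's routing mesh at work: the service gets a single virtual IP and requests to it are spread across the tasks. Something like this should confirm the endpoint mode:

docker service inspect stag-example-com_backend --format '{{ .Spec.EndpointSpec.Mode }}'
# expect "vip", i.e. traffic to the service address is balanced across replicas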

However, this is not the complete picture, because I still only have one database running! Both backend replicas are hitting the same database, which will become a bottleneck as more users start using the system.

If I now start up a second database replica,

docker service scale stag-example-com_db=2

Because of the node labels, it runs on the same node!

ID                  NAME                                IMAGE                                      NODE                    DESIRED STATE       CURRENT STATE                ERROR               PORTS
y7ta4ekqedk9        stag-example-com_db.1               timescale/timescaledb-postgis:1.7.4-pg12   worker1.example.com     Running             Running 11 hours ago                             
21d7unmbccph        stag-example-com_db.2               timescale/timescaledb-postgis:1.7.4-pg12   worker1.example.com     Running             Running about a minute ago

So this is definitely the wrong approach: both Postgres tasks end up sharing the same data volume, which is a recipe for corruption.

Very interested to hear comments from anyone who has investigated this.

haviduck commented 3 years ago

i added the label to the worker just now and it scaled up using that: <stackname>.app-db-data=true
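
for anyone following along, i mean something like this (swap in your own stack name and node hostname):

docker node update --label-add stag-example-com.app-db-data=true worker2.example.com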

however, we're gonna have to set up postgres clustering properly to get it working.

i'm playing around with it now, with streaming worker replicas and the db as primary. if i get it running i'll share the yaml :)

danieljfarrell commented 3 years ago

Yes please do!

haviduck commented 3 years ago

right, that approach is really unstable. the way replicas behave now makes me run away from anything other than stateless scaling. overlay networks don't detect service DNS names on replicas unless they are added via the cli, docker run and the like. however, adding a streaming slave on any other server, service, baremetal or toaster will work like a charm. so that's the route we are working on now. have you been able to successfully replicate the celeryworker on a worker node?
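
fwiw you can poke at the overlay dns from any container attached to the network to see what actually resolves, something like (assuming the container has nslookup):

# one ip per task/replica
docker exec -it <container> nslookup tasks.stag-example-com_backend
# the single service vip
docker exec -it <container> nslookup stag-example-com_backend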

haviduck commented 3 years ago

ima eat some of those words :P i can't think of anything else than that yml that you need to run it. here ya go: compose.yml

danieljfarrell commented 3 years ago

Good job.

So, reading the yml, I see that you have:

  1. constrained the database to the manager node and configured it for streaming replication

  2. added a new service called replica, which is a database replica

  3. removed the app storage volume

  4. added two volumes for database storage, one for the master database and one for the replica
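
To check my understanding, I would sketch that layout as something like the yml below. The service names, volume names and mount path are my guesses, and the replication environment variables are image-specific (these are from memory of the Crunchy Data images, so treat them as placeholders):

services:
  db:
    image: crunchydata/crunchy-postgres:<tag>
    environment:
      PG_MODE: primary              # image-specific replication switch, not vanilla postgres
    volumes:
      - db-primary-data:/pgdata
    deploy:
      placement:
        constraints:
          - node.role == manager    # primary pinned to the manager

  replica:
    image: crunchydata/crunchy-postgres:<tag>
    environment:
      PG_MODE: replica              # read-only hot standby
      PG_PRIMARY_HOST: db           # reach the primary over the overlay network
    volumes:
      - db-replica-data:/pgdata
    deploy:
      replicas: 1

volumes:
  db-primary-data:
  db-replica-data: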

Some questions:

  1. How is database access load balanced? I mean, if the backend always inserts into the same database, that won't really help with scaling; you just increase your storage demands. Maybe Postgres handles this intelligently?

  2. Can you scale dynamically, or is this restricted to just a leader and one replica database? It seems that the replica volume is a single volume shared by all replicas, which would indicate it's limited to a single replica database.

danieljfarrell commented 3 years ago

I think I understand a bit better: replicas are read-only, so they share the same volume; see https://info.crunchydata.com/blog/an-easy-recipe-for-creating-a-postgresql-cluster-with-docker-swarm. This was also useful to read: https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

haviduck commented 3 years ago

> I think I understand a bit better: replicas are read-only, so they share the same volume; see https://info.crunchydata.com/blog/an-easy-recipe-for-creating-a-postgresql-cluster-with-docker-swarm. This was also useful to read: https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling

yep. read-only with hot standby. i'm using crunchy's great example to get it working (i have changed the original script i shared, it had some errors). have a look at their docs, it has some pretty cool params

when everything is tip-top i'm changing to: this

streaming replication is pretty cool, but i'm going for multi-master with ha and standby. i'll share my findings :)