docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at https://github.com/docker/swarmkit
Apache License 2.0

Feedback: `--addr` is very confusing #817

Closed aluzzardi closed 9 years ago

aluzzardi commented 9 years ago

One of the first steps to set up a cluster is to run swarm join --addr=<address> <discovery>.

Many people mistake --addr for the manager address instead of the docker address.

We should find an alternative name, perhaps:

Additionally, perhaps it would help if we'd pre-fill that option. In order to do that, we'd need to run swarm join with --net=host and grab the default interface's IP address.

Thoughts?

aluzzardi commented 9 years ago

See also #741

squillace commented 9 years ago

See also #437. :-) The general idea being that people who are not already docker-maestros should be able to read the variable names and figure out what you want here....

aluzzardi commented 9 years ago

@squillace Thanks! Re-opening that, we should fix the docs.

thaJeztah commented 9 years ago

My first idea was -H, like the regular flag to specify the docker host, but that's confusing, because swarm manage uses that. I like --docker-addr, --docker-address or perhaps --docker-host.

ghost commented 9 years ago

I like --node-addr since isn't node the official term when running info and all in the docs and presentations/slides?

aluzzardi commented 9 years ago

@UserTaken That's another thing - I'm unsure about the term node, wondering if it should be engine. Thoughts?

thaJeztah commented 9 years ago

Engine, without the "docker" context ("Docker Engine") ... not sure. Daemon might ring more bells on the command line.

ghost commented 9 years ago

I think node is fitting as it is a de facto term used in explaining distributed systems such as Consul, Etcd, Fleet, Kubernetes, cryptocurrency, etc.

thaJeztah commented 9 years ago

While I agree that node is used as the "official" term; the example, as mentioned in https://github.com/docker/swarm/issues/741 shows $ swarm join --addr=node_ip:2375 consul://consul_addr/path

The node_ip apparently wasn't enough to clarify what had to be entered;

what is the "node_ip" and what is the 2375 port? and can it be changed?

For consistency, --node-addr / --node-address sounds good, but it still needs a good explanation around that in the docs. I don't think this can be solved by only renaming the option.

abronan commented 9 years ago

I agree with @thaJeztah and @UserTaken for the node term.

In comparison:

So I would say either my-addr or node-addr and I lean towards node-addr with a good explanation in the doc ;)

squillace commented 9 years ago

I'm only one voice, but as a relative newbie to docker-things, I like the node-addr idea but as @abronan and others mention, what you are doing here is completely unclear. what IS the port number we are putting there, conceptually? Is it the open VM port to the docker daemon on that machine? Is it a random SSH port? What IS the thing?

if you documented quickly what that thing is here, you'd solve the entire issue. It's the "node" address plus a port on which the node's docker daemon is listening, yes? In which case, just expressing it that way -- along with node-addr -- should be entirely and completely sufficient even for newbies.

Or do I continue to misunderstand what that endpoint actually represents? If so, you can see the problem. :-)

abronan commented 9 years ago

It's the "node" address plus a port on which the node's docker daemon is listening, yes?

You are right :) but I agree it's not clear in the doc yet.

A quick attempt here:

Before


Register the Swarm agents to the discovery service. The node's IP must be accessible from the Swarm Manager. Use the following command and replace with the proper node_ip and cluster_id to start an agent.

docker run -d swarm join --addr=<node_ip:2375> token://<cluster_id>

After


Register the Swarm agents in the cluster:

docker run -d swarm join --node-addr=<ip:2375> token://<cluster_id>

Where ip is the IP address of the current node and 2375 is the port on which the docker daemon is listening on that machine (2375 is the default port on which the docker daemon listens for client requests, but it may vary according to your setup).

Replace cluster_id with the token you generated using the swarm create command in step [include-step-number]


Or something like this?

thaJeztah commented 9 years ago

@abronan looking good. I know I mentioned it in other places, but 2375 is the non-secure port, should the examples use 2376 secure/TLS?

squillace commented 9 years ago

  1. IMHO, you should never document insecure ports without noting the fact and pointing to the secure instructions right there.
  2. @abronan : yes, you're on the right track I think. Here's my take for your consideration:

Register the Swarm agents in the cluster:

docker run -d swarm join --node-addr=<node_ip>:<node-docker-port> token://<cluster_id>

Where node_ip is the ip address of the current node computer and node-docker-port is the port at which this node's docker daemon is listening. (Remember that 2375 is the default port on which the docker daemon listens to client requests, but it may vary according to how you set up your docker daemon. If you're trying to configure swarm for TLS-secured communication, 2376 is the default docker daemon port for that).

Replace cluster_id with the token you generated using the swarm create command in step [include-step-number]

my quick version for you.....???

squillace commented 9 years ago

Missing, still, is the stronger sense of the fact that WHATEVER port you configured docker to use on that node, THAT is the port value you need to put there. You're telling swarm, "here is this node's docker endpoint." It needs to know that. :-)
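The point above (the value is the node's address plus whatever port that node's Docker daemon listens on) can be sketched as a small shell helper. This is only an illustration: `node_endpoint` is a hypothetical name, not part of Swarm or Docker, and the warning on 2375 simply reflects the non-TLS/TLS default ports mentioned in this thread.

```shell
# Sketch only: node_endpoint is a hypothetical helper, not part of Swarm or Docker.
# It joins a node address and the Docker daemon port into the <ip>:<port> form.
node_endpoint() {
    node_ip=$1
    docker_port=$2

    # Reject an empty address.
    if [ -z "$node_ip" ]; then
        echo "node address is required" >&2
        return 1
    fi

    # Reject a port that is empty or not a plain decimal number.
    case $docker_port in
        ''|*[!0-9]*) echo "port must be a number: $docker_port" >&2; return 1 ;;
    esac
    if [ "$docker_port" -lt 1 ] || [ "$docker_port" -gt 65535 ]; then
        echo "port out of range: $docker_port" >&2
        return 1
    fi

    # 2375 is the default unencrypted port discussed in the thread; warn on it.
    if [ "$docker_port" -eq 2375 ]; then
        echo "warning: 2375 is the non-TLS default; 2376 is the TLS default" >&2
    fi

    printf '%s:%s\n' "$node_ip" "$docker_port"
}
```

For example, `node_endpoint 192.0.2.10 2376` prints `192.0.2.10:2376`, which is the value that goes after the address flag of `swarm join`.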

snrism commented 9 years ago

+1 to @abronan I think --node-addr fits well. In addition, it will be nice to document about default port 2375/2376 as part of the README and highlight to use the secure option.

@aluzzardi regarding pre-filling the address with --net=host, I believe we might have multiple interface choices on the host and hence might have to pick the right iface.

ghost commented 9 years ago

I agree with emphasizing security as I've noticed an abundance of unencrypted Docker ports while scanning the Internet, up to a dozen vulnerable machines on a single IP range. Most being VPS' ranging from single core and 512MB RAM to

Containers: 13
Images: 242
Storage Driver: devicemapper
 Pool Name: docker-253:2-4298645046-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 6.493 GB
 Data Space Total: 107.4 GB
 Data Space Available: 100.9 GB
 Metadata Space Used: 11.72 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.136 GB
 Udev Sync Supported: true
 Data loop file: /home/dpeterson/docker_storage/devicemapper/devicemapper/data
 Metadata loop file: /home/dpeterson/docker_storage/devicemapper/devicemapper/metadata
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Kernel Version: 3.10.0-229.1.2.el7.x86_64
Operating System: Boss
CPUs: 8
Total Memory: 62.73 GiB
ID: K6H4:AZEN:RR4F:XF4A:BTGQ:ZJQR:MEIT:VFSR:WG26:C7PD:WXLS:SZEK

squillace commented 9 years ago

@UserTaken Geebers!!

aluzzardi commented 9 years ago

Thanks for the great feedback!

/cc @moxiegirl

To sum up the conversation:

One thing I might add as well, related to the port discussion: step 2 of Set up Swarm nodes says to run:

docker -H tcp://0.0.0.0:2375 -d

That is utterly confusing. It hardcodes the port with no explanation and, worst of all, people end up trying (and failing) to run a second instance of docker rather than updating their system configuration to change the startup flags (we simply need them to add -H tcp://0.0.0.0:<whatever port they like>). This is tough to document since it's linux distribution specific.
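As a rough illustration of "updating the system configuration" instead of starting a second daemon, here is what the change might look like on a Debian or Ubuntu install that reads /etc/default/docker. That file location is an assumption about the setup (systemd-based installs configure flags differently), and the port is just the conventional unencrypted default, so treat this as a sketch rather than the recommended configuration.

```shell
# /etc/default/docker (assumed location; applies to Debian/Ubuntu installs that read this file)
# Keep the local Unix socket and add a TCP listener on a port of your choice.
# 2375 is only the conventional unencrypted default; a TLS-secured daemon
# conventionally listens on 2376 and needs the TLS flags configured as well.
DOCKER_OPTS="-H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375"
```

After editing the file, the existing Docker service has to be restarted for the new flags to take effect; no second `docker -d` process is needed.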

moxiegirl commented 9 years ago

@aluzzardi Sure thing. A few questions.

  1. Must the --node-addr be an actual IP, or can I use a named value that resolves, like localhost or my-computer-name? If the option requires an actual IP I would rename it --node-ip to be exact.
  2. Are IPv6 addresses supported? (I really should check the code but I'm trying to be quick.)
  3. In this statement, "This is tough to document since it's linux distribution specific.", the antecedent of "it's" is unclear. Do you mean the IP, the port, or the command configuration is distribution specific?

aluzzardi commented 9 years ago

Thanks @moxiegirl

  1. It can be something that resolves but it must be reachable by Swarm so localhost won't work. To put it in other words, Swarm will connect to node-addr in order to talk to the Docker Engine running on that machine. If it's localhost, then Swarm will attempt to connect to itself.
  2. I believe it does support IPv6. Under the hood we use Go's net/http package which does support IPv6 but since we've never tested with IPv6, perhaps it's broken in some way so I wouldn't advertise it unless we have tests backing that up.
  3. The command configuration. Basically, you have to change the Docker daemon startup flags. The location of those options depends on how the user installed Docker. For instance, on Debian and Ubuntu it's going to be /etc/default/docker. This issue is not really Swarm specific: the Docker docs are missing that bit of information as well: docker/docker#3630. The latest Ubuntu release (15.04) switched to systemd and the way to set Docker flags changed as well: docker/docker#12926

squillace commented 9 years ago

@aluzzardi: re: 1, above, are we saying something like, "any address form that can be reached by the swarm master that is not the swarm master's address"? I'm assuming that, although it makes little sense, you could be testing out swarm on your local machine and in that case, localhost would work? But it's unlikely to be useful in other cases....??

aluzzardi commented 9 years ago

@squillace: I was just implying we shouldn't mention localhost as an example in the docs - there are valid use cases for using localhost (I use it all the time), but I think we should just explain that the address must be reachable by Swarm and that's it (as in, not mention the "not the swarm master's address" part).

This might help: In order to check if the address is correct, the user could run docker -H <node-addr> info from the Swarm machine. If he gets the info back, then Swarm can talk to node-addr. If not, the address is wrong. @moxiegirl @squillace Thoughts?
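The check described above can be wrapped in a small function. This is a sketch under assumptions: `check_node_addr` is a hypothetical helper name, the address is assumed to be a plain `<host>:<port>` reachable over unencrypted TCP, and a TLS-secured daemon would additionally need the usual Docker TLS client options.

```shell
# Sketch only: check_node_addr is a hypothetical helper name.
# Run it on the Swarm manager machine; it succeeds if the Docker daemon answers.
check_node_addr() {
    node_addr=$1
    if [ -z "$node_addr" ]; then
        echo "usage: check_node_addr <host>:<port>" >&2
        return 2
    fi

    # "docker -H <addr> info" only returns data if the daemon is reachable.
    if docker -H "tcp://$node_addr" info >/dev/null 2>&1; then
        echo "reachable: $node_addr"
    else
        echo "not reachable: $node_addr" >&2
        return 1
    fi
}
```

If `check_node_addr <host>:<port>` fails from the manager machine, Swarm will not be able to talk to that node at that address either.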

@ehazlett @nathanleclaire: We are going to change --addr to --node-addr in swarm join. This would break machine - how should we handle this?

squillace commented 9 years ago

@aluzzardi: yes, agreed on no specific mention of localhost. agreed that a note indicating your test for the address value would be useful; everything else seems just peachy to me....

ehazlett commented 9 years ago

If the new --node-addr will be in 0.3.0 we will get it updated in master. Along with that I will get the ability to use a different swarm image so we can test the provisioning process.

Thanks for the heads up!


aluzzardi commented 9 years ago

Alternative: How about --public-addr? @squillace @thaJeztah @UserTaken @snrism

thaJeztah commented 9 years ago

Good one. Clear, descriptive.

Only worry I have is that "public" doesn't need to be "public". Haven't actually checked if this is possible yet, but multiple nodes (in the same datacenter) will probably use the internal/private network.

squillace commented 9 years ago

I'm a cloud guy: node means "this machine, here". That could easily be in a private cluster, accessible through no public interface directly, as @thaJeztah suggests. At least, I believe so. The result is that --public-addr sounds like it's slightly wrong. Public to the VM, sure, but.... am I off base? I could easily be....

thaJeztah commented 9 years ago

Confirmed. Toughest thing in development is naming things. Just call it --opt-817

aluzzardi commented 9 years ago

Okay so, let's keep it --node-addr.

One last thought:

From our docs:

docker run -d -p <swarm_port>:2375 swarm manage token://<cluster_id>

How about this instead:

docker run --net=host -d swarm manage token://<cluster_id>

Then explain that Swarm will listen by default to 2375 and that can be changed (in the same way as Docker) with:

docker run --net=host -d swarm manage -H 0.0.0.0:<whateverport> token://<cluster_id>

Is it more obvious?

ghost commented 9 years ago

2375/2376 are the default ports for Docker daemon so shouldn't the Swarm master default to 3375/3376 with --net host?

thaJeztah commented 9 years ago

Afaik, we discourage using --net=host out of security considerations. Using it purely for convenience doesn't sound right, especially when targeting "newbies" that might not be aware of that. See https://docs.docker.com/reference/run/#mode-host

squillace commented 9 years ago

@aluzzardi +1 for @thaJeztah. I feel.... qualified as a newbieish person... to say that I prefer explicit in basic documentation to "this easy shorthand", whether it's --net=host or anything else. I'm likely to set up the swarm my own way; if the docs do anything that is default, I'm hosed if I've never used the default or the shorthand. And unhosing unknown defaults is kinda hard sometimes.

And after reading the host mode doc, I'd say no, please, let's skip that idea. Explicit about addresses and ports is not hard for newbies; stating the defaults to use them if you want is not hard for newbies. Give them the happy path to professional success. :-)

aluzzardi commented 9 years ago

Another day another flag.

I just realized Consul uses --advertise - what do you think of that?

aluzzardi commented 9 years ago

I think I'm going to go with --advertise - it kinda speaks to me.

Thoughts? @thaJeztah @squillace @UserTaken

abronan commented 9 years ago

@aluzzardi +1 I was torn between using it for Leader Election or for regular machines joining the cluster, but it makes more sense for swarm join I guess. You advertise your address through the discovery service. Makes sense to me.

squillace commented 9 years ago

Can I buy you a strong Long Island Iced Tea? I'm questioning your judgment. :-)

so, the command is something like,

docker run -d swarm join --advertise=<node_ip>:<node-docker-port> token://<cluster_id>

??? So, the command says, "have this node ADVERTISE its docker endpoint to the swarm with <cluster-id>"?

squillace commented 9 years ago

that sounds redundant to me, looking at it. The join command says, "JOIN", not "please may I join if you notice that I'm here?" which was what "advertise" feels like. Why isn't the command swarm advertise <ip>:<port> then? Why not ditch the whole named argument issue?

BTW: at this point, I am just responding honestly. I think really so long as you dock it properly, you can use --opt-817 as @UserTaken has already suggested. :-) I am, however, impressed you're thinking about it so intently.

aluzzardi commented 9 years ago

Can I buy you a strong Long Island Iced Tea? I'm questioning your judgment. :-)

Please :-)

that sounds redundant to me, looking at it. The join command says, "JOIN", not "please may I join if you notice that I'm here?" which was what "advertise" feels like.

For me, it means join this cluster and --advertise this as my address. The flag doesn't control if join should advertise, it sets what it should advertise.

Why isn't the command swarm advertise <ip>:<port> then? Why not ditch the whole named argument issue?

Because: 1) It feels strange to pass both token://foo <ip:port> to join (or advertise) 2) --advertise can be optional. Consul uses the bind address if --advertise is not provided 3) There are some ways to set sensible defaults for the address to advertise (we could look at the network interfaces, we could change docker to return the address, or some stores such as Consul can actually tell you the address you are communicating with). Setting that as an arg rather than a flag takes the sensible default option off the table.

BTW: at this point, I am just responding honestly.

I appreciate that :)

I think really so long as you dock it properly, you can use --opt-817 as @UserTaken has already suggested. :-) I am, however, impressed you're thinking about it so intently.

I've seen so many people get --addr wrong and spend a long time debugging the issue. Those people include @samalba and @jpetazzo who have been using Docker since before it was even called Docker, so it's an alarming issue.

The project can only be as good as its getting started process.

squillace commented 9 years ago

@aluzzardi OK, this we can work with. So, if we were being honest, we'd say something like, swarm join --advertisement-address=<ipOrDnsNameForVM>:<dockerPort> token://<clusterId>.

How does that feel? swarm join is the command. The <ip>:<port> portion is the advertised, announced, publicized, or published location of the node computer and its docker daemon listening port. Advertised seems to me the least of the problem. I think if you have <ip>:<port> there, then you really can use any of these names. It was the values that <ip>:<port> represented and could be that threw me for the week I struggled with this. As a result, I would argue for something like this, because it's not the --advertise part that was strange for me, it was the values of the address and of the port. So:

docker run -d swarm join --advertise=<node_ip_or_addr>:<node-docker-port> token://<cluster_id>

Give that a try, if you like advertise. How does that feel?

aluzzardi commented 9 years ago

How's that different from --node-addr or even --addr?

squillace commented 9 years ago

@aluzzardi bingo, I would say. From my newbie point of view, the problem for me -- and so therefore your mileage may vary -- was that the argument possibilities weren't clear. --addr said to me, "ok, this is supposed to be an address... but of what?". So it's the "of what" part that needs to be clear in my view. "advertise" doesn't change that one iota, to me.

Certainly, I thought that --node-addr was better, in that it sorta made it clear what the address was supposed to be for. But I still would never have guessed -- as others didn't -- whether I could change the address, what port I was specifying there -- was it the docker port, or was that some swarm port I just missed in the docs somewhere? -- that kind of thing.

The result is for me that, whatever you do in this name, make sure the following are clear:

  1. The IP OR resolvable DNS name (or "localhost") of the docker host's computer is the obvious first portion of that address.
  2. That the port portion of that address is the port on which the docker daemon on that computer is listening, that it can be whatever port you configured your docker host to use (though the default is 2375 and the TLS-secured default is 2376).

If you do those things, it'll be obvious to the naive user that what swarm join does is take a docker host's address and docker port and sends that to the discovery service along with the cluster id that the docker host would like to be a part of. You know that, then you completely understand this command and what the parameters must be.

That is the entire logical flow that I missed when we went through this, not knowing enough as I did then. I'm good with --advertise in the abstract, but it doesn't address the heart of the misunderstanding for newbies. I don't care whether it's really "joining" or whether it's merely "advertising its ability to join" -- I just want to know what I put there, and what I CAN put there. :-)

And by the way, I sure hope that the feedback helps, even if you decide to go a different direction.

aluzzardi commented 9 years ago

Does that help?

$ swarm join --help | grep advertise
   --advertise, --addr  Address of the Docker Engine joining the cluster. Swarm managers MUST be able to reach Docker at this address. [$SWARM_ADVERTISE]

jpetazzo commented 9 years ago

Reacts to random GH ping

Much ♥ to --advertise, or --my-addr, or --this-node-addr ... :-)

Also, inferring the value automatically would be super nice, but risk-prone:

¯\_(ツ)_/¯

squillace commented 9 years ago

@aluzzardi YES, that is absolutely fine. Go for it. IMHO, that was the nexus of the problem. :-)

@jpetazzo: yes, agreed. I have swarms crossing subnets, clouds, and flowing onto the premise, too. Let ME tell swarm what that address should be. :-)
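One way to read the preference above (let the user state the address, and do not guess on multi-homed hosts) is sketched below. `resolve_advertise` is a hypothetical helper, not Swarm's actual logic; the precedence shown (explicit argument first, then the `$SWARM_ADVERTISE` variable listed in the help output, otherwise an error) is an assumption made for illustration.

```shell
# Sketch only: resolve_advertise is a hypothetical helper, not Swarm's own logic.
# Precedence assumed here: explicit argument, then $SWARM_ADVERTISE, else fail.
resolve_advertise() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${SWARM_ADVERTISE:-}" ]; then
        printf '%s\n' "$SWARM_ADVERTISE"
    else
        # Refuse to guess: on hosts with several interfaces the right one is ambiguous.
        echo "no advertise address given; pass one explicitly" >&2
        return 1
    fi
}
```

The result could then be passed along as in `swarm join --advertise="$(resolve_advertise "$addr")" <discovery>`, where `$addr` is whatever the operator supplied.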

chanwit commented 9 years ago

Actually I don't have any problem, as we are doing the whole setup via docker-machine, but let me be able to tell it too.

aluzzardi commented 9 years ago

Fixed in #858