ibaldin opened 7 years ago
Some notes on the Docker aspects, without fully understanding the Handlers or ImageProxy :)
- Downloads and installs a Docker or similar container image and starts the container. Possible image sources:
  - Registry image names, e.g.:
    - `ubuntu`
    - `ubuntu:17.04`
    - `spotify/cassandra`
    - `spotify/cassandra:cluster`

    Orca can then do a `docker pull` on that image name to download the image.
  - Private registry: `docker pull` should be able to download the image with the simple URL, e.g. `docker pull myregistry.local:5000/testing/test-image`.
  - Tar file: `docker save` can be used to copy the image to a tar file. This use case is closest to the current ImageProxy model. Orca would need to download the tar file, and then load it into docker using `docker load` (see the lifecycle sketch further below).
- Installs user keys
  - We could potentially require the user to pre-populate their docker container with any keys they need.
  - Orca can install the keys on the Host (VM or baremetal), and then either volume mount or `cp` the keys to make them available to the container (see the sketch after this list).
  - The closest thing to current behavior would probably be to copy the keys to a default location (for root user SSH).
  - We could give the user the option of declining to have keys copied into the container, if the image doesn't need them or already provides them.
  - Or we could give the user the option of specifying the container path where the keys will be copied.
- Port numbers will need to be managed somehow, since presumably we will need administrator SSH access to the Host. Docker allows exposed ports to be mapped between the host and container with the `publish` option of `docker run`. (So an SSH server in the container, nominally running at port 22 inside the container, could be accessed using a different Host port.)
- Stops the container
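A sketch of the two key-delivery mechanisms mentioned in the list above (the paths and container name are illustrative; `/root/.ssh` mirrors the current root-SSH default):

```sh
# Option A: bind-mount the host's key directory read-only at start time
docker run -d --name mycontainer -v /root/.ssh:/root/.ssh:ro ubuntu:17.04 sleep infinity

# Option B: copy the keys into an already-running container
docker cp /root/.ssh/authorized_keys mycontainer:/root/.ssh/authorized_keys
```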
We will probably want to name the containers (probably by UUID?) when doing the `docker run`, so that they can be explicitly `stop`ped and `rm`ed when desired.
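A combined sketch of image acquisition and the container lifecycle, assuming Orca shells out to the docker CLI and names containers by a generated UUID (the registry host and file names are placeholders):

```sh
# Acquire the image: by name, from a private registry, or from a tar file
docker pull ubuntu:17.04
docker pull myregistry.local:5000/testing/test-image
docker load -i test-image.tar   # tar file previously produced with 'docker save'

# Start the container under a name Orca can track
NAME=$(uuidgen)
docker run -d --name "$NAME" ubuntu:17.04 sleep infinity

# ...and tear it down explicitly by name when the slice ends
docker stop "$NAME"
docker rm "$NAME"
```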
We'll probably need to have support for user-specified `docker run` options:
- `publish` -- docker won't allow network access to the container unless the ports are explicitly exposed when doing the `docker run`.
- `device` -- this may not need to be explicitly set by the user, but if GPU access is required, it will need to be specified in `docker run`.

Several other `docker run` options are commonly used, but we could in theory force the user to specify these in the container image (Dockerfile) instead of as runtime options.
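For example, a single `docker run` combining the two options above (the image name, device path, and port numbers are placeholders):

```sh
# Map host port 2022 to the container's SSH port 22,
# and pass a GPU device through to the container
docker run -d --name mycontainer \
  --publish 2022:22 \
  --device /dev/nvidia0:/dev/nvidia0 \
  myimage
```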
Sounds like additional things to worry about:
- Open ports (need to be part of the configuration/request somehow)
- How should we deal with interactive logins? Should dockers be required to have their own SSH running on a different port, or should we manipulate the login system on the host node to start a docker-enclosed shell instance when the user logs into the host VM via SSH?
BTW, not a fan of pre-populating dockers with user SSH keys - bad security practice (even if those are public)
@hinchliff : my preference for passing anything (including credentials) into the container would be either via a wrapper through the ENTRYPOINT (e.g., import credentials from ephemeral nodes in Zookeeper) or via the `-e` option. We used the latter in the RADII/HydroShare integration. For a more production environment I favor the former. I presume that the container would be checked out from the VM through a post-boot script in the provisioning phase.
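A sketch of what such an ENTRYPOINT wrapper might look like for the `-e` variant (the variable name and paths are hypothetical, not taken from RADII/HydroShare):

```sh
#!/bin/sh
# entrypoint.sh, baked into the image; invoked as e.g.
#   docker run -d -e USER_PUBKEY="ssh-rsa AAAA..." myimage
# Materialize the credential passed via -e, then exec the real command.
mkdir -p /root/.ssh
printf '%s\n' "$USER_PUBKEY" >> /root/.ssh/authorized_keys
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys
exec "$@"
```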
Regarding managing ports, we should probably talk to Mike S, who has done this. He is an expert in containers. I'll set up a meeting with him.
> Should dockers be required to have their own SSH running on a different port
The containers can (internally) run SSH (or any service) on any port, and Docker will give the host the ability to map those container ports to different host port numbers: `--publish` or `-p`.
e.g.

```
$ docker run -p 127.0.0.1:80:8080 ubuntu bash
```

This binds port 8080 of the container to port 80 on 127.0.0.1 of the host machine.
Managing that might get complicated, so talking with someone who has already done it would be useful.
That's why I'm wondering if we'd rather create restricted shells for users that use the host's SSHd to log them into bash running inside their container.
@hinchliff : this approach is to my knowledge widely adopted and is what we used for our RADII/HydroShare prototype. I have emailed Mike but he has not responded yet.
Regarding your comment "managing that might get complicated". What is "that"?
'That' is managing any port mappings. Orca will need to track that user requested ports p1, p2... pn for their container, and we gave them corresponding ports q1, q2... qn on the host. Might get more complicated when you start to allow more than one container on each host.
We could use the feature that maps a container port to a range of ports on the host, e.g., `docker run -d -p 7000-8000:4000 myApp`. This would bind port 4000 in the container to a random port between 7000 and 8000 on the host, depending upon which port is available on the host at that time. We could then report the allocated port back to ORCA directly or through some side-channel mechanism (Zookeeper). Source: https://bobcares.com/blog/docker-port-expose/ Would that work?
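If so, the allocated port can be read back from Docker itself, e.g. (a sketch, with `myApp` as in the quote above):

```sh
CID=$(docker run -d -p 7000-8000:4000 myApp)
# Ask Docker which host port ended up bound to container port 4000
docker port "$CID" 4000
# -> e.g. 0.0.0.0:7003, which could then be reported back to ORCA
```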
Interesting. I guess you can also just let Docker completely pick the ports:

> To randomly map any network port inside a container to a port in the Docker host, the `-P` option can be used in `docker run`:
>
> ```
> docker run -d -P webapp
> ```
>
> To see the port mapping for this container, you can use the `docker ps` command after creating the container.
Makes it a little bit more of a trick to find the port mapping to report back to the user, but less management.
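For example (the image and container names are placeholders), the mapping can be scraped from `docker ps` itself:

```sh
docker run -d -P --name web1 webapp
# Show just the port column for this container
docker ps --filter name=web1 --format '{{.Ports}}'
# -> e.g. 0.0.0.0:32768->8080/tcp
```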
`docker exec -i -t <guid> /bin/bash` attaches a shell to the docker container. We may need a dynamic wrapper script to do this, so it can be added to /etc/passwd for a new user entry, but the end result is the same: upon login using the host's SSH, the user ends up inside the docker environment (no need to package SSH into docker, only bash).
So the handler on join needs to add a user (update the /etc/passwd file and create a home directory) and push the user's key into /home/user/.ssh/ on the host VM. On leave, it simply deletes the user (`userdel`) together with the home directory.
No need to manage ports at all. The caveat is disallowing root login for users (since that would be root on the host VM), but this is now standard GENI practice - most tools expect to use username@host, not root@host. Multiple user logins should be allowable - a matter of creating multiple /etc/passwd entries mapped to the same container (and remembering to delete them).
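A sketch of that wrapper and the join/leave handler steps (the username, wrapper path, and `<guid>` are placeholders; the wrapper would be templated per user):

```sh
#!/bin/sh
# /usr/local/bin/docker-shell (hypothetical): installed as the user's login
# shell in /etc/passwd, so SSH login to the host VM lands in bash inside
# the user's container
exec docker exec -i -t <guid> /bin/bash
```

```sh
# on join: create the user and install their key on the host VM
useradd -m -s /usr/local/bin/docker-shell alice
install -d -m 700 -o alice -g alice /home/alice/.ssh
install -m 600 -o alice -g alice alice_key.pub /home/alice/.ssh/authorized_keys

# on leave: delete the user together with the home directory
userdel -r alice
```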
@ibaldin : I used docker exec to run commands. For instance, I have used docker exec to run icommands in a container configured as an iRODS client node. I still think some port configuration is needed for a long-running application-container, e.g., iRODS resource node and HTCondor. We will meet with Mike and further discuss our use case.
> no need to package SSH into docker, only bash
@ibaldin - Agreed that handing off to `docker exec -u <YOUR_USER> -ti <GUID> /bin/bash` is the way to go.
Do note that if you package SSH into a docker container where the host also uses SSH, you'll need to remap the SSH port from the host's point of view to the container, i.e. using port 2022 instead of 22.
You could also allow SSH to be run by a non-root user within the container so if the user does log in via SSH they would not be root on the container.
Example: In the docker-entrypoint script you'd want to define that SSH is owned by a non-root user and run on a port other than 22.
```sh
...
# hand ownership of the SSH config to the non-root user
chown -R <YOUR_USER>:<YOUR_GROUP> /etc/ssh
# privilege separation requires root, so turn it off
sed -i "/\<UsePrivilegeSeparation\>/c\UsePrivilegeSeparation no" /etc/ssh/sshd_config
# move sshd off port 22 so it doesn't collide with the host's sshd
sed -i "/\<Port 22\>/c\Port 2022" /etc/ssh/sshd_config
# start sshd as the non-root user
runuser -p -u <YOUR_USER> -g <YOUR_GROUP> /usr/sbin/sshd
...
```
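With an entrypoint like that, the host-side run and login might look like this (the image name and address are placeholders):

```sh
# Publish the container's remapped sshd; host port 22 stays with the host's own sshd
docker run -d -p 2022:2022 --name mycontainer myimage
ssh -p 2022 <YOUR_USER>@<host-vm-address>
```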
@mjstealey : I will take some of our time on Friday to ask you more about this. I don't understand how this approach alone addresses the deployment of a long-running service such as an HTCondor Worker, which maintains a long-lived connection with an HTCondor Master (an overlay) and may open new connections to other nodes. It seems to me that port management has to be handled by something either outside Docker or inside it. I think it is possible that there are different application models, e.g., ssh vs. distributed service, that have different requirements and need different configurations. Or perhaps the above addresses all of them and I don't see it yet.
> Makes it a little bit more of a trick to find the port mapping to report back to the user, but less management.
@hinchliff - using `docker inspect <CONTAINER_ID>` will return a whole bunch of configuration information as JSON. So, for a process that uses lots of ports like iRODS, you could do something like this to get the `NetworkSettings.Ports` information:
```
$ docker run -d -p 1247:1247 -p 1248:1248 -p 20000-20199:20000-20199 --name irods4.2 mjstealey/irods-provider-postgres:latest
$ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS          PORTS                                                                              NAMES
34f6a996a3f2   mjstealey/irods-provider-postgres:latest   "/irods-docker-ent..."   26 seconds ago   Up 22 seconds   0.0.0.0:1247-1248->1247-1248/tcp, 0.0.0.0:20000-20199->20000-20199/tcp, 5432/tcp   irods4.2
$ docker inspect --format=" {{ .NetworkSettings.Ports }} " irods4.2
map[20117/tcp:[{0.0.0.0 20117}] 20144/tcp:[{0.0.0.0 20144}] 20024/tcp:[{0.0.0.0 20024}] ... 20022/tcp:[{0.0.0.0 20022}] 20050/tcp:[{0.0.0.0 20050}]]
```

(inspect output truncated - every container port in 20000-20199 maps to the same-numbered host port, plus 1247 and 1248; 5432/tcp is unmapped)
> I don't understand how this approach alone addresses the deployment of a long running service such as a HTCondor Worker which maintains a long-lived connection with a HTCondor Master
@clarisca - It doesn't. Rather it was a comment on being able to attach to a container as a non-root user if such a thing is desired. Unsure if there will be a maintainer concept that would need access to a container within a node once it's instantiated.
By default docker runs as root, and any operations performed within the container would be as root unless explicitly specified otherwise. So the options are:
- `docker exec` to a non-root predefined user within the container

Had a meeting with Alan. He will think about the various design aspects; I'll do the flowchart with tasks. We may ask Mert to help try the approach manually to make sure things work as expected.
I started a wiki page on the design aspects
@vjorlikowski suggests looking at Pipework (https://github.com/jpetazzo/pipework) to constrain containers to specific network interfaces.
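For reference, a minimal Pipework invocation in the style of its README (the bridge name, image, and address are placeholders):

```sh
# Attach an extra interface inside the container, bridged to host bridge br1,
# outside of Docker's own port-mapped networking
CID=$(docker run -d myimage)
pipework br1 "$CID" 192.168.1.10/24
```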
This is proposed by @clarisca as part of SciDAS - support for creating ORCA AMs that operate on a pool of pre-provisioned VMs of potentially varying sizes. There are several issues that must be considered.
I propose doing this in two phases - Phase I involves only modifying ImageProxy and creating a new handler for issuing VMs. Phase II would add support for RDF specification of these AMs, along with extensions for processing RDF and doing proper delegation of resources. The latter would also require controller extensions, as well as AM and broker controls.
The assumption in both cases is that such an AM operates only over the commodity internet. It is not clear how dynamic connectivity would work here. I suppose there could be a pool of VLANs available, but then it is not clear how to connect VMs to them dynamically. They could be plumbed statically upon creation, at the expense of security and performance isolation. In the former case no Net AM is needed; in the latter a traditional Net AM should work just fine.
Phase I:
On leave:
Phase II:
@paul-ruth @anriban
@vjorlikowski is it possible to leverage xCAT and xCAT machinery for issuing nodes and IP addresses for this?
Comments welcome.