eBayClassifiedsGroup / PanteraS

PanteraS - PaaS - Platform as a Service in a box
GNU General Public License v2.0

Advice for starting a container / scripts that run more containers #154

Closed kopax closed 8 years ago

kopax commented 8 years ago

I am trying to deploy some data containers using PanteraS image.

Since the data container ID needs to be set for a later mount, I was thinking of two different possibilities with Marathon:

  1. Start data containers using a Dockerfile that runs the docker run command and removes itself when finished
  2. Start data containers using a script that runs the docker run command
    • Which is the best option (1 or 2)?
    • Is there a third option I haven't thought about?
    • What permissions/extra conf does PanteraS need to be able to use docker run properly?
sielaq commented 8 years ago

Ad. 1. Dockerfile is not for starting containers, but for creating images, so I do not understand how it would work.

There are a few options:

  1. Think about a Cassandra database cluster; this was meant to work even at the Mesos level.
  2. Alternatively: I think what you mean is to determine which instance number you are running, and based on that decide what you need to do. There is a ticket open already https://github.com/mesosphere/marathon/issues/1242 - many people are waiting for that feature and I think it will be done soon.

The last question, "What permissions/extra conf does PanteraS need to be able to use docker run properly?", I did not get. docker run is not related to PanteraS but to Docker itself. This is spawned by Marathon https://mesosphere.github.io/marathon/docs/rest-api.html#post-v2-apps. So you can run ANY container via the Marathon API and just add the ENV variables that are related to consul/registrator/haproxy, and it will work with the other components in PanteraS.
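
For example, something like this registers any container (the hostname and image here are just placeholders for your setup):

# POST any Docker app to the Marathon REST API; the SERVICE_* ENV
# variables are what registrator/consul/haproxy pick up
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d '{
        "id": "/company/any-app",
        "instances": 1,
        "cpus": 0.1,
        "mem": 128,
        "container": {
          "type": "DOCKER",
          "docker": { "image": "company/any-app:latest", "network": "BRIDGE" }
        },
        "env": { "SERVICE_NAME": "any-app", "SERVICE_TAGS": "haproxy" }
      }'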

kopax commented 8 years ago

I'll explain a bit further.

1)

script.sh

#!/bin/bash
# create (but do not start) a named data-only container
docker create --name app_data company/app-data /bin/true
exit 0

Dockerfile

FROM debian:jessie

ADD script.sh script.sh

CMD ./script.sh

2)

{
    "id": "/company/app", 
    "cmd": "chmod u+x install.sh && ./install.sh",
    "cpus": 0.1,
    "mem": 10.0,
    "instances": 1,
    "uris": [
        "https://www.company.com/s/6yg4zs0c48l76iv/install.sh"
    ]
}

And then you can use your data container with the app, using the volumes-from parameter:

{
  "id": "/company/app",
  "cmd": null,
  "mem": 256,
  "cpus": 0.2,
  "instances": 1,
  "env": {
    "SERVICE_NAME": "app",
    "SERVICE_TAGS": "haproxy"
  },
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "company/app:latest",
      "parameters": [
        {
          "key": "volumes-from",
          "value": "app_data"
        }
      ],
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 8080
        }
      ]
    }
  }
}

Also: I was assuming that, because Marathon is able to run a Docker container, a container in Marathon would be able to run a Docker container as well (as in my example).

I am not sure if it's the best way to achieve that; that's why I am asking for advice.

I think I should add these volumes when starting my data containers:

"volumes": [
  {
    "containerPath": "/var/run/docker.sock",
    "hostPath": "/var/run/docker.sock",
    "mode": "RW"
  },
  {
    "containerPath": "/usr/bin/docker",
    "hostPath": "/usr/bin/docker",
    "mode": "RW"
  },
  {
    "containerPath": "/lib/x86_64-linux-gnu/libapparmor.so.1",
    "hostPath": "/usr/lib/x86_64-linux-gnu/libapparmor.so.1",
    "mode": "RW"
  }
]
sielaq commented 8 years ago

If you want to run a Docker container from a container, then you definitely need the docker binary and docker.sock inside. But why do it if you can achieve exactly the same from Marathon?
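
For illustration, the container-from-container variant would look something like this (a sketch only, reusing the paths from your volumes list):

# throwaway container that talks to the host's Docker daemon
# in order to create the data container on that host
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  -v /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/lib/x86_64-linux-gnu/libapparmor.so.1 \
  debian:jessie \
  docker create --name app_data company/app-data /bin/true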

I would rather ask: what is the real issue you are trying to solve?

kopax commented 8 years ago

I can neither set nor retrieve the container name from Marathon, and to use the volumes-from parameter you need it.

The problem I am trying to solve is to restore the continuous deployment pipeline I have (GitLab -> Jenkins -> Marathon).

I am using a data container for each of my apps because it's easier to manipulate data and keep permissions up to date, without caring about the Unix file system permissions when you do a backup or migrate from one host to another. I am also trying to implement an rsync into the data container, so it will connect directly to Dropbox and keep the data up to date in every instance of my app.

sielaq commented 8 years ago

You should have started by telling us about that :)

For that use case we have invented marathon_deploy https://github.com/eBayClassifiedsGroup/marathon_deploy. The static part of the deployment plan you put into a YAML/JSON file; the dynamic part (like, in your case, the container name) you can put into an ENV variable YOUR_VARIABLE=abc and use as %%YOUR_VARIABLE%% inside the YAML plan, so Jenkins can propagate the ENV variable and execute marathon_deploy.

kopax commented 8 years ago

Then how do you set the container name using marathon_deploy?

sielaq commented 8 years ago

Every parameter in marathon_deploy can be an ENV variable - even the container name. In our case the container name is rather static: we have a deployment plan per service, and some other options are dynamic.

sielaq commented 8 years ago

You can transform your current JSON into YAML with dynamic variables in 3 steps:

  1. Use json2yaml your.json > deploy.yml to transform it to YAML (json2yaml is included in the marathon_deploy gem)
  2. Replace the parts that you need to be dynamic:
---
id: %%KOPAX_NAME%%
cmd: echo python canaries `hostname` > index.html; python3 -m http.server 8080
mem: 16
cpus: 0.1
instances: 2
container:
  type: DOCKER
  docker:
    image: ubuntu:14.04
    network: BRIDGE
    portMappings:
    - containerPort: 8080
      hostPort: 0
      protocol: tcp
env:
  SERVICE_TAGS: python,webapp,http,weight=1
  SERVICE_NAME: python
healthChecks:
- portIndex: 0
  protocol: TCP
  gracePeriodSeconds: 30
  intervalSeconds: 10
  timeoutSeconds: 30
  maxConsecutiveFailures: 3
- path: /
  portIndex: 0
  protocol: HTTP
  gracePeriodSeconds: 30
  intervalSeconds: 10
  timeoutSeconds: 30
  maxConsecutiveFailures: 3

  3. Run marathon_deploy:

KOPAX_NAME=python-example marathon_deploy

or

export KOPAX_NAME=python-example
marathon_deploy
kopax commented 8 years ago

Tell me if I am wrong, but this will set the Marathon id, not the name of the Docker container started by Mesos.

That name will always look similar to:

mesos-d5e24339-7af4-4816-8dbb-ace488c0ccfa-S3.b0953e3c-bac8-4cba-ab11-518954f5020f

This is the name required to attach a volume to a container; that's why I want to start it with the command line.
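
(For what it's worth, those generated names can only be discovered after the task is already running, e.g. with something like:

docker ps --filter "name=mesos-" --format "{{.Names}}"

which is too late to pass to volumes-from.)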

sielaq commented 8 years ago

Then I got something wrong: the container name cannot be changed in PanteraS / Mesos & Marathon - it is fully dynamic.

But you can set dynamically like:

  volumes:
  - containerPath: %%PATH_DESTINATION%%
    hostPath: %%PATH_SOURCE%%
    mode: RO

I still don't know what exactly you are trying to do. Maybe Docker linking would be more helpful.

kopax commented 8 years ago

When you store data using this config:

volumes:
  - containerPath: %%PATH_DESTINATION%%
    hostPath: %%PATH_SOURCE%%
    mode: RO

You specify that the data is stored at this hostPath location.

Without hostPath, using the Docker VOLUME instruction, Docker will store the data in a volume in a temporary place, for as long as some container still references that volume.

Check docker ps -a, then docker inspect on any container: under Mounts you will see the hostPath storage location.
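
For example (the container id is a placeholder):

# print where Docker actually keeps this container's volumes on the host
docker inspect --format '{{ json .Mounts }}' <container_id>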

So basically, it is the same as your config, except that when the last container holding a reference to the data volume is deleted, all the data disappears forever :]

Anyway, I am using data containers for the reasons explained in this article. You should have a look; the advantages of storing data in dedicated containers are multiple.

Considering this, I wanted to start the Docker container with a small script that runs the command to create the data container.

sielaq commented 8 years ago

I read that. As I said before, we need a more generic solution that supports fast writes too - no network involved at all (since, with a Docker network, all containers can be reachable).

Anyway, it is hard for me to choose between option 1 and 2; it really depends how much you want to automate. You have to consider what is going to happen when the container that starts app_data dies: if it is started by Marathon, it will be respawned, and if you don't want that, you need to work out how to prevent it. You will have to experiment. Both ways are possible and both are correct. Lastly, choose the one which matches your current continuous integration process better.
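
For example, if you go with option 2, a crude guard against respawning could look like this (a sketch only; sleep infinity just parks the finished task so Marathon sees it as still running):

#!/bin/bash
# create the data container once; ignore the error if it already exists
docker create --name app_data company/app-data /bin/true 2>/dev/null
# park the task so Marathon does not endlessly respawn it
exec sleep infinity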

sielaq commented 8 years ago

btw. check this out: https://medium.com/@kelseyhightower/12-fractured-apps-1080c73d481c

kopax commented 8 years ago

Thanks for sharing the article! I like the approach, and it fits perfectly with Marathon health checks / HAProxy.

This is nice, for example, for a webapp & a database: the webapp won't listen on HTTP while the DB connection is down. I will definitely go that way for the DB.

But it doesn't answer the other problem, which I have and you don't (since you don't have apps with data): finding out whether the data is fully synced before starting the application.

Example :

I have 1 instance of my app running on host A on Marathon; at the moment this app uses 3 GB of data, which is synced to Dropbox/S3/whatever. I scale my app to 2 instances on Marathon, and the second instance starts on host B.

The application doesn't have a clue about the size of the data directory, so how should I design the check before starting my app?

I have a few ideas but none is beautiful :o

By the way, why did you decide to share the link? What design pattern am I getting wrong?

sielaq commented 8 years ago

According to your example, what I would do is:

  1. Create a script that downloads/syncs/whatever and finishes with exit code 0, and add it to your image
  2. Use an entrypoint script like we have in the frameworks - it helps you start multiple args (multiple scripts), so you can do something like this (YAML version):
args:
- script1.sh && script2.py &&
  java -Dhttp.port=8080
    -DlogDir=$MESOS_SANDBOX
    -Dinstance.confdir=file:///opt/etc
    -Xmx$JAVA_XMX
    -Xms$JAVA_XMS
    -jar $MESOS_SANDBOX/my_java_app*.jar || sleep 10000
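
For instance, script1.sh could be the sync gate from point 1 (a rough sketch; the .sync-complete marker file is just an assumed convention that your sync job would have to write):

#!/bin/bash
# block until the data sync has finished, so the && chain
# only starts the app once the data volume is complete
until [ -f /data/.sync-complete ]; do
  echo "waiting for data sync..."
  sleep 5
done
exit 0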

It is definitely not beautiful and never will be, since asynchronous problems are never easy :)

ps. You are not doing anything wrong. I wanted to share a good article that is closely connected to what you are doing now, so it might just help.

sielaq commented 8 years ago

Do you still need help here?

kopax commented 8 years ago

I haven't had time to work further on this subject. I will in the next few weeks, but I am closing this for now.