big-data-europe / docker-event-detection

Apache License 2.0

Missing docker-compose snippet #1

Closed. earthquakesan closed this issue 7 years ago.

earthquakesan commented 7 years ago

Please provide a docker-compose snippet for this component. For an example, see the Spark README.md.

npit commented 7 years ago

Sorry for the delay, I missed this. Do you have any advice on how I should parameterize the container with sensitive data (authentication & connection credentials) when launching via the compose? Ideally, I wouldn't want to hardcode them in plain sight in a default configuration.

earthquakesan commented 7 years ago

Hi @npit!

This obviously cannot be in a public git repo. The first step here would be to create a docker-compose.yml with all the parameters but without values. For example, take a look at this docker-compose.yml (RECAPTCHA_PUBLIC_KEY, RECAPTCHA_PRIVATE_KEY).
Then you will have two options:

  1. Clone the repo to your production server and make a copy of the file with the credentials. That is not really sustainable; see option 2.
  2. Make a private repo extend docker-compose.yml with all the sensitive data. On how to do extensions take a look here.
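
A minimal sketch of option 2, assuming the Compose file format version 2 and a hypothetical service `event-detection` with a placeholder image `example/event-detection:latest` (neither is taken from this repository). It uses several `-f` files rather than the `extends` keyword: docker-compose merges the files from left to right, and for `environment` entries a later file overrides an earlier one by variable name. The public file only lists the parameter names; the second file, which would live in the private repo, supplies the values. The script writes both files to a temporary directory and prints the deploy command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: the service name "event-detection", the image name and the
# file names below are hypothetical placeholders, not taken from this repo.
workdir="$(mktemp -d)"

# Public docker-compose.yml: lists every sensitive parameter but gives no value.
# An entry like "- RECAPTCHA_PRIVATE_KEY" (no "=") tells docker-compose to take
# the value from the shell environment of whoever runs "up".
cat > "$workdir/docker-compose.yml" <<'EOF'
version: "2"
services:
  event-detection:
    image: example/event-detection:latest
    environment:
      - RECAPTCHA_PUBLIC_KEY
      - RECAPTCHA_PRIVATE_KEY
EOF

# Private file (would live in the private repo): same service, now with values.
# docker-compose merges the files given with -f from left to right.
cat > "$workdir/docker-compose.secret.yml" <<'EOF'
version: "2"
services:
  event-detection:
    environment:
      - RECAPTCHA_PUBLIC_KEY=replace-me
      - RECAPTCHA_PRIVATE_KEY=replace-me
EOF

# Command a deployer would run (printed here rather than executed).
echo docker-compose -f "$workdir/docker-compose.yml" \
  -f "$workdir/docker-compose.secret.yml" up -d
```

Because the public file gives no values, running it alone simply passes through whatever the deployer's shell exports under those names.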
earthquakesan commented 7 years ago

As a note, the big-data-europe organization has no private repos (they are not included in our current billing plan). I don't know the best way to manage that at the moment.

@madnificent @HajiraJabeen ?

npit commented 7 years ago

@earthquakesan

Make a private repo extend docker-compose.yml with all the sensitive data. On how to do extensions take a look here.

That's what I was leaning towards, but doesn't that suffer from the same problem? Getting stuff from the private repo will require authentication, so some kind of sensitive data will have to be hardcoded to get the configuration from the private repository. And if the private repo does not require authentication, then the sensitive data there will be up for grabs, defeating the whole purpose.

earthquakesan commented 7 years ago

@npit If you mean using the private repo as part of a CI/CD pipeline, then you should use a public/private key pair (RSA-2048, for example). The CI/CD pipeline should be secured just like the production server.

npit commented 7 years ago

@earthquakesan How will the key be supplied to the container, though? It can't be uploaded, so it needs to be provided manually by whoever runs the compose file. And if that is acceptable, then we could do the same with the sensitive parameters themselves to begin with.

earthquakesan commented 7 years ago

You don't need to check out docker-compose.yml from inside the container, do you? Why would you do that?

As for key management, you can add as many keys as you like to the repo. Then it can be deployed either by devs (if that is how it works), by sysadmins (in a more perfect world), or by an automated deployment solution (in the perfect world). However, I don't know whether any of those exist at the moment.

npit commented 7 years ago

You don't need to check out docker-compose.yml from inside the container, do you? Why would you do that?

I am not sure I understand: what do you mean by checking out the .yml?

As for key management, you can add as many as you like to the repo. Then ...

Correct me if I'm wrong, but I thought that the public key would be bound to the private repo, and the BDE container would need the private key to authenticate and pull automatically from that private repo. The issue is getting that private key into the container automatically, so that docker-compose up downloads the sensitive data and initializes the container without whoever runs the command having to pass the key, parameters, or sensitive data manually.

Isn't that right?

earthquakesan commented 7 years ago

No, we have a misunderstanding here. Let's go through the whole thing once again.

  1. All the Docker images are public and available on the big-data-europe GitHub. Each image has an automated build on Docker Hub, and none of them contain sensitive information.
  2. We have dedicated git repositories for the pilot deployments, where we store the docker-compose.yml files. They are deployed on the servers as follows:
    docker-compose -H :4000 -f docker-compose.yml up -d

    The catch is that when you clone the repo from step 2, you need to read the deployment docs and inject the API keys etc. into docker-compose.yml. That does not affect the images from step 1 in any way.
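
Instead of editing the tracked docker-compose.yml by hand, one option (swapped in here for illustration; the thread does not prescribe it) is Compose's `env_file` setting: the tracked file points at an untracked `secrets.env`, which is created with owner-only permissions and listed in `.gitignore`. The service name, the image, and the variable names `DB_USER`/`DB_PASSWORD` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: secrets live in an untracked env file that docker-compose reads via
# env_file. The service name, image and variable names are hypothetical.
deploy_dir="$(mktemp -d)"

# Tracked file: points at secrets.env but contains no secret values.
cat > "$deploy_dir/docker-compose.yml" <<'EOF'
version: "2"
services:
  event-detection:
    image: example/event-detection:latest
    env_file:
      - ./secrets.env
EOF

# Untracked file with the real values; umask 077 inside a subshell makes it
# readable and writable by the owner only, without changing the outer umask.
(
  umask 077
  printf '%s\n' 'DB_USER=replace-me' 'DB_PASSWORD=replace-me' > "$deploy_dir/secrets.env"
)

# Keep the secrets out of git.
echo 'secrets.env' >> "$deploy_dir/.gitignore"

# Deploy exactly as in step 2 (printed here, not executed).
echo "docker-compose -H :4000 -f $deploy_dir/docker-compose.yml up -d"
```

The relative path in `env_file` is resolved against the directory of the compose file, so the deployer only has to place `secrets.env` next to it.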

The problem is that if there are several people on the deployment team, the deployment might become unmanageable.

To mitigate this problem, I proposed creating a private git repo containing a docker-compose.yml with all the sensitive data. Again, those docker-compose.yml files have to be deployed the same way:

git clone sc4-deployment-secret
docker-compose -H :4000 -f sc4-deployment-secret/docker-compose.yml up -d

The deployment team can automate this step with scripts that clone the private git repos and deploy them on the swarm (i.e. CD).
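
Such a script could look roughly like the following sketch. It assumes the private repos are cloned over SSH with a deploy key (the key-pair approach suggested earlier in the thread) and uses git's `GIT_SSH_COMMAND` environment variable to select that key. The default key path, the repo URLs, and the `DRY_RUN` switch are assumptions for illustration; with `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch of a CD step: clone each private deployment repo with a deploy key
# and start it on the swarm at :4000. DEPLOY_KEY's default path, the repo URLs
# and the DRY_RUN switch are hypothetical.

DEPLOY_KEY="${DEPLOY_KEY:-$HOME/.ssh/bde_deploy_key}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy_repo() {
  local url="$1" dir
  dir="$(basename "$url" .git)"
  # GIT_SSH_COMMAND (git >= 2.3) sets the ssh command git uses, and so the key.
  run env GIT_SSH_COMMAND="ssh -i $DEPLOY_KEY -o IdentitiesOnly=yes" \
    git clone "$url" "$dir" || return 1
  run docker-compose -H :4000 -f "$dir/docker-compose.yml" up -d
}

# Deploy every repo URL given on the command line.
for repo in "$@"; do
  deploy_repo "$repo"
done
```

Saved as, say, `deploy.sh` (a hypothetical name), `DRY_RUN=1 ./deploy.sh git@github.com:example/sc4-deployment-secret.git` would print the clone command and the `docker-compose -H :4000` command for that repo without touching the swarm.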

However, during the last call we agreed that this problem does not arise in the BDE project. Therefore, we will not create private repos, and all sensitive data will be managed by the people responsible for the pilots (i.e. by cloning the public git repos from the big-data-europe org and then editing them).

We are already quite off-topic. Can you please add a docker-compose.yml that omits the sensitive data (see the example in my earlier comment) and close the issue?

earthquakesan commented 7 years ago

All the parameters from the .conf files should be configurable via environment variables.
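
One way to meet this is to have the container's entrypoint regenerate the .conf file from environment variables before starting the service. The sketch below assumes a simple `key=value` format and a hypothetical naming rule (key upper-cased, `.` and `-` replaced by `_`, so `db.password` maps to `DB_PASSWORD`); the actual .conf format of this component may differ.

```shell
#!/usr/bin/env bash
# Sketch: print a .conf file with each "key=value" line overridden by an
# environment variable when one is exported. The key-to-variable naming rule
# is a hypothetical convention, not taken from this repository.

# Map a conf key to an env var name, e.g. "db.password" -> "DB_PASSWORD".
key_to_var() {
  printf '%s' "$1" | tr 'a-z.-' 'A-Z__'
}

conf_from_env() {
  local conf="$1" line key value var override
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '' | '#'*)
        # Keep blank lines and comments unchanged.
        printf '%s\n' "$line"
        continue
        ;;
      *=*) ;;
      *)
        # Not a key=value line: keep it as-is.
        printf '%s\n' "$line"
        continue
        ;;
    esac
    key="${line%%=*}"
    value="${line#*=}"
    var="$(key_to_var "$key")"
    # printenv exits non-zero when the variable is not exported.
    if override="$(printenv "$var")"; then
      value="$override"
    fi
    printf '%s=%s\n' "$key" "$value"
  done < "$conf"
}
```

In an entrypoint this might be called as `conf_from_env /path/to/template.conf > /path/to/app.conf` (paths hypothetical), with the values supplied through `environment:` in docker-compose.yml, so the image itself never contains credentials.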