autopilotpattern / mysql

Implementation of the autopilot pattern for MySQL
Mozilla Public License 2.0
172 stars 68 forks source link

Add support for defining service name in env var #77

Closed deitch closed 7 years ago

deitch commented 7 years ago

Per the discussion in https://github.com/joyent/containerpilot/issues/260 and @tgross request at https://github.com/joyent/containerpilot/issues/260#issuecomment-270137432

Still defaults to a service name of "mysql", but can be overridden by the env var SERVICE_NAME.

This allows one to use the exact same image in the same environment for different services, e.g. having three mysql instances for authdb, 2 more for ordersdb, etc.

tgross commented 7 years ago

We've added the service name to what we write to Consul via ContainerPilot here, but the management code also has to check the health. So I think we're going to need to inject this environment variable into that as well. Admittedly this is because of some close binding between the config and the code, so it'd be nice to clean this up anyways. It looks like this is predominantly a concern with the constant PRIMARY_KEY:

That variable is set via the environment's PRIMARY_KEY variable in utils.py. We can probably just swap that out for the SERVICE_NAME right there, but I'd like to see that verified in the tests.

deitch commented 7 years ago

I think you are saying that, in addition to the service registration in the catalog, which this PR lets you change via SERVICE_NAME, we also store which node of the (possibly multiple) nodes for the service is the primary using a simple key-value.

As long as there was just one service ("mysql") then we could have one key ("mysql-primary"). But if we can have multiple services, determined by the SERVICE_NAME env var, then we need to have multiple matching indications of primary, or else they all will try to use "mysql-primary" and trounce each other.

Is that it?

If so, why KV instead of adding a "primary" tag to the service when registering it in consul? If I recall correctly, the consul docs and examples for databases use tags that way explicitly. Mind you, that doesn't make it correct, but worth considering.

tgross commented 7 years ago

Is that it?

Yes!

If so, why KV instead of adding a "primary" tag to the service when registering it in consul?

Because we need an exclusive lock, and the tags don't allow for that as far as I understand.

deitch commented 7 years ago

Because we need an exclusive lock, and the tags don't allow for that as far as I understand. Yes, tags are not unique. No guarantees.

OK, so let's add this in as well.

deitch commented 7 years ago

Did a slight variant. Users may still want to override PRIMARY_KEY, so:

PRIMARY_KEY = env('PRIMARY_KEY', env('SERVICE_NAME','mysql')+'-primary')

Or if you prefer from the README:

- `PRIMARY_KEY`: The key used to record a lock on what node is primary. (Defaults to `${SERVICE_NAME}-primary`.)
deitch commented 7 years ago

Added. The unit tests do fail, but they appear to be failures unrelated to this change.

tgross commented 7 years ago

The unit tests do fail, but they appear to be failures unrelated to this change.

If you rebase on master I'm pretty sure you'll be good-to-go with the unit tests. It would be a good idea to add a unit test that covers this code path to make sure we're not missing anything.

deitch commented 7 years ago

If you rebase on master I'm pretty sure you'll be good-to-go with the unit tests.

OK, trying now.

good idea to add a unit test that covers this code path

Indeed. I will look at it.

deitch commented 7 years ago

If you rebase on master I'm pretty sure you'll be good-to-go with the unit tests.

One error gone, one still there:

======================================================================
FAIL: test_update (__main__.TestContainerPilotConfig)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 870, in test_update
    self.assertEqual(updated.read(), '')
AssertionError: '{"preStart": "python /usr/local/bin/manage.py", "tasks": [{"frequency": "5m", "command": "python /usr/local/bin/manage.py snapshot_task", "name": "snapshot_check", "timeout": "10m"}], "logging": {"level": "{{ if .LOG_LEVEL }}{{ .LOG_LEVEL }}{{ else }}INFO{{ end }}"}, "coprocesses": [{"restarts": "unlimited", "command": ["/usr/local/bin/consul", "agent", "-data-dir=/data", "-config-dir=/config", "-rejoin", "-retry-join", "my.consul.example.com", "-retry-max", "10", "-retry-interval", "10s"], "name": "consul-agent (host:{{ .CONSUL }})"}], "consul": "localhost:8500", "backends": [{"onChange": "python /usr/local/bin/manage.py on_change", "poll": 10, "name": "{{ if .SERVICE_NAME }}{{ .SERVICE_NAME }}{{ else }}mysql{{ end }}-primary"}], "services": [{"ttl": 25, "poll": 5, "health": "python /usr/local/bin/manage.py health", "name": "mysql", "port": 3306}]}' != ''

----------------------------------------------------------------------

I looked at the test.py file, I wasn't sure what it was doing:

    def test_update(self):
        self.environ['CONSUL_AGENT'] = '1'
        cp = ContainerPilot()
        cp.state = REPLICA
        cp.load(envs=self.environ)
        temp_file = tempfile.NamedTemporaryFile()
        cp.path = temp_file.name

        # no update expected
        cp.update()
        with open(temp_file.name, 'r') as updated:
            self.assertEqual(updated.read(), '')

What exactly is it doing here? It is expecting the config file to be equal to '' ?

tgross commented 7 years ago

What exactly is it doing here? It is expecting the config file to be equal to '' ?

Right, because it's not expecting the file to be written-to if there was no update (as alluded to in the # no update expected comment above. In the next step of the test we # force an update and expect the file to be now populated.

deitch commented 7 years ago

I think I am beginning to see it. It looks like bin/manager/containerpilot.py is a "poor man's processor". It does some template interpolation - just {{ .CONSUL_AGENT }} - but not the rest. Since it compares self.config['services'][0]['name'] != self.state to see if it should update, and since we set self.state to REPLICA or PRIMARY (which bin/manager/utils.py sets to mysql or mysql-primary), it is comparing them not to the correctly interpolated mysql and mysql-primary but instead to {{ if .SERVICE_NAME }}{{ .SERVICE_NAME }}{{ else }}mysql{{ end }}, which obviously does not match.

Is the only path then to add more variable interpolation into containerpilot.py? Is there a native library that does complete interpolation (short of rewriting in go)?

tgross commented 7 years ago

I think I am beginning to see it. It looks like bin/manager/containerpilot.py is a "poor man's processor". It does some template interpolation - just {{ .CONSUL_AGENT }} - but not the rest.

Right. This is intentional because we don't want to interfere with ContainerPilot's own rendering of the template, so we leave things like LOG_LEVEL alone.

Is the only path then to add more variable interpolation into containerpilot.py? Is there a native library that does complete interpolation (short of rewriting in go)?

There are plenty of template libraries but none of them are going to figure out what the values of the variables are, which as you can see is the biggest part of that code section. Adding the dependency on jinja or whatever doesn't get us anything.

deitch commented 7 years ago

In that case, will add .SERVICE_NAME interpolation to containerpilot.py.

sodre commented 7 years ago

I actually started looking into this code base over the weekend. Why not do a PR against ContainerPilot to have a “containerpilot-template” render binary similar to “consul-template” without the daemon?

That way, we know the templates are being rendered exactly like the Containerpilot code expects.

Best, Patrick

On Jan 23, 2017, at 9:21 AM, Avi Deitcher notifications@github.com wrote:

In that case, will add .SERVICE_NAME interpolation to containerpilot.py.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/autopilotpattern/mysql/pull/77#issuecomment-274499789, or mute the thread https://github.com/notifications/unsubscribe-auth/AA_rVcS2Q6VmE7vK4-JG8kH2vatWoh-Aks5rVLdlgaJpZM4LaYtr.

deitch commented 7 years ago

I actually started looking into this code base over the weekend. Why not do a PR against ContainerPilot to have a “containerpilot-template” render binary similar to “consul-template” without the daemon?

I like that idea, but I think I would keep the same binary with a --template mode, so you don't need to pull out the code, etc. In any case, every place you install a theoretical "containerpilot-template" binary you would install the "containerpilot" binary, so might as well.

deitch commented 7 years ago

This is really strange. I added the change to my own branch and tried to push out (so it would update this PR).

Got an "out of sync" message:

$ g push origin variable-service-name
To github.com:deitch/autopilotpattern-mysql.git
 ! [rejected]        variable-service-name -> variable-service-name (non-fast-forward)
error: failed to push some refs to 'git@github.com:deitch/autopilotpattern-mysql.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

That is even stranger, as the changes and the git log are identical... except for the sha hashes!

$ g remote -v
origin  git@github.com:deitch/autopilotpattern-mysql.git (fetch)
origin  git@github.com:deitch/autopilotpattern-mysql.git (push)
upstream        git@github.com:autopilotpattern/mysql.git (fetch)
upstream        git@github.com:autopilotpattern/mysql.git (push)
$ g log 
commit 4475b2871d9cb938f27d15d81893eaf375263e7f
Author: Avi Deitcher <avi@deitcher.net>
Date:   Tue Jan 24 12:06:12 2017 +0200

    Add templating support for SERVICE_NAME

commit 3e0d972759e3f729dbb512eff2e8be3f7f6ea343
Author: Avi Deitcher <avi@deitcher.net>
Date:   Thu Jan 19 14:52:19 2017 +0200

    Make backend-name variable as well

commit 18ce1a157b642612a52da532f42456a85bb080fc
Author: Avi Deitcher <avi@deitcher.net>
Date:   Thu Jan 5 19:45:07 2017 +0200

    Update PRIMARY_KEY to default to SERVICE_NAME-primary

(truncated)

And in my branch:

screen shot 2017-01-24 at 12 13 27 pm

Look at the commit hashes, they are different.

tgross commented 7 years ago

You might need to rebase onto the changes from upstream.

deitch commented 7 years ago

That's what I thought at first, but it didn't make sense. Rebasing changes which commits you branch off from. But here I have the same commits locally and on GitHub at origin, with different hashes. How is that even possible?

tgross commented 7 years ago

But here I have the same commits locally and on GitHub at origin, with different hashes. How is that even possible?

Does seem weird. Particularly given that it's your fork, right? Do you have more than one remote configured maybe?

deitch commented 7 years ago

I may just need to copy the change from the last commit, wipe out and re-clone my local repo.

deitch commented 7 years ago

And so I did. OK, unit tests now pass. @tgross ?

tgross commented 7 years ago

LGTM!

deitch commented 7 years ago

Hey @tgross : about @sodre's comment:

I actually started looking into this code base over the weekend. Why not do a PR against ContainerPilot to have a “containerpilot-template” render binary similar to “consul-template” without the daemon?

That way, we know the templates are being rendered exactly like the Containerpilot code expects.

I was thinking more giving the containerpilot binary an option to just convert templates. Since it already has to exist in the image, why not be able to do:

containerpilot --template --in /etc/containerpilot.json --out /some/out/path

(probably with a default --out to stdout and --in from /etc/containerpilot.json)

I don't mind putting in a PR for it... but not troubling myself unless there is desire for it.

sodre commented 7 years ago

Hi @deitch, I started the PR yesterday here joyent/containerpilot#271