cloudfoundry / cf-mysql-release

Cloud Foundry MySQL Release
Apache License 2.0

multi-deployment support #182

Closed bkez322 closed 7 years ago

bkez322 commented 7 years ago

I would like to deploy a MariaDB cluster across multiple AZs, each AZ with its own director. Unless there's a way to deploy to multiple directors in a single deployment, I will have to use multiple deployments. As far as I can tell, there is no way to make the mysql nodes in these separate deployments replicate with each other without modifying the release, since the config files appear to load IPs only from the instances defined in that particular deployment. I hope I am simply missing something. I am not using AWS.

cf-gitbot commented 7 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/150502349

The labels on this github issue will be updated when the story is started.

cppforlife commented 7 years ago

That could be done via bosh.io/docs/links-manual, which lets you explicitly fulfill links with particular IPs that you pick. Come by #bosh in the Cloud Foundry Slack if it doesn't work out for some reason.

(Sent by email in reply to the cf-gitbot notification of Wed, Aug 23, 2017, 4:37 PM.)
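As a rough illustration of the manual-link approach, the Python sketch below builds a `consumes` entry that points a job at hand-picked IPs (for example, the addresses of the cluster nodes managed by the other directors) instead of at instances from the same deployment. The link name `mysql`, the `cf-mysql` release name, the helper `manual_link_stanza`, and the IP addresses are all hypothetical; check the job spec of the release version you deploy for the actual link name and any link properties its templates read. The output is JSON, which is also valid YAML.

```python
import json


def manual_link_stanza(link_name, addresses, properties=None):
    """Build a BOSH manual-link 'consumes' entry.

    Instead of resolving the link from instances in the same deployment,
    the consumer is given an explicit list of addresses (plus any link
    properties its templates expect).
    """
    return {
        link_name: {
            "instances": [{"address": addr} for addr in addresses],
            "properties": dict(properties or {}),
        }
    }


# Hypothetical IPs of the mysql nodes spread across the separate deployments.
peer_ips = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]

job = {
    "name": "mysql",
    "release": "cf-mysql",  # assumed release name
    # Assumed link name; confirm it against the job spec you deploy.
    "consumes": manual_link_stanza("mysql", peer_ips),
}

# JSON is valid YAML, so this entry can go in an instance group's jobs list.
print(json.dumps(job, indent=2))
```

Which addresses to list (and whether each deployment must include its own node) depends on how the release builds its cluster configuration from the link.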

bkez322 commented 7 years ago

I've done as @cppforlife suggested. When a deployment succeeds, cross-deployment replication works flawlessly. However, the deployments I've done this to have become very finicky and frequently (but not always!) fail:

21:17:13 | Updating instance mysql: mysql/34602f28-a2f4-4e56-a44e-fff7652e0c8e (0) (canary) (00:02:22)
            L Error: 'mysql/0 (34602f28-a2f4-4e56-a44e-fff7652e0c8e)' is not running after update. Review logs for failed jobs: mariadb_ctrl, galera-healthcheck, gra-log-purger-executable, cluster_health_logger

21:19:35 | Error: 'mysql/0 (34602f28-a2f4-4e56-a44e-fff7652e0c8e)' is not running after update. Review logs for failed jobs: mariadb_ctrl, galera-healthcheck, gra-log-purger-executable, cluster_health_logger

Monit summary shows:

Process 'mariadb_ctrl'              Execution failed
Process 'galera-healthcheck'        running
Process 'gra-log-purger-executable' running
Process 'cluster_health_logger'     not monitored
System 'system_localhost'           running

In cluster_health_logger_ctl.err.log I find:

[2017-08-24 20:53:36+0000] ------------ STARTING cluster_health_logger_ctl at Thu Aug 24 20:53:36 UTC 2017 --------------
[2017-08-24 20:53:36+0000] 2017/08/24 20:53:36 dial tcp 127.0.0.1:3306: getsockopt: connection refused

I haven't been able to find anything else helpful in the other logs.

I'm trying to identify a pattern for when this happens, but so far it appears to occur only, and consistently, when updating existing deployments rather than creating new ones.

I would appreciate any insight into why this may be happening or what I can do to fix it.
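The `connection refused` error from cluster_health_logger means nothing was accepting TCP connections on 127.0.0.1:3306 when it started, which is consistent with monit reporting that `mariadb_ctrl` failed to execute. A minimal sketch for checking this by hand on the VM, assuming Python 3 is available there (the helper name `port_accepts_connections` is hypothetical):

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    'Connection refused' means no process was listening on that port
    at the moment of the attempt.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, unreachable hosts, etc.
        return False


if __name__ == "__main__":
    # 127.0.0.1:3306 is the address from the cluster_health_logger error.
    print("mysqld listening:", port_accepts_connections("127.0.0.1", 3306))
```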

bkez322 commented 7 years ago

I think many of my issues were caused by split brain, because once I started using an odd number of VMs, a lot of them went away. The original issue I had has been addressed. I'm still having some significant issues when I make deployment changes, which I think may be related to the use of manual linking, but I'll open a new issue if I can't figure them out.
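The odd-number advice follows from quorum arithmetic. Galera keeps a partition as the primary component only if it holds more than half of the (weighted) membership. The sketch below simplifies this to equal node weights and does not model nodes that leave gracefully; the helper names are hypothetical. With an even node count, an even split leaves neither side with a majority, and going from three nodes to four does not increase the number of failures tolerated.

```python
def has_quorum(partition_size, cluster_size):
    """Equal-weight simplification: a partition can stay primary only if
    it holds strictly more than half of the nodes."""
    return 2 * partition_size > cluster_size


def tolerated_failures(cluster_size):
    """Nodes that can be lost while the survivors still hold a majority."""
    return (cluster_size - 1) // 2


for n in (2, 3, 4, 5):
    print(n, "nodes tolerate", tolerated_failures(n), "failure(s)")

# A 4-node cluster split 2/2: neither side keeps quorum.
print(has_quorum(2, 4), has_quorum(2, 4))
# A 3-node cluster split 2/1: only the majority side keeps quorum.
print(has_quorum(2, 3), has_quorum(1, 3))
```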

menicosia commented 7 years ago

@bkez322 I'd certainly love to hear more about why it's necessary to have a separate director per AZ. One of the major differences between the spiff scripts of cf-mysql-release and the sample manifests of cf-mysql-deployment is that the new manifests rely on BOSH's ability to distribute jobs across AZs. With your approach, you lose that.

-- Marco Nicosia, Product Manager, Pivotal Software, Inc.
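For contrast, with a single director one deployment can list several AZs on an instance group (e.g. `azs: [z1, z2, z3, z4]`) and BOSH spreads the instances across them. The round-robin sketch below only illustrates what an even spread looks like; it is not BOSH's actual placement algorithm, and the AZ names and the helper `spread_across_azs` are hypothetical. With one director per AZ, each deployment covers only its own AZ, so the per-AZ node counts have to be chosen by hand.

```python
from collections import Counter


def spread_across_azs(instance_count, azs):
    """Assign instance indexes to AZs round-robin, so per-AZ counts
    differ by at most one. Illustration only, not BOSH's algorithm."""
    if not azs:
        raise ValueError("need at least one AZ")
    return {index: azs[index % len(azs)] for index in range(instance_count)}


# Five nodes (an odd count) across four hypothetical AZs.
placement = spread_across_azs(5, ["z1", "z2", "z3", "z4"])
print(placement)
print(Counter(placement.values()))
```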

bkez322 commented 7 years ago

@menicosia My organization maintains 4 major AZs for production internal services running on BOSH. Each of these AZs has its own director, and each director manages only the deployments in its AZ. There are multiple directors to avoid a single point of failure. If your question is why each director manages only one AZ rather than all of them, I can't say for certain; I was not involved in my organization's infrastructure decisions.