cloudfoundry-attic / cf-deployment-transition

This repository is deprecated - no longer accepting PR's or Issues
Apache License 2.0

Feature Request: Tooling Support for Migration to CF-Deployment on vSphere #1

Closed phong2tran closed 6 years ago

phong2tran commented 7 years ago

This tooling project currently supports the migration to cf-deployment only on AWS, but we have cf-release/diego-release deployed on vCenter/vSphere in our own cloud infrastructure to run the production systems. We would like to have the tooling support for the migration to cf-deployment in vSphere.

  1. Will there be any support in the tooling for migration to cf-deployment from cf-release in vSphere environment when tooling is ready for production usage?

  2. If there is no planned support for migration to cf-deployment for vSphere environment in this tooling project, what options are available for a migration path from cf-release to cf-deployment if Cloud Foundry is deployed on vSphere? Will there be a step-by-step doc with details for manually doing this migration?

Thanks, Phong

cf-gitbot commented 7 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/147965767

The labels on this github issue will be updated when the story is started.

dsabeti commented 7 years ago

Hi @phong2tran. Thanks for opening this issue. That description in the README is a bit unclear (I'll update it with a better explanation). Allow me to untangle:

I actually suspect that nothing about the transition tooling would have to change to enable the migration from cf-release to cf-deployment on vSphere. However, given that we don't test cf-deployment at all on vSphere, we do want to get some validation of the necessary tooling required to get cf-deployment to work on vSphere before recommending the migration.

In light of that, I've got a few questions about your deployment, so we can get an understanding for what vSphere deployments look like and how we can accommodate it in cf-deployment (and potentially the migration tooling).

  1. How do you manage your vSphere configuration? Do you mostly work on it through the vCenter GUI, or do you have any automation like terraform?
  2. What kind of load balancer do you use to route traffic to the gorouters in your deployment?
  3. What kind of datastores -- databases for CCDB, UAADB etc., as well as your CC blobstore -- do you use? If you manage them yourself, how do you deploy them?
phong2tran commented 7 years ago

Hi David, First, thanks for the quick follow-up! Please see my comments inline.

Thanks, Phong

On Tuesday, June 27, 2017 5:30 PM, David Sabeti wrote:

  1. How do you manage your vSphere configuration? Do you mostly work on it through the vCenter GUI, or do you have any automation like terraform? [Phong] We mostly work with vSphere through the vSphere Web Client GUI. We do not use terraform for automation.

  2. What kind of load balancer do you use to route traffic to the gorouters in your deployment? [Phong] We use HAProxy, which is part of cf-release, as the load balancer for our internal test environments. For our production environments, we’re using F5 as the load balancer in front of the gorouters.

  3. What kind of datastores -- databases for CCDB, UAADB etc., as well as your CC blobstore -- do you use? If you manage them yourself, how do you deploy them? [Phong] We’re using Postgres, which is part of cf-release, for CCDB, UAADB, etc. For the CC blobstore, we’re using the NFS store (the NFS job, which is also part of cf-release). These datastores (Postgres, NFS) are deployed along with the other CF components from cf-release.

dsabeti commented 7 years ago

Hey @phong2tran, thanks for the response. It looks like the email client included a bunch of extra stuff, so I'm going to summarize your response for clarity.

vSphere configuration:

We mostly work with vSphere through the vSphere Web Client GUI. We do not use terraform for automation.

Load balancing:

We use HAProxy, which is part of cf-release, as the load balancer for our internal test environments. For our production environments, we’re using F5 as the load balancer in front of the gorouters.

Datastores:

We’re using Postgres, which is part of cf-release, for CCDB, UAADB, etc. For the CC blobstore, we’re using the NFS store (the NFS job, which is also part of cf-release). These datastores (Postgres, NFS) are deployed along with the other CF components from cf-release.

dsabeti commented 7 years ago

And now for my actual response.

Load balancing: How do you configure the F5 to route traffic to the gorouters? Is there out-of-band configuration, or do you use something from BOSH?

For example, with AWS ELBs, you can configure the ELBs at bosh deploy time, using cloud_properties in the resource_pools for v1 manifests and vm_extensions in v2 manifests.
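As a rough illustration of the v2 mechanism mentioned above, the sketch below builds a cloud-config fragment with one vm_extensions entry and an instance group that references it by name. It assumes the AWS CPI's `elbs` cloud property; the names `cf-router-elb` and `my-router-elb` are hypothetical, and nothing here implies an equivalent property exists for vSphere or an F5.

```python
import json


def router_elb_extension(extension_name, elb_names):
    """Build a cloud-config vm_extensions entry that attaches AWS ELBs to VMs."""
    return {
        "name": extension_name,
        "cloud_properties": {"elbs": list(elb_names)},
    }


def attach_extension(instance_group, extension_name):
    """Return a copy of an instance group that references the extension by name."""
    updated = dict(instance_group)
    updated["vm_extensions"] = list(instance_group.get("vm_extensions", [])) + [extension_name]
    return updated


# Hypothetical names, for illustration only.
extension = router_elb_extension("cf-router-elb", ["my-router-elb"])
router = attach_extension({"name": "router", "instances": 2}, "cf-router-elb")

# JSON is valid YAML, so this output could be pasted into a YAML document.
cloud_config_fragment = {"vm_extensions": [extension]}
print(json.dumps(cloud_config_fragment, indent=2))
print(json.dumps(router, indent=2))
```

The point of the indirection is that the IaaS-specific detail lives in the cloud config, while the deployment manifest only names the extension.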

Datastores: Are you using Postgres and NFS for a prod deployment? Postgres is non-HA and doesn't perform backups of any kind, so we typically recommend it only for development/testing use. Also, as far as I know, the NFS server is considered a deprecated component (there hasn't been an update to it since November 2014) and has been replaced by WebDAV -- which is itself also typically recommended only for dev/testing.

If you are using these for production, I'm wondering if there's a way to migrate to more stable services, ideally without incurring downtime (seems unlikely).

phong2tran commented 7 years ago

Load balancing: F5 is not deployed and managed through BOSH. It's a separate out-of-band installation & configuration.

Datastores: Yes, we're also using Postgres and the WebDAV blobstore (not NFS as originally indicated) in production deployments. Currently we don't have any plan for moving away from the Postgres database and the WebDAV blobstore.

Given that we use these internal datastores (Postgres/WebDAV) and load balancers (external F5 and internal HAProxy) in our CF deployments, what are the implications of migrating from cf-release to cf-deployment?

dsabeti commented 7 years ago

Our current tooling has been assuming that deployers aren't using the internal Postgres or WebDAV, because they're not recommended for production use. Likewise with HAProxy -- although it sounds like your F5 balancer gives you a production-ready LB.

As it stands, our tooling is incomplete for your use case. You could use our tooling for some of the migration work, but you also have to do some work yourself to build ops-files to help you migrate. We can try to help you and give you some guidelines for how to make them.
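For readers unfamiliar with ops-files, here is a minimal sketch of the idea: an ops-file is a list of operations, each with a `type`, a `path` into the manifest, and usually a `value`. The code below handles only `replace` operations whose path ends in a dictionary key, and it is a simplified stand-in, not the BOSH implementation. The manifest contents and the `db_host` property are hypothetical.

```python
import copy


def resolve(node, token):
    """Follow one path token: a dict key, or 'field=value' to pick a list item."""
    if "=" in token:
        field, value = token.split("=", 1)
        for item in node:
            if item.get(field) == value:
                return item
        raise KeyError(token)
    return node[token]


def apply_replace(manifest, path, value):
    """Apply one 'replace' operation and return a modified copy of the manifest."""
    result = copy.deepcopy(manifest)
    tokens = path.strip("/").split("/")
    parent = result
    for token in tokens[:-1]:
        parent = resolve(parent, token)
    parent[tokens[-1]] = value
    return result


# Hypothetical manifest and property names, for illustration only.
manifest = {
    "instance_groups": [
        {"name": "api", "properties": {"db_host": "postgres.internal"}},
        {"name": "router", "properties": {}},
    ]
}
ops = [
    {
        "type": "replace",
        "path": "/instance_groups/name=api/properties/db_host",
        "value": "mysql.example.internal",
    },
]

migrated = manifest
for op in ops:
    migrated = apply_replace(migrated, op["path"], op["value"])

print(migrated["instance_groups"][0]["properties"]["db_host"])
```

Because each operation returns a copy, the base manifest stays untouched and the same ops-file can be layered on top of different base manifests.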

I'll keep an eye out for other deployers who are using HAProxy, Postgres, and/or WebDAV. I don't expect that we'll accommodate those into our transition tooling -- I'd say that doing those migrations is really a prerequisite for upgrading to cf-deployment -- but if many other operators are in a similar situation, we'll see what we can do to help.

gberche-orange commented 7 years ago

Orange is also using Postgres and WebDAV for its production CF instances, although cf-release's HAProxy is not consistently used.

/CC @poblin-orange @fguichard

phong2tran commented 7 years ago

@David, for CF deployments hosted in a private cloud (on-premises infrastructure), it won't be easy to use data services such as AWS RDS and S3 from the public cloud.

If Postgres and WebDAV are not recommended for production use, what options would you recommend for CC database and CC blobstore with CF deployments in a private (on-premises) data center?

dsabeti commented 7 years ago

Ok, I've got an important question for @phong2tran and @gberche-orange: do you experience downtime when you upgrade your deployment -- either by updating cf-release or the stemcell? I ask because the Postgres and WebDAV jobs are both singletons and (consequently) don't perform any kind of replication. It seems logical then that there would be some amount of downtime when you update those jobs. If you do experience some downtime, how do you manage it?

As far as recommended databases and blobstores for on-premise deployments, we've typically believed that deployers should bring their own HA datastores -- for example, deploying your own clustered database. I'm not sure how effectively that's been communicated. Let me see if I can get some real data about what other deployers do.

gberche-orange commented 7 years ago

@dsabeti Yes, we do experience a short control plane downtime (i.e. CC API not available) during upgrades. Would cf-mysql-release be suitable/recommended as an HA datastore?

We also plan to externalize the blobstore to an S3-compatible in-house object store, to reduce such downtime in the future.

phong2tran commented 7 years ago

We also have planned downtime when upgrading CC and its supporting components. This short window of CC downtime is not a critical issue and is still manageable, as it affects only the Ops team and not our end users (SaaS customers).

Can Minio (S3 compatible: https://www.minio.io/) or OpenStack Swift be used as the blobstore for CC?

jriguera commented 7 years ago

We also use vSphere as IaaS, with one cluster per AZ. We use PostgreSQL, NetApp to provide HA NFS volumes for the blobstore, and an F5 LB in front of the gorouters. The gorouters have static IPs and the F5 configuration is done with ansible (I would like to switch to the ansible-bigip-boshrelease colocated job on the gorouters).

PostgreSQL is taken from cf-release, but it is a separate deployment, so we can predict the downtime when it is deploying (we have to make some decisions about that). Currently the only downtime we have is while the API jobs are being deployed, when clients cannot talk to the API. We deal with that by stopping the pipelines so nobody can push apps for a few minutes.

Let me know if you need more details.

dsabeti commented 7 years ago

@jriguera Thanks for that information. It's incredibly helpful to hear how operators are managing their databases. One common theme I'm seeing is that some small amount of control-plane downtime is acceptable, especially if you can control when that downtime occurs. @phong2tran, @gberche-orange: would you be able or interested in migrating to a deployment architecture where your datastores are in separate deployments?

Also, I realize I've failed to answer some of your questions from earlier in the thread.

@gberche-orange: Based on the theme I just pointed out, it may not be necessary to shoot for an HA database. Still, if you want to migrate to an HA database, cf-mysql-release is the recommended way to deploy your own HA database.

@phong2tran: We've never tried Minio as a blobstore, but anything that's compatible with the fog gem (which is what cloud controller uses) can be used. @rkoster has an open PR to add an ops-file for deploying Minio for exactly this purpose, so he may have experience using it successfully.

gberche-orange commented 7 years ago

thanks @dsabeti for your reply. We are considering migrating from pg to cf-mysql-release. Is there an available migration path for CF operators from postgresql to mysql while preserving existing data ?

Would removing the Postgres opt-in from cf-deployment and importing the PostgreSQL data into MySQL with a tool such as https://github.com/pivotal-cf/pg2mysql contributed by @krishicks (or the MySQL Workbench pg import) be likely to work, or better, has this previously been done successfully in the community?

Looking at the CC DB and the CAPI DB migration guide for CAPI developers, the database-specific parts seem quite limited. Sequel migrations do not seem to specifically address the use case of migrating across database types.

We have yet to look at other components: Diego, UAA, container networking...

Having such a migration plan documented and automatically tested would be quite useful to help the community move towards the recommended cf-mysql-release HA, and possibly also from MySQL to pg.

"would you be able or interested in migrating to a deployment architecture where your datastores are in separate (bosh) deployments?" Yes, to us this would bring the following:

pros:

cons:

dsabeti commented 7 years ago

@gberche-orange We've never tried a migration from one database type to another, so I'm not sure how that would work. From a deployment perspective, you'd at least have to do the following (in order):

  1. Deploy the new database
  2. Migrate data from old database to new database
  3. Redeploy CF so that it connects to the new database instead of the old one
  4. Delete the old database

I have no idea how the actual components (Cloud Controller, UAA, BBS, etc.) will function through this. For example, you might need a downtime window where the Cloud Controller is no longer accepting requests while you migrate data from one database to another. You may also have to perform local database schema migrations (which is what the Sequel migrations do). I'm not sure.
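The ordering of the four steps matters mostly because the old database is deleted last. A minimal sketch of that idea, with placeholder functions standing in for the real operations (none of these names come from the tooling), could look like this:

```python
def run_steps(steps):
    """Run (name, action) pairs in order and stop at the first failure."""
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as error:
            print(f"stopped at '{name}': {error}")
            break
        completed.append(name)
    return completed


def placeholder():
    """Stand-in for a real operation such as a bosh deploy or a data copy."""


def failing_copy():
    """Stand-in for a data migration that goes wrong."""
    raise RuntimeError("copy failed")


STEP_NAMES = [
    "deploy the new database",
    "migrate data from old database to new database",
    "redeploy CF against the new database",
    "delete the old database",
]

# Happy path: all four steps run in order.
completed = run_steps([(name, placeholder) for name in STEP_NAMES])

# Failure during the data copy: later steps, including the delete, never run.
partial = run_steps([
    (STEP_NAMES[0], placeholder),
    (STEP_NAMES[1], failing_copy),
    (STEP_NAMES[2], placeholder),
    (STEP_NAMES[3], placeholder),
])

print(completed)
print(partial)
```

If the copy fails, CF is still pointing at the old, intact database, so the migration can be retried later.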

For more help with that, you could start by asking in https://github.com/pivotal-cf/pg2mysql for help. For component-specific information such as schema migrations, you'll have the most luck asking in their individual repos.

Edit: And thanks for your feedback about the separate deployments. This is a feature we're thinking about adding to cf-deployment.

krishicks commented 7 years ago

I have only skimmed this thread.

But, in response to @dsabeti above, that exact problem has been solved via Customer[0] for PCF customers with the pg2mysql tool that I wrote.

We have documentation on how to run it (which means going through steps to do just what @dsabeti mentioned), though it's not something that's meant to be run by just anyone (usually a Customer[0] person is supervising, and a dedicated support engineer for the customer as well).

Our instructions are also targeted at PCF, not open source CF, though there's nothing stopping someone from doing the same migration on open source CF with pg2mysql. As long as you've got the databases migrated and can supply the credentials and addresses to the mysql/postgres instances, pg2mysql will do what you need. The only real difference is in PCF you have to grab credentials that were generated from the OpsMgr UI, and in open source CF you get them either from your manifest or from CredHub/the BOSH creds file.

jriguera commented 7 years ago

I forgot to say something about SQL databases. Currently we also have PostgreSQL, and we can mitigate downtime because it runs in a different deployment (it is from the same cf-release as cf-deployment). To try to get rid of the downtime (and the risks of having a single-instance PostgreSQL), we are exploring two directions:

  1. Keep using PostgreSQL, but with an HA solution. I have been doing some experiments with stolon and it looks really nice, so I have decided to build a boshrelease for it. It is almost done but not finished yet.

  2. Switch to MySQL. It seems to be possible (but we need to test it properly); have a look at this link: http://www.starkandwayne.com/blog/migrating-cloud-foundry-from-pg-to-mysql-a-report/

krishicks commented 7 years ago

I would strongly advise against following Stark and Wayne's migration methodology.

It's far better to use the approach we've done with pg2mysql, which is to use the existing code to migrate the empty databases in MySQL; for CC this means running a rake db:migrate with a custom config pointing at MySQL and for UAA it means spinning up a duplicate UAA instance in the environment, also pointing at MySQL. Both databases end up with their final schemas, and then pg2mysql is used just to copy the data from PG to MySQL with the native Golang drivers.
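The two-phase idea described above (let the application's own migrations create the final schema in the target, then copy only the data) can be sketched as follows. This is not how pg2mysql is implemented: `sqlite3` in-memory databases stand in for both PostgreSQL and MySQL, and the `apps` table and its rows are made up for illustration.

```python
import sqlite3


def copy_table(source, target, table, columns):
    """Copy every row of one table from source to target; return the row count."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {column_list} FROM {table}").fetchall()
    target.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


# Hypothetical schema; in the real case it comes from the component's migrations.
SCHEMA = "CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"

# Old database: already has schema and data.
old_db = sqlite3.connect(":memory:")
old_db.execute(SCHEMA)
old_db.executemany(
    "INSERT INTO apps (id, name) VALUES (?, ?)", [(1, "dora"), (2, "spring-music")]
)
old_db.commit()

# New database: the schema is created first (a stand-in for running the
# application's own migrations), so it is empty but already fully migrated.
new_db = sqlite3.connect(":memory:")
new_db.execute(SCHEMA)

copied = copy_table(old_db, new_db, "apps", ["id", "name"])
print(copied, new_db.execute("SELECT COUNT(*) FROM apps").fetchone()[0])
```

Keeping schema creation out of the copy step means the copy tool never has to translate column types between database engines; it only moves rows into tables that already exist.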

jriguera commented 7 years ago

I completely agree with you, @krishicks, about using pg2mysql. I just pointed to the link because it shows that the migration seems possible (ofc not trivial).

wbean1 commented 7 years ago

Chiming in to say we also use the Postgres & WebDAV in cf-release in our cloudfoundry, as some control-plane outage during deployments is acceptable to us. We do not use the HAProxy jobs as we have other means for loadbalancing the routers. We deploy to openstack IaaS.

+1 for some migration thought/process-verbiage for those of us using cf-release postgres db & webdav blobstore jobs.

dsabeti commented 7 years ago

Thanks for chiming in @wbean1. I've written a small epic to cover what I think are the main needs for operators on vSphere and OpenStack: https://www.pivotaltracker.com/story/show/150906529

Let me know if that epic is missing anything you'd need.

dsabeti commented 6 years ago

Hi all. It's been some time since we passed the milestone I mentioned earlier. The migration from cf-release to cf-deployment on vSphere environments is functional and supported. I'm going to close this issue, but feel free to re-open it if you find any other issues with this process.