01100010011001010110010101110000 commented 6 years ago

Missing Feature

Currently ManageIQ does not collect most managed services from cloud providers, some examples being databases such as Azure database for SQL, Amazon RDS instances, HDInsight, Elastic Map Reduce, etc.

As a result, if one wishes to provision these services from within ManageIQ, one must use a generic service template item and use some combination of tags and custom attributes on the resulting service to control the object in the provider, retire it, or perform other operations. This has a number of downsides: the resulting services are static, ManageIQ cannot collect metrics for them, and so on.

Proposed Solution

I would like to begin by implementing models, refreshers, and provisioning workflows for the database and map reduce services. In Azure, these are Azure Database for Engine and HDInsight, and in AWS these are RDS for Engine and EMR. I'll break out the specifics into sections for each service.

Cloud Database

There are currently CloudDatabase and CloudDatabaseFlavor models present in the core ManageIQ repo. These should be sufficient as they are to handle the different tooling available in AWS, Azure, Google Cloud, and OpenStack. Azure has a fair amount of tooling built up around their PaaS offering for MS SQL; it is split into three different 'services' that all tie together:

SQL Servers
SQL Databases
SQL Elastic Pools

The SQL server defines admin credentials and an endpoint for management, equivalent to an AWS DBInstance. The databases are databases within a server. These databases have flavors, called a ServiceObjectiveName in Azure parlance. These flavors can be applied to individual databases, or to a collection of databases via an elastic pool, so a database flavor would be either a specific ServiceObjectiveName, or the database's elastic pool.
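As a rough sketch of the flavor rule described above, the following plain-Ruby helper picks the elastic pool when a database belongs to one and falls back to the ServiceObjectiveName otherwise. The hash keys, the `azure_sql_flavor_ref` name, and the `elastic_pool:` / `service_objective:` prefixes are assumptions made for illustration; they are not Azure API field names or existing ManageIQ code.

```ruby
# Hypothetical, simplified view of an Azure SQL database as a plain hash.
# Key names are illustrative only, not the Azure API's field names.
def azure_sql_flavor_ref(database)
  pool = database[:elastic_pool_name]

  # A pooled database takes its sizing from the elastic pool.
  return "elastic_pool:#{pool}" unless pool.nil? || pool.empty?

  # A standalone database takes its sizing from its own service objective.
  "service_objective:#{database.fetch(:service_objective_name)}"
end

standalone = { name: "orders",   service_objective_name: "S1", elastic_pool_name: nil }
pooled     = { name: "sessions", service_objective_name: "S0", elastic_pool_name: "pool-a" }

puts azure_sql_flavor_ref(standalone)
puts azure_sql_flavor_ref(pooled)
```

Prefixing the reference keeps the two kinds of flavor distinguishable even if a pool and a service objective happen to share a name.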

This is much more straightforward in AWS. Here, the flavor of a database is simply the sizing of the DBInstance, and the individual databases are not exposed. This is also how Azure handles all other DB engines it offers, such as MySQL and PostgreSQL.

Google Cloud is essentially a simplified AWS in its database offerings. It offers MySQL and has PostgreSQL in beta, with specific database instance sizes and database engine versions. Like Azure, it is possible to enumerate the databases in an instance via the API.

Trove, OpenStack's DBaaS plugin, has a REST API that allows one to list database instances, the databases in those instances, engine versions, etc. The information available there fits nicely into the models already present.

To gather the most accurate information, I believe it is appropriate to map each DBInstance in AWS into a CloudDatabase, and do the same in Azure for all PaaS database offerings except MS SQL. For MS SQL, we should map the individual databases into a CloudDatabase, because the performance level of a database is tied to the database itself, not to the server.
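The mapping could look roughly like the sketch below. It uses a plain Struct as a stand-in for a CloudDatabase row rather than the real model, and the inventory hash layout, the `CloudDatabaseRow` name, and the `map_to_cloud_databases` helper are all hypothetical; a real refresher would work from the provider SDK responses.

```ruby
# Plain-Ruby stand-in for a CloudDatabase row; not the ManageIQ model itself.
# Field names are illustrative assumptions.
CloudDatabaseRow = Struct.new(:ems_ref, :name, :engine, :flavor_ref, keyword_init: true)

# inventory: array of hypothetical hashes, one per database server/instance.
def map_to_cloud_databases(inventory)
  inventory.flat_map do |server|
    if server[:provider] == :azure && server[:engine] == "mssql"
      # Azure MS SQL: the performance level belongs to each database,
      # so emit one row per database on the server.
      server.fetch(:databases).map do |db|
        CloudDatabaseRow.new(ems_ref: db[:id], name: db[:name],
                             engine: "mssql", flavor_ref: db[:flavor])
      end
    else
      # Everywhere else: the instance itself is the sized object.
      [CloudDatabaseRow.new(ems_ref: server[:id], name: server[:name],
                            engine: server[:engine], flavor_ref: server[:flavor])]
    end
  end
end

inventory = [
  { provider: :aws, id: "db-1", name: "reports", engine: "postgres", flavor: "db.m4.large" },
  { provider: :azure, id: "srv-1", name: "sqlsrv", engine: "mssql",
    databases: [{ id: "srv-1/a", name: "a", flavor: "S1" },
                { id: "srv-1/b", name: "b", flavor: "pool-a" }] }
]

map_to_cloud_databases(inventory).each { |row| puts row.to_h }
```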

EMR

Azure and AWS both have PaaS offerings for clusters running various big data processing software and their supporting applications: Hadoop, Spark, Hive, Zookeeper, etc. They differ in that in Azure one must deploy a cluster per processing engine, e.g. one for Hadoop MapReduce and one for Spark, but in AWS it seems to be possible to deploy all supported software onto a single cluster. However, they have in common a release version for whatever versions of the software have been packaged by AWS and Azure, and an applications array containing the applications installed on the cluster.

Again, it seems that Google Cloud is a simplified AWS. One chooses an image version, which contains all available applications (Hadoop, Spark, etc.). It does not seem to be possible to query what software is actually installed, only the image version number, so applications would likely be nil here.

Sahara, OpenStack's big data plugin, has a REST API that exposes clusters and the templates used to create them. The templates are prebuilt node configurations, but the software installed on those nodes comes from so-called plugins, which exist for vanilla Apache Hadoop, Spark, and other processing engines. So the version will come from the plugin associated with a cluster.

With that in mind, I think something like this would work well for the schema of a CloudMapReduce model:

Column	Type
id	bigint
ems_id	bigint
cloud_tenant_id	bigint
resource_group_id	bigint
ems_ref	string
name	string
version	string
extra_attributes	text
status	string
status_reason	string
applications	array
nodes	array

Where nodes are the Flavors that the nodes are running on, and applications are the applications installed into the cluster. For Azure, this will be the kind of cluster and whatever additional apps are installed, and for AWS will simply be the apps. I am uncertain if we will need a separate model for this, or if it can simply be an array of hashes along the lines of:

[{
    name: "app_name",
    version: "1.0.0"
}]
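To make the array-of-hashes option concrete, here is a minimal sketch of how per-provider data might be normalized into that shape. The payload hashes, their keys, and the `normalize_applications` helper are assumptions for illustration only and do not mirror the actual provider API responses.

```ruby
# Normalizes hypothetical, pre-parsed provider payloads into the
# [{ name:, version: }] shape proposed for the applications column.
def normalize_applications(provider, payload)
  case provider
  when :aws
    # AWS: simply the apps installed on the cluster.
    payload.fetch(:applications, []).map { |a| { name: a[:name], version: a[:version] } }
  when :azure
    # Azure: the kind of cluster first, then any additional apps.
    kind   = { name: payload.fetch(:kind), version: payload[:cluster_version] }
    extras = payload.fetch(:extra_apps, []).map { |a| { name: a[:name], version: a[:version] } }
    [kind] + extras
  when :google
    # Google: only the image version is known, so applications stay nil.
    nil
  else
    raise ArgumentError, "unsupported provider: #{provider.inspect}"
  end
end

p normalize_applications(:aws, applications: [{ name: "Spark", version: "2.3.0" }])
p normalize_applications(:azure, kind: "Hadoop", cluster_version: "3.6", extra_apps: [])
p normalize_applications(:google, image_version: "1.2")
```

Keeping the column as a plain array of hashes avoids a separate model, at the cost of not being able to query applications relationally.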

PRs

[ ] Models for CloudMapReduce
[ ] Refresher for databases
[ ] Refresher for MR clusters
[ ] Expose models to automate
[ ] Trigger provider refresh upon relevant events for these models
[ ] Provisioning workflow for databases
[ ] Provisioning workflow for MR clusters

Fryguy commented 6 years ago

@bzwei can you add some comments here, since you had started one of these?

bzwei commented 6 years ago

@01100010011001010110010101110000 can you add analysis for other cloud providers that manageiq currently supports, for example OpenStack and Google?

01100010011001010110010101110000 commented 6 years ago

@bzwei Can do, I'll update the original post with whatever I find soon.

01100010011001010110010101110000 commented 6 years ago

Updated. I also looked into VMware's vCloud offerings for these and they don't seem to have anything of note yet. I couldn't find anything for Air SQL beyond a tech preview announcement in 2015, and their Big Data Extensions is a virtual appliance running on vSphere, which I think would require a separate provider.

Also, I'm concerned that I may be shoehorning the databases into CloudDatabase because of the odd way Azure treats MS SQL databases. Might it be better to add a CloudDatabaseInstance model that has_many CloudDatabases? That would leave the issue of where to associate flavors, however, as flavors are associated directly to the database in Azure MS SQL, but to the database instance everywhere else, even in Azure's other DBaaS offerings. Thoughts?

bzwei commented 6 years ago

I don't quite understand the problem with MS SQL. One MS SQL Server can have many SQL Databases, and each database can have its own flavor. We can map SQL Database to MIQ's CloudDatabase, can't we?

01100010011001010110010101110000 commented 6 years ago

You're right, I was overthinking that; we should be good to go with the current database models.

ManageIQ / manageiq-design

[RFC] Collect, manage, and provision Cloud provider managed (PaaS) services #34