cloudera-labs / cloudera.cluster

An Ansible collection for lifecycle and management of Cloudera CDP Private Cloud resources on bare metal, IaaS, and PaaS.
Apache License 2.0

Can't deploy two clusters that contain the same set of services. #174

Open wfh-andrew opened 6 months ago

wfh-andrew commented 6 months ago

We tried deploying two clusters to a single Cloudera Manager instance using the Ansible playbooks. One cluster is a Test environment and the other is a Production environment. Both clusters contain the same set of services [HBASE, HDFS, KAFKA, SOLR, ZOOKEEPER]. The first cluster deploys successfully, but the deployment fails when creating the services for the second cluster.

The deployment fails because the services table in the scm database has a unique constraint on the service name:

"services_name_key" UNIQUE CONSTRAINT, btree (name)

The cause is that the cluster_template for services sets each service's refName to the lowercase value of the service name, which does not produce a name that is unique across clusters.

For instance, KAFKA becomes kafka in both clusters. This results in the following error:

failed: [server1.example.com] (item=KAFKA) => {"ansible_loop_var": "service", "cache_control": "no-cache, no-store, max-age=0, must-revalidate", "changed": false, "connection": "close", "content": "{\n \"message\" : \"Batch entry 0 insert into SERVICES (OPTOMISTIC_LOCK_VERSION, EXCLUSIVE_LOCK_VERSION, NAME, DISPLAY_NAME, SERVICE_TYPE, SERVICE_VERSION, MAINTENANCE_COUNT, GENERATION, CLUSTER_ID, SERVICE_ID) values (0, NULL, 'kafka', 'kafka', 'KAFKA', NULL, 0, 1, 1546351673, 1546352751) was aborted: ERROR: duplicate key value violates unique constraint \\\"services_name_key\\\"\\n Detail: Key (name)=(kafka) already exists. Call getNextException to see other errors in the batch.\",\n \"causes\" : [ \"ERROR: duplicate key value violates unique constraint \\\"services_name_key\\\"\\n Detail: Key (name)=(kafka) already exists.\" ]\n}", "content_type": "application/json;charset=utf-8", "date": "Tue, 21 Nov 2023 21:35:01 GMT", "elapsed": 0, "expires": "0", "json": {"causes": ["ERROR: duplicate key value violates unique constraint \"services_name_key\"\n Detail: Key (name)=(kafka) already exists."], "message": "Batch entry 0 insert into SERVICES (OPTOMISTIC_LOCK_VERSION, EXCLUSIVE_LOCK_VERSION, NAME, DISPLAY_NAME, SERVICE_TYPE, SERVICE_VERSION, MAINTENANCE_COUNT, GENERATION, CLUSTER_ID, SERVICE_ID) values (0, NULL, 'kafka', 'kafka', 'KAFKA', NULL, 0, 1, 1546351673, 1546352751) was aborted: ERROR: duplicate key value violates unique constraint \"services_name_key\"\n Detail: Key (name)=(kafka) already exists. Call getNextException to see other errors in the batch."}, "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request", "pragma": "no-cache", "redirected": false, "service": "KAFKA", "set_cookie": "SESSION=NjA4MjN1MmQtNzFkMy00ZDFiLWIxMDItZjc0Zjk0YmMzNmM5; Path=/; Secure; HttpOnly; SameSite=Lax", "status": 400, "strict-transport-security": "max-age=31536000 ; includeSubDomains", "url": "https://server1.example.com:7183/api/v52/clusters/Production/services", "x_content_type_options": "nosniff", "x_frame_options": "DENY", "x_xss_protection": "1; mode=block"}

The above error is output for each duplicate service being deployed.
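To make the collision concrete, here is a minimal Python sketch. It is not the collection's template code; it only assumes, as described above, that the refName is the lowercased service name. The function names `ref_name` and `find_duplicate_ref_names` are made up for illustration.

```python
from collections import Counter


def ref_name(service: str) -> str:
    """Assumed naming rule: the refName is the lowercased service name."""
    return service.lower()


def find_duplicate_ref_names(clusters: dict) -> list:
    """Return the refNames that would be generated for more than one cluster."""
    counts = Counter(
        ref_name(service)
        for services in clusters.values()
        for service in services
    )
    return sorted(name for name, count in counts.items() if count > 1)


# The two clusters from this report, both with the same set of services.
clusters = {
    "Test": ["HBASE", "HDFS", "KAFKA", "SOLR", "ZOOKEEPER"],
    "Production": ["HBASE", "HDFS", "KAFKA", "SOLR", "ZOOKEEPER"],
}

# Every service in the second cluster maps to a name that is already taken.
print(find_duplicate_ref_names(clusters))
```

Under that assumption all five names collide, which matches seeing one error per service in the second cluster.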

It would be useful if the service name (refName) could be overridden so that it can be made unique between clusters.

I'm not sure of the complexities of this, but maybe an option could be to allow adding a unique string to the service name in the deployment.yml?

Example:

clusters:
  - name: Test
    services: [HBASE-TEST, HDFS-TEST, KAFKA-TEST, SOLR-TEST, ZOOKEEPER-TEST]
    # configs for Test cluster
    ...
    ...
  - name: Production
    services: [HBASE-PROD, HDFS-PROD, KAFKA-PROD, SOLR-PROD, ZOOKEEPER-PROD]
    # configs for Production cluster
    ...
    ...

That way the name would be unique in the services table of the scm database:

 service_id |     name       | service_type  | cluster_id |        display_name         
------------+----------------+---------------+------------+-----------------------------
 1          | mgmt           | MGMT          |            | Cloudera Management Service 
 2          | kafka-test     | KAFKA         | 1          | KAFKA                       
 3          | solr-test      | SOLR          | 1          | SOLR                        
 4          | hbase-test     | HBASE         | 1          | HBASE                       
 5          | hdfs-test      | HDFS          | 1          | HDFS                        
 6          | zookeeper-test | ZOOKEEPER     | 1          | ZOOKEEPER                   
 7          | kafka-prod     | KAFKA         | 2          | KAFKA                       
 8          | solr-prod      | SOLR          | 2          | SOLR                        
 9          | hbase-prod     | HBASE         | 2          | HBASE                       
 10         | hdfs-prod      | HDFS          | 2          | HDFS                        
 11         | zookeeper-prod | ZOOKEEPER     | 2          | ZOOKEEPER                    
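A rough Python sketch of how such suffixed entries might be interpreted. This is only an illustration of the idea, not existing behaviour of the collection: the helper `parse_service` and the rule "everything before the first hyphen is the service type" are my assumptions.

```python
def parse_service(entry: str) -> tuple:
    """Split a suffixed entry such as 'KAFKA-TEST' into (ref_name, service_type).

    Assumption: the service type is the part before the first hyphen, and
    the refName is the whole entry lowercased.
    """
    service_type, _, _suffix = entry.partition("-")
    return entry.lower(), service_type


# Hypothetical deployment definition using the suffixed service entries.
deployment = {
    "Test": ["HBASE-TEST", "HDFS-TEST", "KAFKA-TEST", "SOLR-TEST", "ZOOKEEPER-TEST"],
    "Production": ["HBASE-PROD", "HDFS-PROD", "KAFKA-PROD", "SOLR-PROD", "ZOOKEEPER-PROD"],
}


def all_ref_names(deployment: dict) -> list:
    """Collect the refName of every service across all clusters."""
    return [
        parse_service(entry)[0]
        for entries in deployment.values()
        for entry in entries
    ]


names = all_ref_names(deployment)
# True when no refName is repeated, so the unique constraint would hold.
print(len(names) == len(set(names)))
```

With this rule `parse_service("KAFKA-TEST")` gives `("kafka-test", "KAFKA")`, so the service type stays the same while the name differs per cluster.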

I'm interested in hearing your thoughts on this. Thanks, Andrew