confluentinc / cp-ansible

Ansible playbooks for the Confluent Platform
Apache License 2.0

Deprecate and eventually remove the upgrade playbooks #588

Closed erikgb closed 3 years ago

erikgb commented 3 years ago

UPDATE: I have changed my suggestion on this issue, see https://github.com/confluentinc/cp-ansible/issues/588#issuecomment-814289884, but keeping the original description for reference.

Upgrade Kafka Broker should be idempotent and not restart brokers when system already has desired state.

Describe the issue We are trying to extend our Kafka GitOps approach with support for upgrades, following the Upgrade Confluent Platform with Ansible Playbooks documentation. We are upgrading from CP 6.0 to CP 6.1. So far we have completed step 1 (Upgrade Zookeeper) and step 2 (Upgrade Kafka), and the actual upgrade (first run) went well.

But it seems like the playbook upgrade_kafka_broker.yml is not idempotent - at least not the idempotence I am expecting. Even if the end result is the same after subsequent runs, all the brokers are unfortunately restarted on every single run. And this is far from optimal. Do you think it can be fixed?

To Reproduce Run the upgrade of Zookeepers and Kafka Brokers at least twice:

ansible-playbook -i /path/to/hosts.yml upgrade_zookeeper.yml
ansible-playbook -i /path/to/hosts.yml upgrade_kafka_broker.yml -e kafka_broker_upgrade_start_version=6.0

Expected behaviour No tasks marked as changed on the second (and subsequent) runs, including no broker restarts.
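One way to check this expectation mechanically is to inspect the PLAY RECAP that Ansible prints at the end of a run. The following is a minimal sketch, not part of cp-ansible: the function name `recap_has_no_changes` and the log file are hypothetical, and it assumes the run output was saved to a file in Ansible's default recap format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if no host in a saved Ansible run log
# reports changed tasks in its PLAY RECAP line.
# Usage: recap_has_no_changes <logfile>
recap_has_no_changes() {
  local log="$1"
  # Recap lines look like: "host : ok=12   changed=0   unreachable=0 ..."
  local recap='ok=[0-9]+[[:space:]]+changed='
  # Return 2 if the log contains no recap line at all (nothing to judge).
  grep -Eq "${recap}[0-9]+" "$log" || return 2
  # Return 1 if any host has a non-zero changed count, 0 otherwise.
  ! grep -Eq "${recap}[1-9]" "$log"
}
```

For example, the second run could be piped through `tee second-run.log`, followed by `recap_has_no_changes second-run.log`. A non-zero exit status would then indicate that the run was not idempotent in the sense described here.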

Inventory File Can provide additional information if needed.

Logs Can provide additional information if needed.

Environment (please complete the following information):

Additional context Can provide additional information if needed.

erikgb commented 3 years ago

@domenicbove I am not sure what to do with this issue. Close it? I am more interested in desired state upgrade/downgrade (https://github.com/confluentinc/cp-ansible/issues/591) and do not care that much about the upgrade playbooks. 😊 But I think it should be possible to redesign the upgrade playbooks for customers who are not familiar with the broker properties inter.broker.protocol.version and log.message.format.version, based on the result of https://github.com/confluentinc/cp-ansible/pull/611. I would still prefer to be in charge of those important broker properties myself.

erikgb commented 3 years ago

After https://github.com/confluentinc/cp-ansible/pull/611, I have now successfully upgraded from CP 5.5 to CP 6.1 using only the desired state upgrade/downgrade added to the 6.1.x branch. It works like a charm. I suggest deprecating and eventually removing the upgrade playbooks, and advising users of cp-ansible to just use their inventory and the desired state playbooks.

If the community supports this, the upgrade documentation will need to be rewritten, so I am sharing my notes on how we upgraded. They follow the Apache Kafka upgrade docs:

  1. Prepare your inventory for the upgrade
    • Ensure that you have set the Kafka Broker properties inter.broker.protocol.version and log.message.format.version according to CURRENT_KAFKA_VERSION. Note: This is the upstream Apache Kafka version, not the CP version. This is very important for (at least) two reasons:
      1. It will allow the rolling upgrade/restart to work smoothly without additional disruption or downtime in your cluster.
      2. It allows for rollback/downgrade in case it turns out that you are hitting a critical bug in Kafka or clients are struggling.
        kafka_broker_custom_properties:
          inter.broker.protocol.version: <CURRENT_KAFKA_VERSION>
          log.message.format.version: <CURRENT_KAFKA_VERSION>
    • Review the NEXT_CP_VERSION branch of cp-ansible for any variable changes and/or changed semantics of variables. If a variable has been renamed, you should probably duplicate the variable temporarily, to be prepared for a possible emergency rollback/downgrade.
    • You probably also want to enable the serial deployment feature to avoid taking down your cluster in case the configuration is wrong.
      deployment_strategy: serial
  2. Run the desired state playbook using the NEXT_CP_VERSION branch of cp-ansible. Note: It is strongly recommended to first run your playbook in Ansible check mode, preferably with diff mode enabled. The latter will allow you to verify that your configuration is picked up correctly and will display (most of) the actual changes that are going to be made to your cluster.
  3. After verifying your successfully upgraded cluster, you may change the inter.broker.protocol.version broker property to the newer version, and run the desired state playbook once more. We decided to postpone this, to keep an open door for a smooth possible rollback/downgrade.
  4. After confirming that your clients run fine with the new version, you may change the log.message.format.version broker property to the newer version, and run the desired state playbook once more. Note: Changing the log.message.format.version broker property to the newer version represents the point-of-no-return, so ensure that there is no need to roll back! We decided to postpone this, to keep an open door for a smooth possible rollback/downgrade.
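
The runs in steps 2 to 4 could be sketched as below. This is only an illustration under stated assumptions: `all.yml` is assumed to be the name of the desired state playbook in the checked-out NEXT_CP_VERSION branch, and the `desired_state` wrapper and its `RUN` switch are hypothetical helpers, not part of cp-ansible. `--check` and `--diff` are the standard `ansible-playbook` flags for check mode and diff mode.

```shell
#!/usr/bin/env bash
# Assumption: all.yml is the desired state playbook in the checked-out
# NEXT_CP_VERSION branch of cp-ansible; override PLAYBOOK if yours differs.
PLAYBOOK="${PLAYBOOK:-all.yml}"

# Hypothetical wrapper: desired_state <inventory> [extra ansible-playbook args...]
# Prints the command by default; set RUN=1 to actually execute it.
desired_state() {
  local inventory="$1"; shift
  local cmd=(ansible-playbook -i "$inventory" "$PLAYBOOK" "$@")
  if [ "${RUN:-0}" = "1" ]; then
    "${cmd[@]}"
  else
    printf '%s\n' "${cmd[*]}"
  fi
}

# Step 2a: preview with check mode and diff mode (no changes are applied).
desired_state /path/to/hosts.yml --check --diff
# Step 2b: the real rolling upgrade, once the preview looks right.
desired_state /path/to/hosts.yml
```

Steps 3 and 4 would repeat the same two calls after editing inter.broker.protocol.version and, later, log.message.format.version in the inventory. Printing the command by default is a deliberate choice in this sketch, so that nothing touches the cluster until `RUN=1` is set.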

domenicbove commented 3 years ago

Closing with the merge of 676