Tendrl / specifications

Tendrl specs go here
GNU Lesser General Public License v3.0
6 stars 16 forks source link

Added specification for expansion of cluster in tendrl post detection #254

Open shtripat opened 6 years ago

shtripat commented 6 years ago

Signed-off-by: Shubhendu shtripat@redhat.com tendrl-bug-id: Tendrl/specifications#253

shtripat commented 6 years ago

@r0h4n @nthomas-redhat @julienlim @mbukatov plz check and ack. @mbukatov I have left place holders for adding tendrl-ansible details. Olz check and comment accordingly so that same can be added.

julienlim commented 6 years ago

Related to tendrl need to support expand cluster functionality for gluster cluster.

r0h4n commented 6 years ago

Please separate the detection and expand spec

julienlim commented 6 years ago

@shtripat @mbukatov @r0h4n @nthomas-redhat

In reviewing https://github.com/Tendrl/specifications/pull/254/commits/50fca318d9620c4922047dbcf0a22fc41b54e308, here are some things to consider:

(1) what happens if tendrl-ansible fails to get the tendrl agent(s) -- i.e. node, collectd, gluster-integration -- properly installed and configured? Can we add some mention about how this is tackled, even if it's a small mention that this is all handled at the command line. [Basically a spec should account for error-handling to some degree.]

(2) can one assume all this gets logged somewhere in some tendrl log file(s) -- whether successful or failed so one can troubleshoot?

(3) The "Testing" section in the document I assume refers to the "Acceptance Criteria" should include some mention that the cluster's host list in the Tendrl UI as well as Grafana will reflect the additional nodes/hosts as well as related dashboard(s).

(4) Can we also mention if that this can be detection for expansion of 1 or more nodes and that is not limited to only 1 at a time, as the testing (or Acceptance Criteria) will need to allow for doing n number of nodes that get added to the Gluster trusted storage pool, i.e. work for 1 as well as work for multiple nodes.

(5) Acceptance Criteria should also mention if the Inventory file is the one updated or not (as I couldn't tell if we were going with the Inventory file or with something in-memory, the alternate option), but that could be because the implementation is not yet determined.

There were some minor typos but that can be easily fixed when the spec gets updated.

shtripat commented 6 years ago

@r0h4n regarding Please separate the detection and expand spec, I would start a separate specification for detection of new nodes in the gluster cluster.

@julienlim

(1) what happens if tendrl-ansible fails to get the tendrl agent(s) -- i.e. node, collectd, gluster integration -- properly installed and configured? Can we add some mention about how this is tackled, even if it's a small mention that this is all handled at the command line. [Basically a spec should account for error-handling to some degree.]

If tendrl-node-agent not found, ideally tendrl-ansible being a command line tool should report with detailed error. I would suggest a separate specification under tendrl-ansible for this. @r0h4n ?? The installation of collectd is as dependency with tendrl-node-agent. The installation of tendrl-gluster-integration is part of import cluster flow. So when import cluster would get executed for the new node in cluster, it would be installed. If there are errors around installation of tendrl-gluster-integration, that would be logged as part of flow. I can add some details like this in specification.

(2) can one assume all this gets logged somewhere in some tendrl log file(s) -- whether successful or failed so one can troubleshoot?

Anything related to tendrl-ansible should ideally with taken care as part of separate specification under tendrl-anisble. The details while import cluster flow would be updated in this specification.

(3) The "Testing" section in the document I assume refers to the "Acceptance Criteria" should include some mention that the cluster's host list in the Tendrl UI as well as Grafana will reflect the >> additional nodes/hosts as well as related dashboard(s).

Ack. I would add this acceptance criteria

(4) Can we also mention if that this can be detection for expansion of 1 or more nodes and that is >> not limited to only 1 at a time, as the testing (or Acceptance Criteria) will need to allow for doing n >> number of nodes that get added to the Gluster trusted storage pool, i.e. work for 1 as well as work >> for multiple nodes.

Ack. I will add this acceptance criteria.

(5) Acceptance Criteria should also mention if the Inventory file is the one updated or not (as I couldn't tell if we were going with the Inventory file or with something in-memory, the alternate option), but that could be because the implementation is not yet determined.

Ack. At the moment we would go ahead with inventory file and manual execution of tendrl-ansible from command line. So its like admin first modifies the inventory file for tendrl-ansible with entries for additional nodes and then runs it. This installs tendrl-node-agent on the new nodes and configures them to report to the existing tendrl server. Once this step gets over, automatic detection of nodes in tendrl server and then running import cluster flow for these additional nodes would be taken care by tendrl-server's node-agent service. I will add an acceptance criteria around inventory file of tendrl-ansible.

julienlim commented 6 years ago

@shtripat Ack your comments. I'll assume you'll make the appropriate updates to the spec.

With regards to the following statement: "...So its like admin first modifies the inventory file for tendrl-ansible with entries for additional nodes and then runs it...."

I know if user created his/her own inventory file, probably user has to update it. However, if user is using the default /etc/ansible/hosts, should we consider adding the new entries directly there and save user the effort. This is probably a nice-to-have since we don't know yet how often users are creating his/her own inventory file or using the default Ansible hosts file.

shtripat commented 6 years ago

@julienlim agree. At this moment we expect user to manually change the inventory file and run tendrl-ansible.

shtripat commented 6 years ago

@julienlim @r0h4n added another issue https://github.com/Tendrl/specifications/issues/257 for specifying the detection part of new nodes in gluster cluster. @r0h4n as discussed please take care of this part of specification under a separate spec.

shtripat commented 6 years ago

Updated the PR with more expected behavior details.

a2batic commented 6 years ago

As per suggestions from @shtripat @r0h4n, the expansion of cluster will take place when cluster is in managed state and can be indicated as follows:

@julienlim @mcarrano @nthomas-redhat , what are you thoughts? Please update the design for the same.

julienlim commented 6 years ago

@a2batic @shtripat @nthomas-redhat @r0h4n @Tendrl/qe

Thoughts?

@a2batic - We'll look into making design updates.

shtripat commented 6 years ago

@julienlim @a2batic @gnehapk During expansion for the cluster, we should show the message as Expanding with a View details link to show the job details.

Also the kebab menu options like Enable/Disable profiling and Unmanage should be disabled for the cluster.

In case there is a failure in expansion, there would be event logged from backend. Also the View Details link should remain and link to task failed details. In case of success the job link vanishes and message says ready to use as usual for the cluster.

Thoughts??

shtripat commented 6 years ago

Also while expansion of cluster going on the clutsrer-id should not be clickable and preferably a different icon for cluster status required. @julienlim ?

ltrilety commented 6 years ago

If I read the specification correctly tendrl notice a new node automatically and via some notification requires to run tendrl-ansible, right? Has anyone thought about the opposite scenario, when a node is removed from gluster?

shtripat commented 6 years ago

Lubos, so this spec talks about only expansion scenario and shrink part is not in the scope of this.

mcarrano commented 6 years ago

@a2batic @julienlim @gnehapk @shtripat We've added a new wireframe to reflect what this should look like in the UI while the expansion is taking place. This approach can apply to expansion or any similar operation. See https://redhat.invisionapp.com/share/8QCOEVEY9#/282443505_Clusters-Update_In_Progress

In summary: *The status icon should change to 'pficon-in-progress'

Let me know if you have any questions.

gnehapk commented 6 years ago

@mbukatov @julienlim Is the view detail action(click on cluster name to view cluster detail) also allowed for expanding cluster? As per the mockup, the cluster name is hyperlinked.

julienlim commented 6 years ago

@gnehapk @a2batic @shtripat @nthomas-redhat @r0h4n @Tendrl/qe @mcarrano

Design updates have been posted at https://redhat.invisionapp.com/share/8QCOEVEY9. See https://github.com/Tendrl/commons/issues/849#issuecomment-371000495 for further details.

@gnehapk During expansion, details are still viewable. For views where there may be out-of-synch data, we show an inline notification -- see https://redhat.invisionapp.com/share/8QCOEVEY9#/screens/283595310 example.