Tendrl / ui

A repository for the front-end artifacts of Tendrl UI
GNU Lesser General Public License v2.1

Import failing but UI says Cluster is "Ready to use" #995

Closed julienlim closed 6 years ago

julienlim commented 6 years ago

Problem: Import fails but UI says Cluster is "Ready to use"

Environment: Installed using tendrl-vagrant on MacOS:

Using tendrl-1.6.3-20180615

[vagrant@tendrl-server ~]$ rpm -qa | grep tendrl | sort

tendrl-ansible-1.6.3-20180615T095226.268f8a2.noarch
tendrl-api-1.6.3-20180530T164022.8308f00.noarch
tendrl-api-httpd-1.6.3-20180530T164022.8308f00.noarch
tendrl-commons-1.6.3-20180615T125547.069c634.noarch
tendrl-grafana-plugins-1.6.3-20180615T120423.a75aca4.noarch
tendrl-grafana-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-monitoring-integration-1.6.3-20180615T120423.a75aca4.noarch
tendrl-node-agent-1.6.3-20180615T125550.2642567.noarch
tendrl-notifier-1.6.3-20180614T113218.d4353f2.noarch
tendrl-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-ui-1.6.3-20180615T112029.1d0ad59.noarch

Observations

Excerpt of "journalctl -u tendrl-monitoring-integration": see https://paste.fedoraproject.org/paste/JFnrkiQxtaxrhj1~dE9BEw

Potentially related to the following:

Per @mbukatov: "If the monitoring integration can get into state that it crashes, is restarted and it's not up again, it's either:

Note: I've deployed twice with tendrl-vagrant and hit the same problem both times.

Some screenshots:

Failure in Import Cluster Task:

screen shot 2018-06-15 at 7 51 27 pm

Despite failure, UI shows Cluster is Ready to Use:

screen shot 2018-06-15 at 7 54 14 pm

Dashboard not found:

screen shot 2018-06-15 at 7 52 58 pm

@nthomas-redhat @r0h4n @shirshendu @gnehapk @Tendrl/qe @shtripat @GowthamShanmugam @anmolsachan

julienlim commented 6 years ago

Some additional information related to the tendrl-monitoring-integration:

[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration

● tendrl-monitoring-integration.service - Monitoring Integration
   Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2018-06-15 23:36:39 UTC; 2 days ago
     Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
  Process: 25512 ExecStart=/usr/bin/tendrl-monitoring-integration (code=exited, status=1/FAILURE)
 Main PID: 25512 (code=exited, status=1/FAILURE)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

Trying to start tendrl-monitoring-integration

[root@tendrl-server tendrl-ansible-1.6.3]# systemctl start tendrl-monitoring-integration
[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
   Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-06-18 02:04:57 UTC; 2s ago
     Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
 Main PID: 23114 (tendrl-monitori)
   CGroup: /system.slice/tendrl-monitoring-integration.service
           └─23114 /usr/bin/python /usr/bin/tendrl-monitoring-integration

Jun 18 02:04:57 tendrl-server systemd[1]: Started Monitoring Integration.
Jun 18 02:04:57 tendrl-server systemd[1]: Starting Monitoring Integration…

[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
   Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2018-06-18 02:05:07 UTC; 100ms ago
     Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
  Process: 23147 ExecStart=/usr/bin/tendrl-monitoring-integration (code=exited, status=1/FAILURE)
 Main PID: 23147 (code=exited, status=1/FAILURE)

Jun 18 02:05:07 tendrl-server systemd[1]: tendrl-monitoring-integration.service: main process exited, code=exited, status=1/FAILURE
Jun 18 02:05:07 tendrl-server systemd[1]: Unit tendrl-monitoring-integration.service entered failed state.
Jun 18 02:05:07 tendrl-server systemd[1]: tendrl-monitoring-integration.service failed.

gnehapk commented 6 years ago

@julienlim Can you please share the API response of /clusters received on the cluster list view?

mbukatov commented 6 years ago

Reposting the most interesting part of the journalctl -u tendrl-monitoring-integration output, as provided in https://paste.fedoraproject.org/paste/JFnrkiQxtaxrhj1~dE9BEw, so that this issue can be found by searching for the error:

Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: Load definitions (.yml) for namespace.tendrl.objects.TendrlContext
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: Traceback (most recent call last):
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/bin/tendrl-monitoring-integration", line 9, in <module>
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: load_entry_point('tendrl-monitoring-integration==1.6.3', 'console_scripts', 'tendrl-monitoring-integration')()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/manager/__init__.py", line 71, in main
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: monitoring_integration_manager.start()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/manager/__init__.py", line 31, in start
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: dashboard.upload_default_dashboards()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/grafana/dashboard.py", line 27, in upload_default_dashboards
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: raise ex
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: KeyError: 'id'
Jun 18 02:05:05 tendrl-server systemd[1]: tendrl-monitoring-integration.service: main process exited, code=exited, status=1/FAILURE
Jun 18 02:05:05 tendrl-server systemd[1]: Unit tendrl-monitoring-integration.service entered failed state.
Jun 18 02:05:05 tendrl-server systemd[1]: tendrl-monitoring-integration.service failed.

GowthamShanmugam commented 6 years ago

Hi Martin, this happens when there is a password mismatch in the monitoring-integration configuration file. I have already sent Julien the steps to change the password. Let Julien try the steps, and we will see whether that resolves the issue.

I faced the same issue when I gave a wrong grafana password in monitoring integration configuration file. I think this is the same issue: it was caused by a problem while configuring monitoring-integration and grafana.
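The KeyError: 'id' in the traceback above would be consistent with this explanation: if Grafana rejects the credentials, the JSON body it returns presumably describes the error rather than a created dashboard, so looking up 'id' fails. Below is a minimal sketch of a more defensive check. The helper name extract_dashboard_id and both response shapes are assumptions for illustration; this is not the code in monitoring-integration's dashboard.py.

```python
import json


class DashboardUploadError(Exception):
    """Raised when a response does not describe a created dashboard."""


def extract_dashboard_id(response_body):
    """Return the dashboard id from a JSON response body (hypothetical helper).

    Assumption: a successful response is a JSON object with an 'id' key,
    and a failed one is a JSON object with a 'message' key.
    """
    data = json.loads(response_body)
    if not isinstance(data, dict) or "id" not in data:
        reason = "no 'id' in response"
        if isinstance(data, dict):
            reason = data.get("message", reason)
        # Surface the server's own explanation instead of a bare KeyError.
        raise DashboardUploadError("dashboard upload failed: %s" % reason)
    return data["id"]
```

With a check like this, the journal would show the reason reported by the server (for example an authentication error) instead of only KeyError: 'id'.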

For the second problem, the cluster showing as ready to use after the import job fails, I have raised a downstream issue and fixed the problem; the PR is under review upstream. https://bugzilla.redhat.com/show_bug.cgi?id=1593640

Thanks & Regards Gowtham S

mbukatov commented 6 years ago

Thanks for the quick review.

"I faced the same issue when I gave a wrong grafana password in monitoring integration configuration file."

If that is the case, isn't there a bug in the vagrant script as well? If I read you right, this is the password that is configured by tendrl-ansible and then stored in a local password file, so that when tendrl-ansible is run again, the same password is used without breaking the current setup.

julienlim commented 6 years ago

@GowthamShanmugam @mbukatov I've experienced this issue twice: (1) with the grafana password not set, and (2) with the grafana password set correctly.

The symptom is that the tendrl-monitoring-integration agent does not stay up, even after being restarted.
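Because the unit reports active for a few seconds after a start and only then fails, a single status check can be misleading. A small sketch for confirming the symptom by polling follows; the helper names, check count, and interval are assumptions for illustration, and only the standard `systemctl is-active` command is relied on.

```python
import subprocess
import time


def systemctl_is_active(unit):
    """Return the state word printed by `systemctl is-active` for a unit."""
    proc = subprocess.run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # A non-zero exit just means "not active", so the return code is ignored.
    return proc.stdout.strip()


def stays_up(unit, checks=5, interval=5.0,
             probe=systemctl_is_active, sleep=time.sleep):
    """Return True only if the unit reports 'active' on every check.

    `probe` and `sleep` are injectable so the logic can be exercised
    without systemd (hypothetical design choice for this sketch).
    """
    for _ in range(checks):
        if probe(unit) != "active":
            return False
        sleep(interval)
    return True
```

For example, stays_up("tendrl-monitoring-integration") would return False in the situation shown earlier, where the service goes from active (running) to activating (auto-restart) within about ten seconds.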

julienlim commented 6 years ago

@GowthamShanmugam @mbukatov I just redeployed, and this time, when the import fails, the UI gives the correct message, i.e. Import Failed.

$ rpm -qa | grep tendrl | sort
tendrl-api-1.6.3-20180626T110501.5a1c79e.noarch
tendrl-api-httpd-1.6.3-20180626T110501.5a1c79e.noarch
tendrl-commons-1.6.3-20180628T114340.d094568.noarch
tendrl-grafana-plugins-1.6.3-20180622T070617.1f84bc8.noarch
tendrl-grafana-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-monitoring-integration-1.6.3-20180622T070617.1f84bc8.noarch
tendrl-node-agent-1.6.3-20180618T083110.ba580e6.noarch
tendrl-notifier-1.6.3-20180618T083117.fd7bddb.noarch
tendrl-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-ui-1.6.3-20180625T085228.23f862a.noarch

screen shot 2018-07-02 at 2 53 00 pm

screen shot 2018-07-02 at 2 52 49 pm

Side note: I verified that the grafana admin password is properly set, and the issue of the tendrl-monitoring-integration agent not staying up still persists (so the import cannot actually complete successfully).

gnehapk commented 6 years ago

@GowthamShanmugam can you please close this issue if fixed?

GowthamShanmugam commented 6 years ago

This issue is fixed by https://github.com/Tendrl/gluster-integration/pull/691; it happened because of a race condition in the cluster object save.
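As a generic illustration of how a race in saving an object can leave a stale status behind (this is not the code from the linked PR, and the store, key, and field names are hypothetical): two writers that each save the whole object from their own snapshot can overwrite each other's changes, while writing only the field that changed does not.

```python
import copy


class Store(object):
    """Tiny in-memory stand-in for a key-value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def load(self, key):
        return copy.deepcopy(self._data[key])

    def save(self, key, obj):
        # Whole-object save: replaces every field with the caller's copy.
        self._data[key] = copy.deepcopy(obj)

    def update_field(self, key, field, value):
        # Field-level save: touches only the named field.
        self._data[key][field] = value


def _new_store():
    store = Store()
    store.save("cluster", {"status": "importing", "is_managed": "no"})
    return store


def whole_object_save():
    """Both writers read the same snapshot, then save the whole object."""
    store = _new_store()
    job = store.load("cluster")    # writer 1: the import job
    sync = store.load("cluster")   # writer 2: a periodic sync
    job["status"] = "failed"
    store.save("cluster", job)
    sync["is_managed"] = "yes"     # snapshot still says "importing"
    store.save("cluster", sync)    # silently discards status = "failed"
    return store.load("cluster")


def field_level_save():
    """Each writer updates only the field it owns."""
    store = _new_store()
    store.update_field("cluster", "status", "failed")
    store.update_field("cluster", "is_managed", "yes")
    return store.load("cluster")
```

In the first function the failure recorded by the import job is lost, which is the kind of outcome that could make a UI report a cluster as usable after a failed import; in the second, both changes survive.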

@gnehapk we can close this issue