Closed julienlim closed 6 years ago
Some additional information related to the tendrl-monitoring-integration:
[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2018-06-15 23:36:39 UTC; 2 days ago
Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
Process: 25512 ExecStart=/usr/bin/tendrl-monitoring-integration (code=exited, status=1/FAILURE)
Main PID: 25512 (code=exited, status=1/FAILURE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
Trying to start tendrl-monitoring-integration
[root@tendrl-server tendrl-ansible-1.6.3]# systemctl start tendrl-monitoring-integration
[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2018-06-18 02:04:57 UTC; 2s ago
Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
Main PID: 23114 (tendrl-monitori)
CGroup: /system.slice/tendrl-monitoring-integration.service
└─23114 /usr/bin/python /usr/bin/tendrl-monitoring-integration
Jun 18 02:04:57 tendrl-server systemd[1]: Started Monitoring Integration.
Jun 18 02:04:57 tendrl-server systemd[1]: Starting Monitoring Integration…
[root@tendrl-server tendrl-ansible-1.6.3]# systemctl status tendrl-monitoring-integration
● tendrl-monitoring-integration.service - Monitoring Integration
Loaded: loaded (/usr/lib/systemd/system/tendrl-monitoring-integration.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2018-06-18 02:05:07 UTC; 100ms ago
Docs: https://github.com/Tendrl/monitoring-integration/tree/master/doc/source
Process: 23147 ExecStart=/usr/bin/tendrl-monitoring-integration (code=exited, status=1/FAILURE)
Main PID: 23147 (code=exited, status=1/FAILURE)
Jun 18 02:05:07 tendrl-server systemd[1]: tendrl-monitoring-integration.service: main process exited, code=exited, status=1/FAILURE
Jun 18 02:05:07 tendrl-server systemd[1]: Unit tendrl-monitoring-integration.service entered failed state.
Jun 18 02:05:07 tendrl-server systemd[1]: tendrl-monitoring-integration.service failed.
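The three status outputs above show the unit going from failed (Result: start-limit), to running, to auto-restart within about ten seconds. For anyone scripting this check, here is a minimal sketch that classifies the `Active:` line of `systemctl status` output. The function name `classify_active_line` and the returned labels are hypothetical and not part of Tendrl or systemd.

```python
import re


def classify_active_line(status_text):
    """Return a rough label for the 'Active:' line found in status_text."""
    for line in status_text.splitlines():
        line = line.strip()
        if not line.startswith("Active:"):
            continue
        # Second whitespace-separated token is the state, e.g. "failed".
        state = line.split()[1]
        match = re.search(r"\(Result: ([^)]+)\)", line)
        result = match.group(1) if match else None
        if state == "failed" and result == "start-limit":
            # systemd stopped restarting the unit after too many quick failures.
            return "crash-looped until systemd gave up"
        if state == "activating" and "(auto-restart)" in line:
            return "crashed, waiting for automatic restart"
        if state == "active":
            return "running (may still crash shortly after start)"
        return "other: " + state
    return "no Active: line found"
```

Feeding it the text captured from `systemctl status tendrl-monitoring-integration` a few seconds apart should reproduce the sequence seen above.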
@julienlim Can you please share the API response of /clusters received on the cluster list view?
Reposting the most interesting part of the journalctl -u tendrl-monitoring-integration output, as provided in https://paste.fedoraproject.org/paste/JFnrkiQxtaxrhj1~dE9BEw so that it's possible to find this issue by searching for the error:
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: Load definitions (.yml) for namespace.tendrl.objects.TendrlContext
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: Traceback (most recent call last):
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/bin/tendrl-monitoring-integration", line 9, in <module>
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: load_entry_point('tendrl-monitoring-integration==1.6.3', 'console_scripts', 'tendrl-monitoring-integration')()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/manager/__init__.py", line 71, in main
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: monitoring_integration_manager.start()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/manager/__init__.py", line 31, in start
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: dashboard.upload_default_dashboards()
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: File "/usr/lib/python2.7/site-packages/tendrl/monitoring_integration/grafana/dashboard.py", line 27, in upload_default_dashboards
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: raise ex
Jun 18 02:05:05 tendrl-server tendrl-monitoring-integration[23137]: KeyError: 'id'
Jun 18 02:05:05 tendrl-server systemd[1]: tendrl-monitoring-integration.service: main process exited, code=exited, status=1/FAILURE
Jun 18 02:05:05 tendrl-server systemd[1]: Unit tendrl-monitoring-integration.service entered failed state.
Jun 18 02:05:05 tendrl-server systemd[1]: tendrl-monitoring-integration.service failed.
Hi Martin, this happens when the password in the monitoring-integration configuration file does not match. I have already sent Julim the steps to change the password. Let Julim try them and we will see whether that resolves the issue.
I faced the same issue when I gave a wrong grafana password in the monitoring-integration configuration file, so I think this is the same issue. It was caused by a problem while configuring monitoring-integration and grafana.
For the second one, the cluster being shown as ready to use after the import job fails, I have raised a downstream issue and fixed the problem; the PR is under review upstream. https://bugzilla.redhat.com/show_bug.cgi?id=1593640
Thanks & Regards Gowtham S
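The traceback ends in KeyError: 'id' inside upload_default_dashboards, and the explanation above attributes it to a rejected grafana password. One way that combination can arise is code reading an 'id' field from a parsed JSON body when the server answered with an error body instead. The sketch below illustrates the pattern and a more descriptive failure. The function `extract_dashboard_id`, the exception class, and both sample payloads are hypothetical and are not taken from the monitoring-integration source.

```python
import json


class GrafanaResponseError(Exception):
    """Raised when a response body lacks the expected 'id' field."""


def extract_dashboard_id(body_text):
    """Return payload['id'], or raise an error that names the likely cause."""
    payload = json.loads(body_text)
    if not isinstance(payload, dict):
        raise GrafanaResponseError("unexpected response: %r" % (payload,))
    if "id" not in payload:
        # An error body typically carries a message instead of an id.
        message = payload.get("message", "unknown error")
        raise GrafanaResponseError("no 'id' in response: %s" % message)
    return payload["id"]


# Hypothetical payloads, for illustration only.
ok_body = '{"id": 7, "slug": "example"}'
error_body = '{"message": "Invalid username or password"}'
```

With a check like this, the journal would show the authentication message rather than a bare KeyError, which makes a password mismatch easier to spot.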
Thanks for the quick review.
I faced the same issue when I gave a wrong grafana password in monitoring integration configuration file.
If that is the case, isn't there a bug in the vagrant script as well? If I read you right, this is the password that tendrl-ansible configures and then stores in a local password file, so that when tendrl-ansible is run again, the same password is used without breaking the current setup.
@GowthamShanmugam @mbukatov I've experienced this issue in both cases: (1) grafana password not set, (2) grafana password set correctly.
The symptom is that the tendrl-monitoring-integration agent does not stay up, even after being restarted.
@GowthamShanmugam @mbukatov I just redeployed and this time when it fails the import, it gives the correct message, i.e. Import Failed.
$ rpm -qa | grep tendrl | sort
tendrl-api-1.6.3-20180626T110501.5a1c79e.noarch
tendrl-api-httpd-1.6.3-20180626T110501.5a1c79e.noarch
tendrl-commons-1.6.3-20180628T114340.d094568.noarch
tendrl-grafana-plugins-1.6.3-20180622T070617.1f84bc8.noarch
tendrl-grafana-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-monitoring-integration-1.6.3-20180622T070617.1f84bc8.noarch
tendrl-node-agent-1.6.3-20180618T083110.ba580e6.noarch
tendrl-notifier-1.6.3-20180618T083117.fd7bddb.noarch
tendrl-selinux-1.5.4-20180227T085901.984600c.noarch
tendrl-ui-1.6.3-20180625T085228.23f862a.noarch
Side note: I verified that the grafana admin password is properly set, and the tendrl-monitoring-integration agent still does not stay up (so the import cannot complete successfully).
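For anyone wanting to repeat that password check from a script, here is a rough sketch using HTTP basic authentication against Grafana's current-organization endpoint (`/api/org`). The base URL, user name, and password passed in are placeholders, and the function names are hypothetical. It uses the Python 3 standard library, whereas the affected service itself runs on Python 2.7.

```python
import base64
import urllib.error
import urllib.request


def build_auth_check_request(base_url, user, password):
    """Build (but do not send) a basic-auth GET request for /api/org."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + "/api/org")
    req.add_header("Authorization", "Basic " + token)
    return req


def credentials_accepted(base_url, user, password, timeout=5):
    """Return True on HTTP 200, False on HTTP 401; re-raise other errors."""
    req = build_auth_check_request(base_url, user, password)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

Calling `credentials_accepted` with the Grafana address of the Tendrl server and the admin credentials from the monitoring-integration configuration would show whether the stored password is the one Grafana expects.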
@GowthamShanmugam can you please close this issue if fixed?
This issue is fixed in https://github.com/Tendrl/gluster-integration/pull/691; it happened because of a race condition when saving the cluster object.
@gnehapk we can close this issue
Problem: Import fails but UI says Cluster is "Ready to use"
Environment: Installed using tendrl-vagrant on MacOS:
Using tendrl-1.6.3-20180615
Observations
Excerpt of "journalctl -u tendrl-monitoring-integration": see https://paste.fedoraproject.org/paste/JFnrkiQxtaxrhj1~dE9BEw
Potentially related to the following:
Per @mbukatov: "If the monitoring integration can get into state that it crashes, is restarted and it's not up again, it's either:
Note: I've now deployed twice with tendrl-vagrant and hit this same problem both times.
Some screenshots:
Failure in Import Cluster Task:
Despite failure, UI shows Cluster is Ready to Use:
Dashboard not found:
@nthomas-redhat @r0h4n @shirshendu @gnehapk @Tendrl/qe @shtripat @GowthamShanmugam @anmolsachan