cloudfoundry-incubator / admin-ui

Need new main contributor - An application for viewing Cloud Foundry metrics and operations data.
Apache License 2.0

Components offline? #142

Closed tomsherrod closed 9 years ago

tomsherrod commented 9 years ago

admin-ui is up and running. Orgs, spaces, and apps are reporting as expected. App instances shows no data available, although some apps are running. Routes are reporting. Service instances, service bindings, org roles, space roles, clients, users, buildpacks, domains, feature flags, quotas, stacks, events, service brokers, etc. are working.

DEAs are reporting offline. Cloud Controller offline. Health Manager offline. Routers running. Components: DEAs, Cloud Controller, HM9000 offline.

Stats: Reporting.

Cloud Foundry v214. I see some timeouts in the admin-ui log. Is there something specific I can check or look for to get the other component information?

rboykin commented 9 years ago

@tomsherrod Looks like your NATS configuration is either incorrect or inaccessible, specifically the mbus value in your config.yml.

tomsherrod commented 9 years ago

@rboykin Thanks for the quick response. From the admin-ui VM, curling the URL returns an Authorization Violation response. I'm using the id/password from the authorization block of the nats.conf on the NATS VM. Any pointers are welcome, recognizing that this is not an admin-ui issue.

rboykin commented 9 years ago

@tomsherrod Check out the cloud_controller_ng VM's config. Copy one of the URIs from the message_bus_servers line in cloud_controller_ng.yml into the mbus property of your Admin UI config.yml. The value looks something like this: nats://nats:c1oudc0w@192.168.91.176:4222

A similar value can be found in the nats_servers block of dea.yml, your dea_next configuration file.
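To make the shape of that mbus value concrete, here is a minimal Ruby sketch that splits a NATS URI into the parts that have to match the NATS server's settings. The helper name `parse_mbus` is hypothetical and not part of the Admin UI; the URI is the example value quoted above.

```ruby
require 'uri'

# Hypothetical helper (not part of admin-ui): split an mbus value into
# user, password, host, and port so each can be compared with nats.conf.
def parse_mbus(mbus)
  # strip guards against stray whitespace picked up while copying the value
  uri = URI.parse(mbus.strip)
  raise ArgumentError, "expected a nats:// URI, got #{mbus.inspect}" unless uri.scheme == 'nats'

  { user: uri.user, password: uri.password, host: uri.host, port: uri.port }
end

if __FILE__ == $PROGRAM_NAME
  p parse_mbus('nats://nats:c1oudc0w@192.168.91.176:4222')
end
```

Comparing the printed user and password against the authorization block of nats.conf is one way to narrow down an Authorization Violation.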

tomsherrod commented 9 years ago

@rboykin Right on! cloud_controller_ng contained exactly what was in the default.yml. I will dig into the other yml too. I'm wondering how the CF environment is even working. Applications are deploying and running, however. Strange.

rboykin commented 9 years ago

@tomsherrod Perhaps you have a firewall or something similar running which is blocking your nats port.

Here is a simple Ruby NATS client which dumps the vcap.components. You should be able to run it, passing the value of the mbus field in config.yml as the argument. Make sure to install the nats gem before running. Run as ruby

require 'nats/client'

%w(TERM INT).each do |sig|
  trap(sig) do
    NATS.stop
  end
end

def usage
  puts 'Usage: dumpnats '
  puts 'Example dumpnats nats://nats:nats@w3.opensmartcloud.com:4222'
  exit
end

uri, = ARGV
usage if uri.nil?

NATS.on_error do |err|
  puts "Server Error: #{err}"
  exit!
end

NATS.start(uri: uri, ping_interval: 2) do
  NATS.request('vcap.component.discover') do |json|
    puts "Discover: #{json}"
  end
end

rboykin commented 9 years ago

@tomsherrod Sorry about the indent formatting of the snippet. GitHub removed the indentation.

tomsherrod commented 9 years ago

Thank you. I've pocketed that for future reference. After your last comment, I cut and pasted the mbus line from the cloud_controller_ng config into the default.yml. I restarted admin-ui, and now things are showing up. I visually compared the lines, and they look exactly the same. Possibly a bad character at the end of the line or some such? Learned a lot from the exchange and got a dashboard. Good day indeed.
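On the bad-character theory: two lines can look identical on screen yet differ by a trailing carriage return, a tab, or a non-ASCII space. A minimal Ruby sketch for making such characters visible follows. The helper name `suspicious_chars` is hypothetical, and the sample lines are made up for illustration using the example URI from earlier in the thread.

```ruby
# Hypothetical helper: list characters that are easy to miss when comparing
# config lines by eye (control characters, non-ASCII, trailing whitespace).
def suspicious_chars(line)
  found = []
  line.each_char.with_index do |ch, i|
    found << [i, ch.inspect] if ch =~ /[[:cntrl:]]/ || !ch.ascii_only?
  end
  found << [line.length - 1, 'trailing whitespace'] if line =~ /[ \t]\z/
  found
end

good = 'mbus: nats://nats:c1oudc0w@192.168.91.176:4222'
bad  = "mbus: nats://nats:c1oudc0w@192.168.91.176:4222\r"

p suspicious_chars(good)
p suspicious_chars(bad)
puts bad.inspect # String#inspect also shows the escaped carriage return
```

Reading the real file with `File.readlines` and passing each line (after `chomp`) through the helper would show whether the original mbus line carried anything invisible.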

rboykin commented 9 years ago

Closing since @tomsherrod now has it working.

tomsherrod commented 9 years ago

@rboykin I'm setting admin ui up in another install. Components offline. I ran the dumpnats script, dumped the discover information. The script didn't end. I had to break out of it. Admin-ui.log shows timeouts to specific offline component. sample:

E, [2015-09-23T11:59:57.137747 #1249] ERROR -- : [ -- ] : [ -- ] : item_result(http://192.168.2.54:43444/varz) : error [#<Errno::ETIMEDOUT: Connection timed out - connect(2) for "192.168.2.54" port 43444>]
E, [2015-09-23T12:02:04.369669 #1249] ERROR -- : [ -- ] : [ -- ] : item_result(http://192.168.2.51:40929/varz) : error [#<Errno::ETIMEDOUT: Connection timed out - connect(2) for "192.168.2.51" port 40929>]

I can ping the machine from the admin-ui machine. Curling the specific port does time out. The OpenStack security group allows all ports between members of the group. admin-ui is on the same network as CF.

Pointers?

Tom
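A successful ping only shows that the host answers ICMP; it says nothing about whether a given TCP port accepts connections. A minimal Ruby sketch of a TCP reachability check is below. The helper name `port_open?` and its default timeout are assumptions made for illustration, not anything the Admin UI provides.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port succeeds within
# `timeout` seconds; false on timeout, refusal, or an unreachable host.
def port_open?(host, port, timeout: 3)
  # With a block, Socket.tcp closes the socket and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH,
       Errno::ENETUNREACH, SocketError
  false
end
```

For example, `port_open?('192.168.2.54', 43444)` run from the admin-ui machine would exercise the same connection that timed out in the log above.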

rboykin commented 9 years ago

@tomsherrod When you run the dumpnats script, you see results like the following for a DEA:

Discover: {"type":"DEA","index":0,"uuid":"0-fd6429251fb44d83bdf9b721c5b52e1b","host":"192.168.91.176:32868","credentials":["500c2ae6ee0640bea2add08688c3df5b","177530686fa24490b05171907f353602"],"start":"2015-09-23 07:14:24 -0500","uptime":"0d:0h:6m:29s"}

Using that information, you should be able to curl the component with its credentials and host, like the following. I added the /varz path.

curl -u "500c2ae6ee0640bea2add08688c3df5b:177530686fa24490b05171907f353602" http://192.168.91.176:32868/varz

This is the same thing the Admin UI is doing.

Also note that each time you restart a varz-enabled component via monit, the credentials and port will change as it registers with NATS. The Admin UI is also sensitive to this and will handle it.
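The curl above can also be scripted. Below is a minimal Ruby sketch that turns one discover reply into an authenticated /varz request. It is an illustration using the standard library, not the Admin UI's actual code, and the helper names `varz_request` and `fetch_varz` as well as the 5 second timeout are assumptions.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request for a component's /varz endpoint
# from one vcap.component.discover reply (a JSON string).
def varz_request(discover_json)
  info = JSON.parse(discover_json)
  user, password = info.fetch('credentials')
  uri = URI("http://#{info.fetch('host')}/varz")

  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end

# Hypothetical helper: send the request and return the response body.
# Short timeouts keep an unreachable component from hanging the caller.
def fetch_varz(discover_json, timeout: 5)
  uri, request = varz_request(discover_json)
  Net::HTTP.start(uri.host, uri.port, open_timeout: timeout, read_timeout: timeout) do |http|
    http.request(request).body
  end
end
```

Because the credentials and port change on every restart, the discover reply should be fetched fresh each time rather than cached.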

tomsherrod commented 9 years ago

Thank you. Seems to be a timing thing and it just started working.