conjure-up / conjure-up

Deploying complex solutions, magically.
https://conjure-up.io
MIT License

nova-cloud-controller/0: hook failed #487

Closed: qgriffith-zz closed 8 years ago

qgriffith-zz commented 8 years ago

Ubuntu 16.04, using MAAS version 2.0.0+bzr5189-0ubuntu1 (16.04.1), Juju 2.0-rc3-0ubuntu1~16.04.1~juju1, and conjure-up 2.0.1-0~201610061938~ubuntu16.04.

I get the following in my log when running conjure-up openstack with the MAAS option:

conjure-up/_unspecified_spell: [ERROR] conjure-up/_unspecified_spell: Failure in deploy done: Deployment errors:
nova-cloud-controller/0: hook failed: "cloud-compute-relation-changed" for nova-compute:cloud-compute
conjure-up/_unspecified_spell: [ERROR] conjure-up/_unspecified_spell: Showing dialog for exception:
Traceback (most recent call last):
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/conjureup/controllers/deploystatus/common.py", line 40, in wait_for_applications
    raise Exception(result['message'])
Exception: Deployment errors:
nova-cloud-controller/0: hook failed: "cloud-compute-relation-changed" for nova-compute:cloud-compute
mikemccracken commented 8 years ago

@qgriffith Thanks for trying this out, sorry it didn't work smoothly. This is an error from the juju charm that deploys nova on the control node. To debug this further, you'll need to look at the juju debug output directly.

Looking at juju status should show you the 'cloud-compute-relation-changed' error in the nova-cloud-controller application, and it may show other errors that could be good clues, such as errors from any of the machines.

Running juju debug-log --replay -i unit-nova-cloud-controller-0 will show you all the debug log output from the erroring unit; something in there might be useful. Feel free to paste the results here if it's not obvious what's going on, and we can try to help.

Good luck

qgriffith-zz commented 8 years ago

Thank you for your super quick reply and helpful tips. It appears to be something with DNS, maybe:

unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed Traceback (most recent call last):
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1102, in <module>
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     main()
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1096, in main
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     hooks.execute(sys.argv)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 715, in execute
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     self._hooks[hook_name]()
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 618, in compute_changed
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     ssh_compute_add(key, rid=rid, unit=unit)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 746, in ssh_compute_add
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     hn = get_hostname(private_address)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 465, in get_hostname
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     result = ns_query(rev)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 427, in ns_query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     answers = dns.resolver.query(address, rtype)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 981, in query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise_on_no_answer, source_port)
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 910, in query
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise NXDOMAIN
unit-nova-cloud-controller-0: 14:11:24 INFO unit.nova-cloud-controller/0.cloud-compute-relation-changed dns.resolver.NXDOMAIN
unit-nova-cloud-controller-0: 14:11:24 ERROR juju.worker.uniter.operation hook "cloud-compute-relation-changed" failed: exit status 1
unit-nova-cloud-controller-0: 14:11:24 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
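
For context, the get_hostname() call in this traceback boils down to a reverse (PTR) lookup of the compute node's private address via dnspython; NXDOMAIN here means the DNS server the unit is using has no PTR record for that address. A minimal sketch of that path, based only on the traceback above (not the exact charmhelpers code):

# Rough sketch of what get_hostname()/ns_query() do in the failing hook,
# per the traceback above (dnspython; not the exact charm code).
import dns.resolver
import dns.reversename

def get_hostname(address):
    # Build the reverse name, e.g. 10.0.0.5 -> 5.0.0.10.in-addr.arpa.
    rev = dns.reversename.from_address(address)
    # query() raises dns.resolver.NXDOMAIN when the nameserver has no PTR
    # record for the address -- the exception seen in the log.
    answers = dns.resolver.query(rev, 'PTR')
    return str(answers[0]).rstrip('.')

# Hypothetical usage with a placeholder private address:
print(get_hostname('10.0.0.5'))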
qgriffith-zz commented 8 years ago

There was also this error at the start, but it seems to continue past it:

unit-nova-cloud-controller-0: 14:03:01 ERROR juju.worker.dependency "metric-collect" manifold worker returned unexpected error: failed to read charm from: /var/lib/juju/agents/unit-nova-cloud-controller-0/charm: stat /var/lib/juju/agents/unit-nova-cloud-controller-0/charm: no such file or directory

mikemccracken commented 8 years ago

This doesn't ring any bells immediately; we're looking into it.

qgriffith-zz commented 8 years ago

I do notice that on the blades MAAS manages, DNS is set up like this:

nameserver
search maas

In the previous version of MAAS, DNS was set up to point to the MAAS server and then forward to the real DNS server. I don't know if that is the issue, but it's something I am looking into.

adam-stokes commented 8 years ago

Not sure if this helps, but this is my network setup for MAAS:

(attached image: maas-network)

qgriffith-zz commented 8 years ago

I think I have fixed this by removing the DNS server from the subnet MAAS handles. This defaults all MAAS-deployed servers to using MAAS as their DNS server, which then forwards out to my real DNS server.
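
If it helps anyone hitting the same thing, one quick way to confirm this kind of fix from a deployed node is to check that both the forward and the reverse lookup now succeed through the MAAS DNS. A small dnspython sketch, with placeholder hostname and address (substitute a real compute node):

import dns.resolver
import dns.reversename

def check_dns(hostname, address):
    # Forward lookup: the node name should resolve via the MAAS DNS.
    forward = [str(r) for r in dns.resolver.query(hostname, 'A')]
    # Reverse lookup: the PTR record is what the nova-cloud-controller hook
    # needs; this was the lookup raising NXDOMAIN before the fix.
    ptr_name = dns.reversename.from_address(address)
    reverse = [str(r) for r in dns.resolver.query(ptr_name, 'PTR')]
    return forward, reverse

# Placeholder values only; use a real node name and its private address.
print(check_dns('node1.maas', '10.0.0.5'))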

laralar commented 6 years ago

Report

Thank you for trying conjure-up! Before reporting a bug please make sure you've gone through this checklist:

Please provide the output of the following commands

which juju
juju version

which conjure-up
conjure-up --version

which lxc
/snap/bin/lxc config show
/snap/bin/lxc version

cat /etc/lsb-release

Please attach a tarball of ~/.cache/conjure-up:

tar cvzf conjure-up.tar.gz ~/.cache/conjure-up

Sosreport

Please attach a sosreport:

sudo apt install sosreport
sosreport

The resulting output file can be attached to this issue.

What Spell was Selected?

What provider (aws, maas, localhost, etc)?

MAAS Users

Which version of MAAS?

Commands run

Please outline what commands were run to install and execute conjure-up:

Additional Information

Hello, I am experiencing a similar issue to https://github.com/conjure-up/conjure-up/issues/487.

aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/2 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.44 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/1 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.45 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/0 ping node16
ping: node16: Temporary failure in name resolution
Connection to 10.3.5.43 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/0 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.273 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.176 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.176/0.224/0.273/0.050 ms
Connection to 10.3.5.43 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/1 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.140 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1014ms
rtt min/avg/max/mdev = 0.140/0.184/0.228/0.044 ms
Connection to 10.3.5.45 closed.
aibladmin@os-client:~/.cache/conjure-up$ /snap/bin/juju ssh 2/lxd/2 ping node16.aibl.lan
PING node16.aibl.lan (10.3.4.16) 56(84) bytes of data.
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from node16.aibl.lan (10.3.4.16): icmp_seq=2 ttl=64 time=0.206 ms
^C
--- node16.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.181/0.193/0.206/0.018 ms
Connection to 10.3.5.44 closed.
aibladmin@os-client:~/.cache/conjure-up$
nova-cloud-controller/0*  error     idle       2/lxd/2  10.3.5.44       8774/tcp,8778/tcp  hook failed: "cloud-compute-relation-changed"
unit-nova-cloud-controller-0: 13:48:05 INFO unit.nova-cloud-controller/0.juju-log cloud-compute:27: Generating template context for cell v2 share-db
unit-nova-cloud-controller-0: 13:48:05 INFO unit.nova-cloud-controller/0.juju-log cloud-compute:27: Missing required data: novacell0_password novaapi_password nova_password
unit-nova-cloud-controller-0: 13:48:05 DEBUG unit.nova-cloud-controller/0.juju-log cloud-compute:27: OpenStack release, database, or rabbitmq not ready for Cells V2
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed Traceback (most recent call last):
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1183, in <module>
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     main()
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 1176, in main
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     hooks.execute(sys.argv)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     self._hooks[hook_name]()
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/cloud-compute-relation-changed", line 671, in compute_changed
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     ssh_compute_add(key, rid=rid, unit=unit)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/nova_cc_utils.py", line 1005, in ssh_compute_add
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     if ns_query(short):
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/charmhelpers/contrib/network/ip.py", line 478, in ns_query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     answers = dns.resolver.query(address, rtype)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 1132, in query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise_on_no_answer, source_port)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed   File "/usr/lib/python2.7/dist-packages/dns/resolver.py", line 947, in query
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed     raise NoNameservers(request=request, errors=errors)
unit-nova-cloud-controller-0: 13:48:06 DEBUG unit.nova-cloud-controller/0.cloud-compute-relation-changed dns.resolver.NoNameservers: All nameservers failed to answer the query node16. IN A: Server 127.0.0.53 UDP port 53 answered SERVFAIL
unit-nova-cloud-controller-0: 13:48:06 ERROR juju.worker.uniter.operation hook "cloud-compute-relation-changed" failed: exit status 1
unit-nova-cloud-controller-0: 13:48:06 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook

I have no issues resolving the names outside the juju containers; it seems the dns-search is failing.

The problem I see is that if I remove the DNS from the MAAS subnet, DHCP will not assign the MAAS DNS. Any clues?

Outside the juju containers, I can resolve the names with or without the domain name without any issue. I can also resolve juju-3c505c-2-lxd-2 with or without the domain name, but not inside the container.

What I see is that I have assigned static IP addresses to the nodes (e.g. node16), and if I ssh to one of those nodes, I can't resolve the other node names.

The dns-search is missing from the netplan configuration:

ubuntu@node17:~$ ping node16
ping: node16: Temporary failure in name resolution
ubuntu@node17:~$

Also, it doesn't seem to be a MAAS/DNS-related issue; it seems that the juju containers don't have the dns-search property?

ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-0
ping: juju-41f038-1-lxd-0: Temporary failure in name resolution
ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-1
ping: juju-41f038-1-lxd-1: Temporary failure in name resolution
ubuntu@juju-41f038-3-lxd-2:~$ ping juju-41f038-1-lxd-1.aibl.lan
PING juju-41f038-1-lxd-1.aibl.lan (10.3.5.55) 56(84) bytes of data.
64 bytes from juju-41f038-1-lxd-1.aibl.lan (10.3.5.55): icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from juju-41f038-1-lxd-1.aibl.lan (10.3.5.55): icmp_seq=2 ttl=64 time=0.148 ms
^C
--- juju-41f038-1-lxd-1.aibl.lan ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.148/0.187/0.227/0.041 ms
ubuntu@juju-41f038-3-lxd-2:~$
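
This matches the traceback above: ns_query() hands the name to dnspython, and inside the container the stub resolver at 127.0.0.53 answers SERVFAIL for the bare "node16." query because no search domain gets applied. A minimal reproduction sketch to run inside one of the lxd containers; node16 and aibl.lan are just the names from the ping output, so substitute your own (this is not the charm's exact code path):

import dns.resolver

# Compare an unqualified name against a fully-qualified one, roughly the way
# the charm's ns_query() issues its query (dnspython).
for name in ('node16.', 'node16.aibl.lan.'):
    try:
        answers = dns.resolver.query(name, 'A')
        print(name, '->', [str(a) for a in answers])
    except (dns.resolver.NXDOMAIN, dns.resolver.NoNameservers) as exc:
        # The short name fails: the trailing dot makes it absolute, so no
        # search domain is appended and 127.0.0.53 returns SERVFAIL for it.
        print(name, '->', exc)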
tux-box commented 5 years ago

I have this same problem. When I checked resolv.conf, I found "options edns0", while ifconfig shows eth0. Not sure if this is related or not.