Status: Open. dmccue opened this issue 8 years ago.
@dmccue this is caused by the midonet-api not being colocated with the CLC/eucanetd, which is a requirement in 4.2. We definitely need to add a validator for this.
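A first cut at that validator could simply compare the host in the midonet-api URL with the CLC address. Below is a rough Python sketch. `midonet_api_colocated` is a hypothetical name, and the sketch assumes the CLC IP and the `midokura.midonet-api-url` value have already been read from the environment file:

```python
from urllib.parse import urlparse

# Hypothetical validator sketch, not part of calyptos. Assumes the caller has
# already extracted the CLC IP and midokura.midonet-api-url from the environment.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def midonet_api_colocated(clc_host: str, midonet_api_url: str) -> bool:
    """Return True if the midonet-api URL points at the CLC itself.

    In 4.2 the midonet-api must run on the same host as the CLC/eucanetd,
    so the URL host must be either a loopback name or the CLC address.
    """
    host = urlparse(midonet_api_url).hostname
    if host is None:  # no network location at all, e.g. a malformed value
        return False
    return host in LOCAL_HOSTS or host == clc_host
```

Loopback names are accepted because the default URL (http://localhost:8080/midonet-api) is only correct when the API runs on the CLC.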
This is what happens when the midonet-api is set to the CLC IP: https://eucalyptus.atlassian.net/secure/attachment/25720/calyptos-1441889948.tgz
midokura.midonet-api-url changed from http://10.105.10.70:8080/midonet-api to http://10.105.10.51:8080/midonet-api
[10.105.10.70] out: * execute[Create TunnelZone] action run
[2015-09-10T05:55:41-07:00] INFO: Processing execute[Create TunnelZone] action run (midokura::create-first-resources line 8)
[10.105.10.70] out: [2015-09-10T05:55:41-07:00] INFO: Retrying execution of execute[Create TunnelZone], 19 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:55:52-07:00] INFO: Retrying execution of execute[Create TunnelZone], 18 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:02-07:00] INFO: Retrying execution of execute[Create TunnelZone], 17 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:12-07:00] INFO: Retrying execution of execute[Create TunnelZone], 16 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:23-07:00] INFO: Retrying execution of execute[Create TunnelZone], 15 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:33-07:00] INFO: Retrying execution of execute[Create TunnelZone], 14 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:43-07:00] INFO: Retrying execution of execute[Create TunnelZone], 13 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:54-07:00] INFO: Retrying execution of execute[Create TunnelZone], 12 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:04-07:00] INFO: Retrying execution of execute[Create TunnelZone], 11 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:14-07:00] INFO: Retrying execution of execute[Create TunnelZone], 10 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:25-07:00] INFO: Retrying execution of execute[Create TunnelZone], 9 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:35-07:00] INFO: Retrying execution of execute[Create TunnelZone], 8 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:45-07:00] INFO: Retrying execution of execute[Create TunnelZone], 7 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:56-07:00] INFO: Retrying execution of execute[Create TunnelZone], 6 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:06-07:00] INFO: Retrying execution of execute[Create TunnelZone], 5 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:16-07:00] INFO: Retrying execution of execute[Create TunnelZone], 4 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:27-07:00] INFO: Retrying execution of execute[Create TunnelZone], 3 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:37-07:00] INFO: Retrying execution of execute[Create TunnelZone], 2 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:47-07:00] INFO: Retrying execution of execute[Create TunnelZone], 1 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:58-07:00] INFO: Retrying execution of execute[Create TunnelZone], 0 attempt(s) left
I unset midokura.midonet-api-url so that it defaults to http://localhost:8080/midonet-api, which seems to have worked.
However, there's now a fatal issue with Cassandra:
ERROR 06:57:52,308 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:424)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)
Exception encountered during startup: Unable to gossip with any seeds
That error leads to: https://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds
[root@odc-f-28 ~]# grep 'listen_address\|broadcast_address' /etc/cassandra/conf/cassandra.yaml
listen_address: odc-f-28.prc.eucalyptus-systems.com
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
# Uses public IPs as broadcast_address to allow cross-region
This doesn't work because odc-f-28.prc.eucalyptus-systems.com resolves to the public interface rather than the private interface. What is the best way to address this: change the /etc/hosts file, or modify listen_address in cassandra.yaml to use the private interface IP?
The cookbook would need to change from node['fqdn'] to something that allows overriding with the private IP address: https://github.com/eucalyptus/midokura-cookbook/blob/master/recipes/cassandra.rb#L13
execute "CASSANDRA: set listening address" do
  command "sed -i -e 's/localhost/#{node['fqdn']}/g' /etc/cassandra/conf/cassandra.yaml"
end
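Either option would probably work. For the cookbook route, the idea would be to feed listen_address from an overridable private IP instead of node['fqdn']. The logic is sketched below in Python rather than Chef. The helper names and the `10.105.0.0/16` subnet used in the example are assumptions, not values taken from the cookbook:

```python
import ipaddress
import re

# Hypothetical helpers for illustration only; the private subnet is supplied
# by the caller and is an assumption about the deployment.


def resolves_to_private(ip: str, private_cidr: str) -> bool:
    """True if `ip` (e.g. what the node's FQDN resolves to) is inside the private subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(private_cidr)


def set_listen_address(cassandra_yaml: str, address: str) -> str:
    """Replace the value of the uncommented listen_address line with `address`.

    Only lines starting with 'listen_address:' are rewritten, so commented-out
    entries such as '# broadcast_address: ...' are left untouched.
    """
    return re.sub(
        r"^listen_address:.*$",
        f"listen_address: {address}",
        cassandra_yaml,
        flags=re.MULTILINE,
    )
```

For odc-f-28 this would be `set_listen_address(text, "10.105.10.70")`, using the private IP listed for that host in the environment file.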
@dmccue glad to hear the localhost change fixed up the mido side.
I'm going to move the Cassandra issue to a separate issue so we don't cross wires on this one.
I opened https://github.com/eucalyptus/calyptos/issues/72 to continue the cassandra work
Looking more at that cloud, instances now reach the running state but are not able to get their addresses via DHCP. This needs further investigation.
Eucanetd was not running on the CLC, which is another requirement and also needs a validator. Once that was sorted out, we had issues because eucanetd was not able to figure out which mido hosts were running the instances. This was caused by the lack of a reverse mapping of the nodes' hostnames to their registered IP addresses (both in mido and euca). To work around this, I added the following to the /etc/hosts file on the CLC/eucanetd host, and instances then began to get their IPs properly:
10.105.10.51 odc-f-09.prc.eucalyptus-systems.com
10.105.10.73 odc-f-31.prc.eucalyptus-systems.com
10.105.10.78 odc-f-36.prc.eucalyptus-systems.com
10.105.1.209 odc-d-30.prc.eucalyptus-systems.com
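This could become a validator run on the CLC/eucanetd host: every node hostname should resolve to the IP it is registered with. Minimal sketch; `find_mapping_mismatches` is a hypothetical helper, and `resolve` is passed in so the check can be exercised without DNS:

```python
from typing import Callable, Dict, List, Optional

# Hypothetical validator sketch. `resolve` maps a hostname to an IP string
# (for example socket.gethostbyname); it may return None or raise OSError
# when the name cannot be resolved.


def find_mapping_mismatches(
    registered: Dict[str, str],
    resolve: Callable[[str], Optional[str]],
) -> List[str]:
    """Return hostnames whose resolved IP differs from their registered IP."""
    mismatches = []
    for hostname, registered_ip in registered.items():
        try:
            resolved = resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved = None  # an unresolvable name counts as a mismatch
        if resolved != registered_ip:
            mismatches.append(hostname)
    return mismatches
```

On the CLC you would pass `socket.gethostbyname` as `resolve`. Any hostname it returns needs an /etc/hosts (or DNS) fix like the entries above.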
Instances are now booting and getting their IP addresses/metadata as expected.
The diff for the env file is as follows:
[root@odc-f-09 calyptos-deploy]# diff environment.yml environment-vic.yml
46,47c46
< # Mappings for only NCs and CCs
< odc-f-28.prc.eucalyptus-systems.com: 10.105.10.70
---
> # Mappings for only NCs and CLC
54a54,55
> - &EUCANETD_HOST
> odc-f-09.prc.eucalyptus-systems.com
83c84
< EucanetdHost: *MIDO_GATEWAY_HOST
---
> EucanetdHost: *EUCANETD_HOST
[root@odc-f-09 calyptos-deploy]#
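The eucanetd-on-CLC validator could key off the same setting: EucanetdHost should refer to the CLC. Hypothetical Python sketch, assuming a hostname-to-IP map like the mappings in the environment file is available:

```python
from typing import Dict, Optional

# Hypothetical validator sketch; not part of calyptos.


def eucanetd_on_clc(
    eucanetd_host: str,
    clc_ip: str,
    host_ips: Optional[Dict[str, str]] = None,
) -> bool:
    """True if EucanetdHost refers to the CLC, either by IP or by a mapped hostname."""
    host_ips = host_ips or {}
    # Translate a known hostname into its IP; plain IPs pass through unchanged.
    return host_ips.get(eucanetd_host, eucanetd_host) == clc_ip
```

With the old value (*MIDO_GATEWAY_HOST) this returns False unless the gateway host happens to be the CLC.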
Made those changes: https://eucalyptus.atlassian.net/secure/attachment/25801/calyptos-1442240272.tgz. Not able to connect to the midonet-api; will investigate.
https://eucalyptus.atlassian.net/secure/attachment/25802/calyptos-1442248064.tgz
(on clc)
[root@odc-f-09 ~]# netstat -antp | grep 8080
tcp 0 0 :::8080 :::* LISTEN 26516/java
[root@odc-f-09 ~]# midonet-cli --midonet-url=http://localhost:8080/midonet-api -A -e add tunnel-zone name mido-tz type gre
The API server failed to respond normally. The network DB is possibly down. Bye.
[root@odc-f-09 ~]# tail -1 /var/log/eucalyptus/eucanetd.log
2015-09-14 09:24:53 FATAL 000022010 mido_check_state | midonet-api is not reachable after 120 retries: eucanetd shutting down
The midonet-api is clearly installed on the CLC (10.105.10.51) and listening on port 8080, but the REST API returns 404 for all calls. This is likely a Tomcat configuration issue, or a backend issue where Tomcat can't communicate with ZooKeeper.
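A quick probe can separate "nothing listening" from "Tomcat answering but the webapp returning errors". Standard-library Python sketch; the function name and the status labels it returns are made up for illustration:

```python
import urllib.error
import urllib.request

# Hypothetical probe helper. Proxies are disabled so the request goes straight
# to the endpoint instead of through any http_proxy set in the environment.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_midonet_api(url: str, timeout: float = 5.0) -> str:
    """Classify an endpoint as 'ok', 'http-<code>', or 'unreachable'."""
    try:
        with _DIRECT.open(url, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as err:
        # Something answered on the port (e.g. Tomcat) but returned an error status.
        return f"http-{err.code}"
    except OSError:
        # URLError, connection refused, timeouts: nothing usable answered.
        return "unreachable"
```

Given the netstat output above (java listening on :::8080), a 404 would come back as `http-404`, which points at the webapp or its ZooKeeper backend rather than at the network.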
@dmccue it looks like the midonet-api is pointing at 10.105.10.70 but ZooKeeper is running on 10.104.10.5. Can you rerun with 10.105.10.70 as your ZooKeeper host? The cookbook currently only installs ZooKeeper on the midonet-api host.
Sorry @dmccue, I meant rerunning with 10.105.10.51.
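Since the cookbook only installs ZooKeeper on the midonet-api host, another validator could confirm that the configured ZooKeeper address actually answers. Below is a sketch using ZooKeeper's `ruok` four-letter command. It assumes the default client port 2181, and note that newer ZooKeeper releases require `ruok` to be listed in `4lw.commands.whitelist`. The function name is hypothetical:

```python
import socket

# Hypothetical health-check sketch; port 2181 is ZooKeeper's default client
# port and is an assumption about this deployment.


def zookeeper_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send 'ruok'; a healthy ZooKeeper server replies 'imok' and closes."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
            reply = sock.recv(16)
    except OSError:  # refused, timed out, or unresolvable host
        return False
    return reply == b"imok"
```

Running it against the host in the midonet-api's ZooKeeper setting would have flagged the mismatch above before the deploy.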
@viglesiasce That now builds with exit code 0, but there are still ingress connectivity issues over both private and public IPs.
Validators required:
Thanks @dmccue! You saved me the work of going back through this journey to figure out the right validators :+1:
I've switched over to running MidoNet on the CLC and specifying localhost as the midonet-api endpoint.
Successful install logs: https://eucalyptus.atlassian.net/secure/attachment/25710/calyptos-1441818819.tgz
Useful reference: http://jeevanullas.in/blog/aws-vpc-eucalyptus-midonet-2/
This is most likely VPC-related. Debugging will be needed to check whether the configuration is set correctly; there may be missing routes, since this is a non-BGP setup.