Calyptos: Running midonet-api on non-CLC hosts should fail validation

dmccue commented 8 years ago

Successful install logs: https://eucalyptus.atlassian.net/secure/attachment/25710/calyptos-1441818819.tgz

[root@odc-f-09 ~]# euca-describe-instances i-b6698673
RESERVATION r-689b39b6  000251786737    default
INSTANCE    i-b6698673  emi-990a5431    10.116.156.1    172.31.1.94 pending admin   0       m1.medium   2015-09-09T17:51:29.152Z    az-01               monitoring-enabled  10.116.156.1    172.31.1.94 vpc-7e0cb490    subnet-7aea6bd4 instance-store                  hvm         sg-91fc5339             x86_64
NETWORKINTERFACE    eni-33419845    subnet-7aea6bd4 vpc-7e0cb490    000251786737    in-use  172.31.1.94 euca-172-31-1-94.eucalyptus.internal    true
ATTACHMENT      0   attached    2015-09-09T17:51:29.157Z    true
ASSOCIATION 10.116.156.1        172.31.1.94
GROUP   sg-91fc5339 default
PRIVATEIPADDRESS    172.31.1.94 euca-172-31-1-94.eucalyptus.internal    primary
TAG instance    i-b6698673  Name    test1
TAG instance    i-b6698673  euca:node   10.105.1.209

Useful reference: http://jeevanullas.in/blog/aws-vpc-eucalyptus-midonet-2/

This is more than likely VPC related. Debugging will be required to see whether the configuration is set; routes may be missing, as this is a non-BGP setup.

viglesiasce commented 8 years ago

@dmccue this is caused by the midonet-api not being colocated with the CLC/eucanetd. In 4.2 that is a requirement. We need to add a validator for this for sure.

dmccue commented 8 years ago

This is what happens when the midonet-api is set to the CLC IP: https://eucalyptus.atlassian.net/secure/attachment/25720/calyptos-1441889948.tgz

midokura.midonet-api-url changed from http://10.105.10.70:8080/midonet-api to http://10.105.10.51:8080/midonet-api

[10.105.10.70] out:   * execute[Create TunnelZone] action run[2015-09-10T05:55:41-07:00] INFO: Processing execute[Create TunnelZone] action run (midokura::create-first-resources line 8)
[10.105.10.70] out: [2015-09-10T05:55:41-07:00] INFO: Retrying execution of execute[Create TunnelZone], 19 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:55:52-07:00] INFO: Retrying execution of execute[Create TunnelZone], 18 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:02-07:00] INFO: Retrying execution of execute[Create TunnelZone], 17 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:12-07:00] INFO: Retrying execution of execute[Create TunnelZone], 16 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:23-07:00] INFO: Retrying execution of execute[Create TunnelZone], 15 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:33-07:00] INFO: Retrying execution of execute[Create TunnelZone], 14 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:43-07:00] INFO: Retrying execution of execute[Create TunnelZone], 13 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:56:54-07:00] INFO: Retrying execution of execute[Create TunnelZone], 12 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:04-07:00] INFO: Retrying execution of execute[Create TunnelZone], 11 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:14-07:00] INFO: Retrying execution of execute[Create TunnelZone], 10 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:25-07:00] INFO: Retrying execution of execute[Create TunnelZone], 9 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:35-07:00] INFO: Retrying execution of execute[Create TunnelZone], 8 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:45-07:00] INFO: Retrying execution of execute[Create TunnelZone], 7 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:57:56-07:00] INFO: Retrying execution of execute[Create TunnelZone], 6 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:06-07:00] INFO: Retrying execution of execute[Create TunnelZone], 5 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:16-07:00] INFO: Retrying execution of execute[Create TunnelZone], 4 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:27-07:00] INFO: Retrying execution of execute[Create TunnelZone], 3 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:37-07:00] INFO: Retrying execution of execute[Create TunnelZone], 2 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:47-07:00] INFO: Retrying execution of execute[Create TunnelZone], 1 attempt(s) left
[10.105.10.70] out: [2015-09-10T05:58:58-07:00] INFO: Retrying execution of execute[Create TunnelZone], 0 attempt(s) left

dmccue commented 8 years ago

Have unset midokura.midonet-api-url to default to http://localhost:8080/midonet-api which seems to have worked...

However there's now a fatal issue with cassandra:

ERROR 06:57:52,308 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:424)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1296)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:457)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:671)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:623)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:515)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:424)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)
Exception encountered during startup: Unable to gossip with any seeds

Which leads to: https://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds

dmccue commented 8 years ago

[root@odc-f-28 ~]# grep 'listen_address\|broadcast_address' /etc/cassandra/conf/cassandra.yaml
listen_address: odc-f-28.prc.eucalyptus-systems.com
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4
#    Uses public IPs as broadcast_address to allow cross-region

This doesn't work because odc-f-28.prc.eucalyptus-systems.com resolves to the public interface rather than the private interface. What is the best way to address this: change the /etc/hosts file, or modify listen_address in cassandra.yaml to use the private interface IP?

The recipe would need to change from node['fqdn'] to something that can be overridden with the private IP address: https://github.com/eucalyptus/midokura-cookbook/blob/master/recipes/cassandra.rb#L13

execute "CASSANDRA: set listening address" do
 command "sed -i -e 's/localhost/#{node['fqdn']}/g' /etc/cassandra/conf/cassandra.yaml"
end
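One way to confirm the diagnosis before touching the recipe is to check which address the FQDN actually resolves to. Below is a minimal Python sketch. It assumes the private network is 10.105.10.0/24, which is inferred from the addresses in this thread and not confirmed. resolves_to_private is a hypothetical helper, not part of calyptos or the cookbook.

```python
import ipaddress
import socket


def resolves_to_private(hostname, private_cidr, resolver=socket.gethostbyname):
    """Resolve hostname and report whether the result lies in private_cidr.

    Returns a tuple (ip, ok). The resolver argument is injectable so the
    check can be exercised without real DNS.
    """
    ip = resolver(hostname)
    ok = ipaddress.ip_address(ip) in ipaddress.ip_network(private_cidr)
    return ip, ok


# Example with a stubbed resolver; 10.105.10.0/24 is an assumed private range.
ip, ok = resolves_to_private(
    "odc-f-28.prc.eucalyptus-systems.com",
    "10.105.10.0/24",
    resolver=lambda name: "10.105.10.70",
)
```

If ok is False on the real host, cassandra would bind listen_address to an interface its seeds may not reach, which fits the "Unable to gossip with any seeds" error above.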

viglesiasce commented 8 years ago

@dmccue glad to hear the localhost change fixed up the mido side.

I'm going to move the cassandra problem to a different issue so we don't cross wires on this one.

viglesiasce commented 8 years ago

I opened https://github.com/eucalyptus/calyptos/issues/72 to continue the cassandra work

viglesiasce commented 8 years ago

Looking more at that cloud, instances now reach the running state but are not able to get their addresses via DHCP. Need to investigate that further.

viglesiasce commented 8 years ago

Eucanetd is not running on the CLC, which is another requirement and needs a validator. After that was cleared up, we had issues because eucanetd was not able to figure out which mido hosts were running the instances. This is caused by the lack of a reverse mapping from the nodes' hostnames to their registered IP addresses (both in mido and euca). To work around this I added the following to the /etc/hosts file on the CLC/eucanetd host, and instances then began to get their IPs properly:

10.105.10.51 odc-f-09.prc.eucalyptus-systems.com
10.105.10.73 odc-f-31.prc.eucalyptus-systems.com
10.105.10.78 odc-f-36.prc.eucalyptus-systems.com
10.105.1.209 odc-d-30.prc.eucalyptus-systems.com
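A check for this workaround could be sketched as follows. This is only an illustration: missing_hosts_entries is a hypothetical helper, and the mapping is passed in as a plain dict because the thread does not say where a real validator would read it from.

```python
def missing_hosts_entries(mapping, hosts_text):
    """Return 'ip hostname' lines from mapping that are absent in hosts_text.

    mapping is {hostname: ip}; hosts_text is the content of an /etc/hosts
    style file. Comments and blank lines are ignored.
    """
    present = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            present.add((ip, name))
    return [
        "{} {}".format(ip, name)
        for name, ip in mapping.items()
        if (ip, name) not in present
    ]


# Example using two of the mappings from this comment.
nodes = {
    "odc-f-09.prc.eucalyptus-systems.com": "10.105.10.51",
    "odc-f-31.prc.eucalyptus-systems.com": "10.105.10.73",
}
existing = "127.0.0.1 localhost\n10.105.10.51 odc-f-09.prc.eucalyptus-systems.com\n"
to_add = missing_hosts_entries(nodes, existing)
```

The returned lines can be appended to /etc/hosts on the CLC/eucanetd host, or reported as a validation failure.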

Instances are now booting and getting their IP addresses/metadata as expected.

The diff for the env file is as follows:

[root@odc-f-09 calyptos-deploy]# diff environment.yml environment-vic.yml
46,47c46
<     # Mappings for only NCs and CCs
<       odc-f-28.prc.eucalyptus-systems.com: 10.105.10.70
---
>     # Mappings for only NCs and CLC
54a54,55
>   - &EUCANETD_HOST
>     odc-f-09.prc.eucalyptus-systems.com
83c84
<           EucanetdHost: *MIDO_GATEWAY_HOST
---
>           EucanetdHost: *EUCANETD_HOST
[root@odc-f-09 calyptos-deploy]#

dmccue commented 8 years ago

Made those changes: https://eucalyptus.atlassian.net/secure/attachment/25801/calyptos-1442240272.tgz. Not able to connect to the midonet-api yet; will investigate.

dmccue commented 8 years ago

https://eucalyptus.atlassian.net/secure/attachment/25802/calyptos-1442248064.tgz

(on clc)
[root@odc-f-09 ~]# netstat -antp | grep 8080
tcp 0 0 :::8080 :::* LISTEN 26516/java
[root@odc-f-09 ~]# midonet-cli --midonet-url=http://localhost:8080/midonet-api -A -e add tunnel-zone name mido-tz type gre
The API server failed to respond normally. The network DB is possibly down. Bye.
[root@odc-f-09 ~]# tail -1 /var/log/eucalyptus/eucanetd.log
2015-09-14 09:24:53 FATAL 000022010 mido_check_state | midonet-api is not reachable after 120 retries: eucanetd shutting down

The midonet-api is clearly installed and listening on the CLC (10.105.10.51), but the REST API returns 404 for all calls. This is likely either a tomcat configuration issue or a backend issue where tomcat can't communicate with zookeeper.

viglesiasce commented 8 years ago

@dmccue looks like the midonet-api is pointing at 10.105.10.70 but zookeeper is running on 10.104.10.5. Can you rerun with 10.105.10.70 as your zookeeper host? The cookbook currently installs zookeeper only on the midonet-api host.

viglesiasce commented 8 years ago

Sorry @dmccue, I meant rerunning with 10.105.10.51.

dmccue commented 8 years ago

@viglesiasce That has now built with exit code 0. There remain ingress connectivity issues over private and public IPs.

Validators required:

midokura.zookeepers contains a minimum of one array item pointing to eucalyptus.topology.clc-1
midokura.midonet-api-url contains ip address of eucalyptus.topology.clc-1
eucalyptus.network.config-json.Mido.EucanetdHost contains hostname of eucalyptus.topology.clc-1
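The three checks above could be sketched roughly as follows. This is not the calyptos validator API: validate_midonet_colocation is a hypothetical function, the nested dict layout is assumed from the dotted key names, the optional host:port form for zookeeper entries is an assumption, and the CLC hostname is passed in separately because the thread does not say how a validator would look it up.

```python
from urllib.parse import urlparse


def validate_midonet_colocation(config, clc_hostname):
    """Return a list of error strings; an empty list means all checks pass."""
    errors = []
    clc_ip = config["eucalyptus"]["topology"]["clc-1"]
    mido = config.get("midokura", {})

    # 1. At least one zookeeper entry must point at the CLC.
    #    Entries are assumed to be "host" or "host:port".
    zk_hosts = [str(z).split(":")[0] for z in mido.get("zookeepers", [])]
    if clc_ip not in zk_hosts:
        errors.append("midokura.zookeepers has no entry for clc-1 (%s)" % clc_ip)

    # 2. The midonet-api URL must use the CLC's IP address.
    api_host = urlparse(mido.get("midonet-api-url", "")).hostname
    if api_host != clc_ip:
        errors.append("midokura.midonet-api-url host is %s, expected %s"
                      % (api_host, clc_ip))

    # 3. EucanetdHost must be the CLC's hostname.
    eucanetd = (config["eucalyptus"].get("network", {})
                .get("config-json", {}).get("Mido", {}).get("EucanetdHost"))
    if eucanetd != clc_hostname:
        errors.append("Mido.EucanetdHost is %s, expected %s"
                      % (eucanetd, clc_hostname))
    return errors
```

The sketch follows the list literally, so it rejects a localhost midonet-api-url even though that setting worked earlier in this thread; a real validator might want to allow it when midonet-api runs on the CLC.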

viglesiasce commented 8 years ago

Thanks @dmccue! You saved me the work of going back through this journey to figure out the right validators :+1:

dmccue commented 8 years ago

Have switched over to running midonet on the CLC and specifying localhost as the midonet-api endpoint.

eucalyptus / calyptos

Calyptos: Running midonet-api on non-CLC hosts should fail validation #70