Exa-Networks / exabgp

The BGP swiss army knife of networking

state machine invalid transition ? #296

Closed pavel-odintsov closed 9 years ago

pavel-odintsov commented 9 years ago

Hello, Thomas!

Some tests with an announcement receiver caused ExaBGP to crash :(

 env exabgp.log.level=DEBUG  exabgp.daemon.user=root exabgp.tcp.bind="0.0.0.0" exabgp.tcp.port=179 exabgp.daemon.daemonize=false exabgp.daemon.pid=/var/run/exabgp.pid exabgp.log.destination=/var/log/exabgp.log   sbin/exabgp /etc/exabgp_listener.conf 
/usr/src/exabgp/lib/exabgp/bgp/message/message.py:61: DeprecationWarning: object.__init__() takes no parameters
  int.__init__(self,value)
exabgp 986070 configuration environment file missing
exabgp 986070 configuration generate it using "exabgp --fi > /usr/src/exabgp/etc/exabgp/exabgp.env"
exabgp 986070 reactor       Performing reload of exabgp 3.4.10
exabgp 986070 configuration parsing | configuration | 'group' 'core_v4' '{'
exabgp 986070 configuration parsing | group         | 'hold-time' '180' ';'
exabgp 986070 configuration parsing | group         | 'local-as' '65000' ';'
exabgp 986070 configuration parsing | group         | 'peer-as' '65000' ';'
exabgp 986070 configuration parsing | group         | 'router-id' '10.0.129.2' ';'
exabgp 986070 configuration parsing | group         | 'neighbor' '10.0.131.2' '{'
exabgp 986070 configuration parsing | neighbor      | 'local-address' '10.0.129.2' ';'
exabgp 986070 configuration parsing | neighbor      | 'description' '"bird' 'route' 'server"' ';'
exabgp 986070 configuration parsing | neighbor      | 'process' 'stdout' '{'
exabgp 986070 configuration parsing | process       | 'neighbor-changes' ';'
exabgp 986070 configuration parsing | process       | 'receive' '{'
exabgp 986070 configuration parsing | receive       | 'parsed' ';'
exabgp 986070 configuration parsing | receive       | 'update' ';'
exabgp 986070 configuration parsing | receive       | '}'
exabgp 986070 configuration parsing | process       | 'encoder' 'json' ';'
exabgp 986070 configuration parsing | process       | 'run' '/usr/src/exabgp_hook.py ' ';'
exabgp 986070 configuration parsing | process       | '}'
exabgp 986070 configuration parsing | neighbor      | '}'
exabgp 986070 configuration neighbor 10.0.131.2 {
exabgp 986070 configuration   description "bird route server";
exabgp 986070 configuration   router-id 10.0.129.2;
exabgp 986070 configuration   host-name ovz78;
exabgp 986070 configuration   domain-name fvee;
exabgp 986070 configuration   local-address 10.0.129.2;
exabgp 986070 configuration   local-as 65000;
exabgp 986070 configuration   peer-as 65000;
exabgp 986070 configuration   hold-time 180;
exabgp 986070 configuration   group-updates: True;
exabgp 986070 configuration   auto-flush: true;
exabgp 986070 configuration   adj-rib-out: true;
exabgp 986070 configuration   ttl-security: ;
exabgp 986070 configuration 
exabgp 986070 configuration   capability {
exabgp 986070 configuration     asn4 enable;
exabgp 986070 configuration     route-refresh disable;
exabgp 986070 configuration     graceful-restart disable;
exabgp 986070 configuration     add-path disable;
exabgp 986070 configuration     multi-session disable;
exabgp 986070 configuration     operational disable;
exabgp 986070 configuration     aigp disable;
exabgp 986070 configuration   }
exabgp 986070 configuration   family {
exabgp 986070 configuration     inet4 unicast;
exabgp 986070 configuration     inet4 multicast;
exabgp 986070 configuration     inet4 nlri-mpls;
exabgp 986070 configuration     inet4 mpls-vpn;
exabgp 986070 configuration     inet4 rtc;
exabgp 986070 configuration     inet4 flow;
exabgp 986070 configuration     inet4 flow-vpn;
exabgp 986070 configuration     inet6 unicast;
exabgp 986070 configuration     inet6 multicast;
exabgp 986070 configuration     inet6 nlri-mpls;
exabgp 986070 configuration     inet6 mpls-vpn;
exabgp 986070 configuration     inet6 flow;
exabgp 986070 configuration     inet6 flow-vpn;
exabgp 986070 configuration     l2vpn vpls;
exabgp 986070 configuration     l2vpn evpn;
exabgp 986070 configuration   }
exabgp 986070 configuration   process {
exabgp 986070 configuration     receive {
exabgp 986070 configuration       parsed;
exabgp 986070 configuration       update;
exabgp 986070 configuration     }
exabgp 986070 configuration   }
exabgp 986070 configuration }
exabgp 986070 configuration 
exabgp 986070 configuration 
exabgp 986070 configuration parsing | group         | '}'
exabgp 986070 reactor       New peer: neighbor 10.0.131.2 local-ip 10.0.129.2 local-as 65000 peer-as 65000 router-id 10.0.129.2 family-allowed in-open
exabgp 986070 configuration Loaded new configuration successfully
exabgp 986070 reactor       Listening for BGP session(s) on 0.0.0.0:179
exabgp 986070 processes     Forked process stdout

********************************************************************************
EXABGP CRASHED / HELP US FIX IT
********************************************************************************

Sorry, you encountered a problem with ExaBGP and we could not keep the program
running.

There are a few things you can do to help us (and yourself):
- make sure you are running the latest version of the code available at
  https://github.com/Exa-Networks/exabgp/releases/latest
- if so report the issue on https://github.com/Exa-Networks/exabgp/issues
  so it can be fixed (github can be searched for similar reports)

PLEASE, when reporting, do include as much information as you can:
- do not obfuscate any data (feel free to send us a private  email with the
  extra information if your business policy is strict on information sharing)
  https://github.com/Exa-Networks/exabgp/wiki/FAQ
- if you can reproduce the issue, run ExaBGP with the command line option -d
  it provides us with much needed information to fix problems quickly
- include the information presented below

Should you not receive an acknowledgment of your issue on github (assignement,
comment, or similar) within a few hours, feel free to email us to make sure
it was not overlooked. (please keep in mind the authors are based in GMT/Europe)

********************************************************************************
-- Please provide ALL the information below on :
-- https://github.com/Exa-Networks/exabgp/issues
********************************************************************************

ExaBGP version : 3.4.10
Python version : 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)  [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
System Uname   : #1 SMP Mon Aug 11 18:47:39 MSK 2014
System MaxInt  : 9223372036854775807

-- Traceback

Traceback (most recent call last):
  File "/usr/src/exabgp/lib/exabgp/application/bgp.py", line 316, in <module>
    main()
  File "/usr/src/exabgp/lib/exabgp/application/bgp.py", line 234, in main
    run(env,comment,configurations)
  File "/usr/src/exabgp/lib/exabgp/application/bgp.py", line 268, in run
    ok = Reactor(configurations).run()
  File "/usr/src/exabgp/lib/exabgp/reactor/loop.py", line 243, in run
    if peer.incoming(connection):
  File "/usr/src/exabgp/lib/exabgp/reactor/peer.py", line 290, in incoming
    self._incoming.fsm.change(FSM.ACTIVE)
  File "/usr/src/exabgp/lib/exabgp/bgp/fsm.py", line 69, in change
    raise RuntimeError ('invalid state machine transition (from %s to %s)' % (str(self.state),str(state)))
RuntimeError: invalid state machine transition (from OPENCONFIRM to ACTIVE)

-- Configuration

-- Logging History

exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          SENDING  ( 392) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 009D 0104 FDE8 00B4 0A00 8102 8002 0601 0400 0100 0102 0601 0400 0100 0202 0601 0400 0100 0402 0601 0400 0100 8002 0601 0400 0100 8402 0601 0400 0100 8502 0601 0400 0100 8602 0601 0400 0200 0102 0601 0400 0200 0202 0601 0400 0200 0402 0601 0400 0200 8002 0601 0400 0200 8502 0601 0400 0200 8602 0601 0400 1900 4102 0601 0400 1900 4602 0641 0400 00FD E8
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   >> KEEPALIVE (OPENCONFIRM)
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   >> OPEN version=4 asn=65000 hold_time=180 router_id=10.0.129.2 capabilities=[Multiprotocol(ipv4 unicast,ipv4 multicast,ipv4 nlri-mpls,ipv4 mpls-vpn,ipv4 rtc,ipv4 flow,ipv4 flow-vpn,ipv6 unicast,ipv6 multicast,ipv6 nlri-mpls,ipv6 mpls-vpn,ipv6 flow,ipv6 flow-vpn,l2vpn vpls,l2vpn evpn), ASN4(65000)]
exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          RECEIVED  (  47) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0031 01
exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          RECEIVED  (  74) 04FD E800 F00A 000A 2614 0212 0104 0001 0001 0200 4002 0000 4104 0000 FDE8
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   << OPEN
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   << OPEN version=4 asn=65000 hold_time=240 router_id=10.0.10.38 capabilities=[Multiprotocol(ipv4 unicast), Route Refresh, Graceful Restart Flags 0x0 Time 0 , ASN4(65000)]
exabgp 986070 debug         peer 10.0.131.2 ASN 65000   Receive Timer 59 second(s) left
exabgp 986070 network       outgoing connection finds the incoming connection is in openconfirm
exabgp 986070 network       closing the incoming connection
exabgp 986070 network       Peer      10.0.131.2 ASN 65000   in loop, stop, message [collision local id < remote id]
exabgp 986070 debug         session 1 incoming, closing connection from 10.0.131.2 to 10.0.129.2
exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          SENDING  (  47) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0013 04
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   >> KEEPALIVE (OPENCONFIRM)
exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          RECEIVED  (  47) FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0015 03
exabgp 986070 debug         session 1 outgoing 10.0.129.2 / 10.0.131.2          RECEIVED  (   4) 0607
exabgp 986070 info          Peer      10.0.131.2 ASN 65000   << NOTIFICATION
exabgp 986070 network       Peer      10.0.131.2 ASN 65000   out loop, peer reset, message [notification received (6,7)] error[Cease / Connection Collision Resolution]
exabgp 986070 debug         session 1 outgoing, closing connection from 10.0.129.2 to 10.0.131.2
exabgp 986070 network       out loop, stopping, other one is established
exabgp 986070 debug         Connection from 10.0.129.2

And I could not reproduce this case.

On the other side I have BIRD announcing ~20 /32 prefixes.
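For anyone reading the raw bytes in the logging history above, here is a minimal Python sketch that decodes the 19-byte BGP header and the two-byte NOTIFICATION body shown in the log. The function names `parse_header` and `parse_notification` are made up for this illustration and are not part of ExaBGP; the layout (16-byte marker, 2-byte length, 1-byte type) follows RFC 4271.

```python
import struct

# 16 bytes of 0xFF open every BGP message (RFC 4271, section 4.1).
BGP_MARKER = b"\xff" * 16
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(hexdump):
    """Hypothetical helper: decode a BGP header from a spaced hex dump."""
    raw = bytes.fromhex(hexdump.replace(" ", ""))
    marker, length, msg_type = struct.unpack("!16sHB", raw[:19])
    if marker != BGP_MARKER:
        raise ValueError("bad BGP marker")
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")


def parse_notification(hexdump):
    """Hypothetical helper: return (error code, error subcode)."""
    code, subcode = bytes.fromhex(hexdump.replace(" ", ""))[:2]
    return code, subcode


# Header and body copied from the "RECEIVED" lines in the log above.
print(parse_header("FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF 0015 03"))
print(parse_notification("0607"))
```

The header decodes to a 21-byte NOTIFICATION and the body to code 6, subcode 7, which matches the "Cease / Connection Collision Resolution" text that ExaBGP printed.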

pavel-odintsov commented 9 years ago

I was able to reproduce it after all: it fails on every 3rd run in this environment.

thomas-mangin commented 9 years ago

Are you running 3.4 stable? If so, this state machine check should be disabled in the latest version.

pavel-odintsov commented 9 years ago

I got this code from master...

thomas-mangin commented 9 years ago

master is currently in a BIG state of flux, please use 3.4 stable (at least for the next few weeks).

pavel-odintsov commented 9 years ago

OK, roger that :)

thomas-mangin commented 9 years ago

OK, our state machine is not working as it should. I have implemented a workaround, so this issue will not occur while the state machine waits for a full fix.
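For readers who want to see what kind of check produced the error in the traceback, here is a minimal sketch of a guarded state machine. It is not ExaBGP's `fsm.py`: the class name `SketchFSM` and the `ALLOWED` table are assumptions made for illustration, loosely following RFC 4271 and ignoring the optional DelayOpen paths. One plausible reading of the traceback is that an incoming connection object still held OPENCONFIRM from the session closed during collision resolution when the next connection arrived.

```python
# Session states, named as they are printed in the traceback above.
IDLE, ACTIVE, CONNECT, OPENSENT, OPENCONFIRM, ESTABLISHED = (
    "IDLE", "ACTIVE", "CONNECT", "OPENSENT", "OPENCONFIRM", "ESTABLISHED",
)

# Assumed, simplified transition table (not copied from ExaBGP).
ALLOWED = {
    IDLE: {CONNECT, ACTIVE},
    CONNECT: {IDLE, ACTIVE, OPENSENT},
    ACTIVE: {IDLE, CONNECT, OPENSENT},
    OPENSENT: {IDLE, ACTIVE, OPENCONFIRM},
    OPENCONFIRM: {IDLE, ESTABLISHED},
    ESTABLISHED: {IDLE},
}


class SketchFSM:
    """Hypothetical FSM that rejects transitions missing from ALLOWED."""

    def __init__(self):
        self.state = IDLE

    def change(self, state):
        # Staying in the same state is always accepted.
        if state != self.state and state not in ALLOWED[self.state]:
            raise RuntimeError(
                "invalid state machine transition (from %s to %s)"
                % (self.state, state)
            )
        self.state = state


fsm = SketchFSM()
for step in (ACTIVE, OPENSENT, OPENCONFIRM):
    fsm.change(step)

try:
    fsm.change(ACTIVE)  # stale OPENCONFIRM state, new connection arrives
except RuntimeError as exc:
    print(exc)

fsm.change(IDLE)    # resetting first makes the same transition legal
fsm.change(ACTIVE)
```

In this sketch the rejected call prints the same message as the crash report, and resetting the machine to IDLE before reuse avoids it. Whether the actual workaround resets the state or relaxes the check is not stated in this thread.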