GNS3 / dynamips

Dynamips development
GNU General Public License v2.0

Device links fail. Source IP is wrong. #76

Open cvazquez-edge opened 7 years ago

cvazquez-edge commented 7 years ago

I've got an issue with dynamips 0.2.11 and 0.2.16 on an AWS instance of Ubuntu 14.04. None of the links between simulated devices (Ethernet and serial) can pass any traffic. The key difference I see between this instance and a working instance on my Mac is that the UDP-tunneled traffic on the loopback uses the wrong source IP.

Tcpdump on loopback for working instance on Mac OS X, pinging R2 FA0/0 from R1 FA0/0. All traffic is sourced from 127.0.0.1:

13:05:28.908787 IP (tos 0x0, ttl 64, id 7909, offset 0, flags [none], proto UDP (17), length 88, bad cksum 0 (->5dae)!)
    127.0.0.1.10000 > 127.0.0.1.10001: [bad udp cksum 0xfe57 -> 0xb439!] UDP, length 60
    0x0000:  0200 0000 4500 0058 1ee5 0000 4011 0000  ....E..X....@...
    0x0010:  7f00 0001 7f00 0001 2710 2711 0044 fe57  ........'.'..D.W
    0x0020:  cc02 6b01 0000 cc02 6b01 0000 9000 0000  ..k.....k.......
    0x0030:  0100 0000 0000 0000 0000 0000 0000 0000  ................
    0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
    0x0050:  0000 0000 0000 0000 0000 0000            ............
13:05:29.708890 IP (tos 0x0, ttl 64, id 35228, offset 0, flags [none], proto UDP (17), length 88, bad cksum 0 (->f2f6)!)
    127.0.0.1.10001 > 127.0.0.1.10000: [bad udp cksum 0xfe57 -> 0xb435!] UDP, length 60
    0x0000:  0200 0000 4500 0058 899c 0000 4011 0000  ....E..X....@...
    0x0010:  7f00 0001 7f00 0001 2711 2710 0044 fe57  ........'.'..D.W
    0x0020:  cc03 6b02 0000 cc03 6b02 0000 9000 0000  ..k.....k.......
    0x0030:  0100 0000 0000 0000 0000 0000 0000 0000  ................
    0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
    0x0050:  0000 0000 0000 0000 0000 0000            ............

Broken instance on AWS Ubuntu 14.04, pinging R2 from R1. All pings and some other traffic are sourced from the external IP address of the AWS instance:

15:38:30.435262 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 102: (tos 0x0, ttl 64, id 62743, offset 0, flags [DF], proto UDP (17), length 88)
    <Eth0's external IP>.10000 > 127.0.0.1.10001: [bad udp cksum 0x9e5d -> 0xb83a!] UDP, length 60
    0x0000:  0000 0000 0000 0000 0000 0000 0800 4500  ..............E.
    0x0010:  0058 f517 4000 4011 a775 0a63 14a4 7f00  .X..@.@..u.c....
    0x0020:  0001 2710 2711 0044 9e5d ffff ffff ffff  ..'.'..D.]......
    0x0030:  ca00 2c5e 0006 0806 0001 0800 0604 0001  ..,^............
    0x0040:  ca00 2c5e 0006 ac14 0001 0000 0000 0000  ..,^............
    0x0050:  ac14 0002 0000 0000 0000 0000 0000 0000  ................
    0x0060:  0000 0000 0000                           ......
15:38:30.435288 00:00:00:00:00:00 > 00:00:00:00:00:00, ethertype IPv4 (0x0800), length 130: (tos 0xc0, ttl 64, id 27801, offset 0, flags [none], proto ICMP (1), length 116)
    127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 10001 unreachable, length 96
    (tos 0x0, ttl 64, id 62743, offset 0, flags [DF], proto UDP (17), length 88)
    127.0.0.1.10000 > 127.0.0.1.10001: [bad udp cksum 0x3e63 -> 0x5840!] UDP, length 60
    0x0000:  0000 0000 0000 0000 0000 0000 0800 45c0  ..............E.
    0x0010:  0074 6c99 0000 4001 0f2e 7f00 0001 7f00  .tl...@.........
    0x0020:  0001 0303 1532 0000 0000 4500 0058 f517  .....2....E..X..
    0x0030:  4000 4011 477b 7f00 0001 7f00 0001 2710  @.@.G{........'.
    0x0040:  2711 0044 3e63 ffff ffff ffff ca00 2c5e  '..D>c........,^
    0x0050:  0006 0806 0001 0800 0604 0001 ca00 2c5e  ..............,^
    0x0060:  0006 ac14 0001 0000 0000 0000 ac14 0002  ................
    0x0070:  0000 0000 0000 0000 0000 0000 0000 0000  ................
    0x0080:  0000                                     ..

On the broken instance I see these logs:

Oct 04 16:32:57.300 HYPERVISOR: exec_cmd: hypervisor version 
Oct 04 16:32:57.300 HYPERVISOR: exec_cmd: hypervisor reset 
Oct 04 16:32:57.300 GENERAL: reset done.
Oct 04 16:32:57.300 HYPERVISOR: exec_cmd: hypervisor working_dir /<snip!>/dynagen/simple_2_routers 
Oct 04 16:32:57.300 GENERAL: working_dir=/home/cvazquez/dynagen/simple_2_routers
Oct 04 16:32:57.300 HYPERVISOR: exec_cmd: vm create R1 0 c7200 
Oct 04 16:32:57.302 VM: VM R1 created.
Oct 04 16:32:57.302 HYPERVISOR: exec_cmd: vm set_con_tcp_port R1 2000 
Oct 04 16:32:57.302 HYPERVISOR: exec_cmd: c7200 set_npe R1 npe-400 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm set_ram R1 160 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: c7200 set_npe R1 npe-400 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm set_ios R1 /<snip!>/c7200-adventerprisek9-mz.124-20.T.bin 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm set_sparse_mem R1 0 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm set_blk_direct_jump R1 0 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm set_idle_pc R1 0x6061aa54 
Oct 04 16:32:57.303 HYPERVISOR: exec_cmd: vm create R2 1 c7200 
Oct 04 16:32:57.305 VM: VM R2 created.
Oct 04 16:32:57.305 HYPERVISOR: exec_cmd: vm set_con_tcp_port R2 2001 
Oct 04 16:32:57.305 HYPERVISOR: exec_cmd: c7200 set_npe R2 npe-400 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm set_ram R2 160 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: c7200 set_npe R2 npe-400 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm set_ios R2 /<snip!>/c7200-adventerprisek9-mz.124-20.T.bin 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm set_sparse_mem R2 0 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm set_blk_direct_jump R2 0 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm set_idle_pc R2 0x6061aa54 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm slot_add_binding R1 0 0 C7200-IO-2FE 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: nio create_tap nio_tap0 tap0 
Oct 04 16:32:57.306 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R1 0 0 nio_tap0 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: vm slot_add_binding R2 0 0 C7200-IO-2FE 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp0 10000 127.0.0.1 10001 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp1 10001 127.0.0.1 10000 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R1 0 1 nio_udp0 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R2 0 1 nio_udp1 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: vm slot_add_binding R1 1 0 PA-8T 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: vm slot_add_binding R2 1 0 PA-8T 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp2 10002 127.0.0.1 10003 
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp3 10003 127.0.0.1 10002 
Oct 04 16:32:57.308 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R1 1 0 nio_udp2 
Oct 04 16:32:57.308 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R2 1 0 nio_udp3 
Oct 04 16:32:57.308 HYPERVISOR: exec_cmd: vm start R1
Oct 04 16:32:57.598 HYPERVISOR: exec_cmd: vm start R2 
Oct 04 16:40:13.667 HYPERVISOR: exec_cmd: hypervisor reset 
Oct 04 16:40:13.774 VM: VM R2 shutdown.
Oct 04 16:40:13.774 VM: VM R2 destroyed.
Oct 04 16:40:13.914 VM: VM R1 shutdown.
Oct 04 16:40:13.914 VM: VM R1 destroyed.
Oct 04 16:40:13.915 GENERAL: reset done.
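
As a quick sanity check on logs like the one above, the `nio create_udp` lines can be parsed to confirm that every UDP NIO has a mirrored counterpart (local and remote ports swapped). This is a minimal sketch, not part of dynamips or dynagen: the function names `parse_udp_nios` and `unpaired_nios` are made up for illustration, the regular expression assumes the log format shown in this thread, and the pairing check assumes both ends of each link live on the same host.

```python
import re

# Matches the hypervisor log lines seen in this thread, e.g.
# "... exec_cmd: nio create_udp nio_udp0 10000 127.0.0.1 10001"
NIO_UDP = re.compile(r"nio create_udp (\S+) (\d+) (\S+) (\d+)")


def parse_udp_nios(log_text):
    """Return {nio_name: (local_port, remote_host, remote_port)}."""
    nios = {}
    for match in NIO_UDP.finditer(log_text):
        name, local_port, remote_host, remote_port = match.groups()
        nios[name] = (int(local_port), remote_host, int(remote_port))
    return nios


def unpaired_nios(nios):
    """Return names of NIOs with no counterpart whose ports are swapped."""
    endpoints = {(lport, rport) for lport, _, rport in nios.values()}
    return sorted(
        name
        for name, (lport, _, rport) in nios.items()
        if (rport, lport) not in endpoints
    )


# The four create_udp lines from the log in this thread.
LOG = """\
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp0 10000 127.0.0.1 10001
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp1 10001 127.0.0.1 10000
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp2 10002 127.0.0.1 10003
Oct 04 16:32:57.307 HYPERVISOR: exec_cmd: nio create_udp nio_udp3 10003 127.0.0.1 10002
"""

if __name__ == "__main__":
    print(unpaired_nios(parse_udp_nios(LOG)))  # prints []
```

For the log above every NIO pairs up, so the link definitions themselves look consistent and the problem is more likely in how the packets are delivered.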

And these Established UDP connections:

netstat -an | grep udp
udp        0      0 127.0.0.1:10000         127.0.0.1:10001         ESTABLISHED
udp        0      0 127.0.0.1:10001         127.0.0.1:10000         ESTABLISHED
udp        0      0 127.0.0.1:10002         127.0.0.1:10003         ESTABLISHED
udp        0      0 127.0.0.1:10003         127.0.0.1:10002         ESTABLISHED

Appreciate any help, and I'm happy to produce more logs or test results, Chris Vazquez

julien-duponchelle commented 7 years ago

How do you connect the GUI to the remote server? Do you use a VPN?

cvazquez-edge commented 7 years ago

I've used a couple of methods. I can replicate the issue on the CLI using dynagen, and also by running gns3server and remotely tunneling in with SSH in a VPN.

GNS3 GUI seems to have no trouble connecting and issuing commands to gns3server and dynamips.

Thanks!

julien-duponchelle commented 7 years ago

An SSH tunnel will cause problems because, on the remote host, the other side of the tunnel will be seen as 127.0.0.1.

Both dynamips instances need to have different IP addresses and be able to communicate with each other on all ports.

This script contains a sample OpenVPN setup: https://github.com/GNS3/gns3-server/blob/master/scripts/remote-install.sh

cvazquez-edge commented 7 years ago

I don't think SSH or the VPN is the issue here.

I'm having this problem running all the simulations on the same AWS instance. This bad behavior occurs with a single dynamips instance running in hypervisor mode and controlled locally by dynagen, with two routers simulated. It also occurs with a local instance of gns3server starting two separate instances of dynamips locally (one for each router), and controlled remotely by the GNS3 GUI.

I haven't tried splitting the simulation load between my local machine and AWS. My goal here is to have a long term simulation running 100% on AWS, for testing some network management tools and producing realistic monitoring data. If they leak memory or fall over eventually, that's fine - I'll just have them restarted.

Another complication of this setup is that Docker and its bridges are running there too, along with a TAP interface and a GRE tunnel and an IPIP tunnel to another host. Connecting dynamips to tap0 works, and pinging that simulated router from the native host or from across the GRE tunnel both work. Only traffic between the simulated routers fails.

julien-duponchelle commented 7 years ago

How do you start dynamips?

I think you need to tell it to bind to 127.0.0.1; otherwise it binds to its public IP and the AWS firewall blocks it.

cvazquez-edge commented 7 years ago

When running with dynagen, I start it like this: /usr/local/bin/dynamips -H 7200 -l /var/log/dynamips.log

When gns3server starts it, it's like this: Starting Dynamips: ['/usr/local/bin/dynamips', '-N1', '-l', 'dynamips_i3_log.txt', '-H', '55369']

Both bind to 0.0.0.0:7200, and when I start the sims, the 'nio create_udp' commands from dynagen or gns3 create those connections from 127.0.0.1:10000<->127.0.0.1:10001 etc.

Looks like the man page and CLI help have a discrepancy - I didn't realize I could give it an IP and port for '-H'.

Trying it like this: /usr/local/bin/dynamips -H 127.0.0.1:7200 -l /var/log/dynamips.log

I get the same behavior, except the hypervisor binds to 127.0.0.1 instead of 0.0.0.0. The UDP tunnel traffic still has the source address of eth0.
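
One way to check whether the host itself mishandles loopback UDP, independent of dynamips, is to build the same kind of socket pair by hand. The sketch below assumes the tunnels behave like connected UDP sockets (the ESTABLISHED state in netstat suggests this, but I have not verified it against the dynamips source). The function name `loopback_udp_pair_works` is hypothetical, and it uses ephemeral ports rather than 10000/10001 so it does not collide with a running simulation.

```python
import socket


def loopback_udp_pair_works(host="127.0.0.1", timeout=1.0):
    """Send one datagram between two connected UDP sockets on `host`.

    Returns True if the datagram arrives. A connected UDP socket only
    accepts datagrams whose source address and port match its peer, so
    if something on the host rewrites the source address in transit,
    the receive is expected to time out and this returns False.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Bind both ends to the loopback address on kernel-chosen ports.
        a.bind((host, 0))
        b.bind((host, 0))
        # Connect each socket to the other, mirroring the netstat pairs.
        a.connect(b.getsockname())
        b.connect(a.getsockname())
        b.settimeout(timeout)
        a.send(b"ping")
        try:
            return b.recv(16) == b"ping"
        except OSError:  # includes timeouts and ICMP-triggered errors
            return False
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("loopback UDP pair works:", loopback_udp_pair_works())
```

If this reports False on the AWS instance but True on the Mac, the problem would sit in the host's network configuration rather than in dynamips. If it reports True on both, the difference is more likely in how dynamips sets up its sockets.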

cvazquez-edge commented 7 years ago

I found a workaround. Instead of relying on dynagen (or GNS3 GUI) to infer and create the correct links, I specified the nio_udp parameters manually to use the eth0 address instead of localhost:

Broken dynagen .net snippet. Creates UDP connections to/from localhost:

   [[ROUTER R1]]
      model = 7200
      idlepc = 0x6061aa54
      f0/0 = R2 f0/0
      s1/0 = R2 s1/0
   [[router R2]]
      model = 7200
      idlepc = 0x6061aa54
      # Implicit: f0/0 = R1 f0/0
      # Implicit: s1/0 = R1 s1/0

Working dynagen .net snippet. Creates UDP connections to/from the external IP, since that's where dynamips will source the traffic from anyway:

   [[ROUTER R1]]
      model = 7200
      idlepc = 0x6061aa54
      f0/0 = nio_udp:10000:<external IP>:10001
      s1/0 = nio_udp:10002:<external IP>:10003
   [[router R2]]
      model = 7200
      idlepc = 0x6061aa54
      f0/0 = nio_udp:10001:<external IP>:10000
      s1/0 = nio_udp:10003:<external IP>:10002

Dynamips logs, much like before, but not using the loopback address for UDP tunnels:

Oct 05 20:54:56.707 HYPERVISOR: exec_cmd: nio create_udp nio_udp0 10000 <external IP> 10001 
Oct 05 20:54:56.708 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R1 0 0 nio_udp0 
Oct 05 20:54:56.708 HYPERVISOR: exec_cmd: vm slot_add_binding R1 1 0 PA-8T 
Oct 05 20:54:56.708 HYPERVISOR: exec_cmd: nio create_udp nio_udp1 10002 <external IP> 10003 
Oct 05 20:54:56.709 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R1 1 0 nio_udp1 
Oct 05 20:54:56.709 HYPERVISOR: exec_cmd: vm slot_add_binding R2 0 0 C7200-IO-2FE 
Oct 05 20:54:56.709 HYPERVISOR: exec_cmd: nio create_udp nio_udp2 10001 <external IP> 10000 
Oct 05 20:54:56.709 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R2 0 0 nio_udp2 
Oct 05 20:54:56.710 HYPERVISOR: exec_cmd: vm slot_add_binding R2 1 0 PA-8T 
Oct 05 20:54:56.710 HYPERVISOR: exec_cmd: nio create_udp nio_udp3 10003 <external IP> 10002 
Oct 05 20:54:56.710 HYPERVISOR: exec_cmd: vm slot_add_nio_binding R2 1 0 nio_udp3 

Netstat shows these connections:

udp        0      0 <external IP>:10000      <external IP>:10001      ESTABLISHED
udp        0      0 <external IP>:10001      <external IP>:10000      ESTABLISHED
udp        0      0 <external IP>:10002      <external IP>:10003      ESTABLISHED
udp        0      0 <external IP>:10003      <external IP>:10002      ESTABLISHED

I can now ping to/from R1 and R2 over both the Ethernet and serial links. Traffic on the host still has the external IP as its source address; it just happens to be correct now.

It would be nice to figure out why dynamips behaves badly in this environment but not in others. Then I could go back to letting dynagen/GNS3 handle the conveniences of counting ports and inferring the other side of each link. Doing that by hand will definitely get tedious with larger configs. But at least I'm not stuck anymore.

Thanks for looking into this! Still happy to provide logs or tcpdumps or whatever.
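
Until the root cause is found, the port counting for the manual workaround can be scripted. This is a rough sketch that only generates the `nio_udp:<local port>:<host>:<remote port>` lines in the form used in the working .net snippet above. The helper name `nio_udp_links`, the `base_port` default, and the address 192.0.2.10 (a documentation-only address standing in for the external IP) are all assumptions for illustration.

```python
def nio_udp_links(host, interfaces, base_port=10000):
    """Build matching nio_udp config lines for both ends of each link.

    `interfaces` lists the interface names shared by the two routers,
    e.g. ["f0/0", "s1/0"]. Each link consumes two consecutive ports:
    the first router listens on the even one, the second on the odd one.
    Returns a pair (lines_for_first_router, lines_for_second_router).
    """
    first, second = [], []
    for index, interface in enumerate(interfaces):
        port_a = base_port + 2 * index
        port_b = port_a + 1
        first.append(f"{interface} = nio_udp:{port_a}:{host}:{port_b}")
        second.append(f"{interface} = nio_udp:{port_b}:{host}:{port_a}")
    return first, second


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute the instance's real address.
    r1_lines, r2_lines = nio_udp_links("192.0.2.10", ["f0/0", "s1/0"])
    for router, lines in (("R1", r1_lines), ("R2", r2_lines)):
        print(f"[[ROUTER {router}]]")
        for line in lines:
            print(f"   {line}")
```

With two interfaces and the default base port, this reproduces the 10000/10001 and 10002/10003 pairs from the working snippet. Larger topologies would need a separate port range per router pair.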