churchers / vm-bhyve

Shell based, minimal dependency bhyve manager
BSD 2-Clause "Simplified" License

Cannot ssh from host to vm and vice-versa #108

Open rajil opened 8 years ago

rajil commented 8 years ago

Hello,

I have an Ubuntu VM running. I am able to ssh into this VM from other PCs on the network, but not from the host. Ping from the host to the VM works fine though.

The vm config is as follows:

# vm version
vm-bhyve: Bhyve virtual machine management v1.1-p3 (build 101062)

VM details:

guest="linux"
loader="grub"
cpu=1
memory=1024M
network0_type="virtio-net"
network0_switch="public"
disk0_type="ahci-hd"
disk0_name="/dev/da1"
disk0_dev="custom"
grub_run_partition="3"
uuid="0ee2cc2e-5e0b-22e6-b0b3-0aa47a60202a"
network0_mac="12:2d:ee:17:b2:1a"

The vm switch is as follows:

# vm switch info public
------------------------
Virtual Switch: public
------------------------
  type: auto
  ident: bridge0
  vlan: -
  nat: -
  physical-ports: vlan100
  bytes-in: 25832 (25.226K)
  bytes-out: 97976042 (93.437M)

  virtual-port
    device: tap0
    vm: ubuntu

My rc.conf is as follows:

ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0 vlan100 vlan200"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1"
ifconfig_vlan100="inet 192.168.1.2 netmask 255.255.255.0 vlan 100 vlandev lagg0 fib 0"
ifconfig_vlan200="inet 192.168.2.2 netmask 255.255.255.0 vlan 200 vlandev lagg0 fib 1"

ifconfig

bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: vm-public
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 12 priority 128 path cost 2000000
        member: vlan100 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000000
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: vmnet-ubuntu-0-public
        options=80000<LINKSTATE>
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        Opened by PID 49969

Instead of adding lagg0 to the switch, I added vlan100 directly. Is this the problem?

EpiJunkie commented 8 years ago

Instead of adding lagg0 to the switch, I added vlan100 directly. Is this the problem?

vlan100 is the correct device to attach to the switch, unless you are trying to pass traffic on lagg0 via the native VLAN.

The thing that stands out to me is the lack of an up in /etc/rc.conf, but it is hard to tell without the ifconfig output for those interfaces:

ifconfig_vlan100="inet 192.168.1.2 netmask 255.255.255.0 vlan 100 vlandev lagg0 fib 0"

should be

ifconfig_vlan100="inet 192.168.1.2 netmask 255.255.255.0 vlan 100 vlandev lagg0 fib 0 up"

rajil commented 8 years ago

The vlan interfaces are up and the network works fine on the host and its jails. Here is the ifconfig output for the vlan and lagg interfaces:

lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
vlan100: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255 
        inet 192.168.1.26 netmask 0xffffffff broadcast 192.168.1.26 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 100 parent interface: lagg0
vlan200: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255 
        inet 192.168.2.10 netmask 0xffffffff broadcast 192.168.2.10 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        fib: 1
        vlan: 200 parent interface: lagg0

EpiJunkie commented 8 years ago

By the way, thank you for the well-formatted posts 👍. Also, sorry for hitting the basics first.

Hmm, this is curious. I had a similar issue with my on-board Broadcom NICs. I was able to ping the VMs from the host and from other VMs, but unless the traffic originated from outside the box, anything beyond pings failed. Have you tried an SSH connection from one VM to another? I bet this will not work.

The solution was to disable all the hardware offloading mechanisms on the base NICs, which propagates down to the VLAN interface. I did this through /etc/rc.conf. You could also do it live, but that requires bringing down igb0 and igb1.

/etc/rc.conf:

ifconfig_igb0="-rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum up"
ifconfig_igb1="-rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwtso -vlanhwcsum up"
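
For the live variant mentioned above, a minimal sketch might look like the following. It reuses the flags from the rc.conf lines (with the duplicate -vlanhwtso dropped) and takes each NIC down before changing it, as described. The DRY_RUN variable and the helper names are made up for this example; by default the script only prints the commands it would run.

```sh
#!/bin/sh
# Sketch only: apply the offload flags from the rc.conf lines above without a reboot.
# FLAGS is the list from this thread with the duplicate -vlanhwtso removed.
FLAGS="-rxcsum -txcsum -rxcsum6 -txcsum6 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -tso -tso4 -tso6 -lro -vlanhwcsum"

# DRY_RUN is a hypothetical switch for this example:
# 1 (default) prints the commands, 0 executes them (needs root).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_offload() {
    nic="$1"
    run ifconfig "$nic" down
    # $FLAGS is left unquoted on purpose so each flag becomes its own argument.
    run ifconfig "$nic" $FLAGS
    run ifconfig "$nic" up
}

# igb0 and igb1 are the lagg0 members from the rc.conf earlier in the thread.
for nic in igb0 igb1; do
    disable_offload "$nic"
done
```

Taking lagg members down interrupts traffic on them, so this is probably best run from the console rather than over an SSH session that depends on lagg0.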

If this does not work, I suggest running tcpdump from the host on the tap interface connected to the guest. Use the -vvvv flag to get the verbosity needed to figure out the problem. At this verbosity it will only take a few seconds to both fill the buffer and reveal the root cause.
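
As a rough sketch of that capture: tap0 is the VM's tap device from the ifconfig output earlier in the thread, and the filter on TCP port 22 is an assumption added here to keep the output focused on the failing SSH connection. The RUN variable and the capture_cmd helper are made up for this example; by default the script only prints the command.

```sh
#!/bin/sh
# Sketch only: watch SSH traffic on the guest's tap interface from the host.
IFACE="${IFACE:-tap0}"   # tap device attached to the VM (tap0 in this thread)
PORT="${PORT:-22}"       # assumed filter: the SSH port

# Build the tcpdump command line: -vvvv for verbosity, -n to skip name lookups.
capture_cmd() {
    echo "tcpdump -vvvv -n -i $1 tcp port $2"
}

if [ "${RUN:-0}" = "1" ]; then
    # Run the capture (needs root); the unquoted expansion splits into arguments.
    $(capture_cmd "$IFACE" "$PORT")
else
    # Default: just show the command that would be run.
    capture_cmd "$IFACE" "$PORT"
fi
```

One thing to look for, as a guess about this failure mode rather than something confirmed in the thread, is TCP segments that tcpdump flags with an incorrect checksum while the host is trying to connect.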

rajil commented 8 years ago

Thanks, the changes you proposed to disable hardware offloading worked. I am now able to ssh from the host to the VM. The NICs on my motherboard are Intel I210.

#lspci 
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

I guess there will be some speed reduction from disabling offloading. I am a bit surprised that this needed to be done on Intel NICs.

The Intel I210 exposes the following options:

#ifconfig igb0
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

and I used the following options in /etc/rc.conf:

ifconfig_igb0="-rxcsum -txcsum  -vlanmtu  -vlanhwtso  -tso4 -tso6 -vlanhwtso -vlanhwcsum up"
ifconfig_igb1="-rxcsum -txcsum  -vlanmtu  -vlanhwtso  -tso4 -tso6 -vlanhwtso -vlanhwcsum up"
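
The step from the capability list in the ifconfig output to a set of rc.conf flags can be sketched as a small lookup. The function name offload_flags is made up for this example, and the mapping only covers the capabilities that appear in this thread; anything else (JUMBO_MTU, for instance) is skipped.

```sh
#!/bin/sh
# Sketch only: turn an ifconfig "options=...<...>" line into the flags that
# switch those capabilities off. Unlisted capabilities are ignored.
offload_flags() {
    # $1: an options line as printed by ifconfig
    caps=$(printf '%s\n' "$1" \
        | sed -n 's/.*options=[0-9a-fA-F]*<\(.*\)>.*/\1/p' \
        | tr ',' ' ')
    flags=""
    for cap in $caps; do
        case "$cap" in
            RXCSUM)         f="-rxcsum" ;;
            TXCSUM)         f="-txcsum" ;;
            RXCSUM_IPV6)    f="-rxcsum6" ;;
            TXCSUM_IPV6)    f="-txcsum6" ;;
            TSO4)           f="-tso4" ;;
            TSO6)           f="-tso6" ;;
            LRO)            f="-lro" ;;
            VLAN_MTU)       f="-vlanmtu" ;;
            VLAN_HWTAGGING) f="-vlanhwtag" ;;
            VLAN_HWFILTER)  f="-vlanhwfilter" ;;
            VLAN_HWCSUM)    f="-vlanhwcsum" ;;
            VLAN_HWTSO)     f="-vlanhwtso" ;;
            *)              f="" ;;
        esac
        if [ -n "$f" ]; then
            flags="${flags:+$flags }$f"
        fi
    done
    printf '%s\n' "$flags"
}

# The options line reported for igb0 above.
offload_flags "options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>"
```

For the I210 line above this yields one flag more than the rc.conf lines in this comment, namely -vlanhwtag. Whether that flag matters for this problem is not established in the thread.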

EpiJunkie commented 8 years ago

Glad that those settings worked out for you. Your comment about the Intel NICs makes me wonder if the problem is with the VLANs on top of the LAGG, which is the same configuration I am running.

rajil commented 7 years ago

Is it possible to use the e1000 driver in FreeBSD 11 to circumvent this issue?