OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

VNC is not working for the default port #3512

Open kvaps opened 4 years ago

kvaps commented 4 years ago

Description: I just ran into this issue. The VM has no GRAPHICS/PORT variable set for some reason:

GRAPHICS = [
  LISTEN = "0.0.0.0",
  PASSWD = "08e1b7fd947e0eb89ebe54435688329232635e5d",
  RANDOM_PASSWD = "YES",
  TYPE = "VNC" ]

The QEMU process is running on the default port 5900:

/usr/bin/qemu-system-x86_64 ... -vnc 0.0.0.0:0,password ...
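
For reference, `-vnc 0.0.0.0:0` selects VNC display 0, and QEMU maps display N to TCP port 5900 + N, which is why this process listens on 5900. A minimal sketch of that mapping follows; the function names are hypothetical helpers, not part of OpenNebula or QEMU:

```shell
# Convert a VNC display number to its TCP port (5900 + display).
vnc_port_from_display() {
    local display="$1"
    echo $((5900 + display))
}

# Extract the display number from the value of a "-vnc host:display[,options]" argument.
vnc_display_from_arg() {
    local arg="$1"
    arg="${arg%%,*}"     # drop ",password" and any other options
    echo "${arg##*:}"    # keep what follows the last colon
}

vnc_port_from_display "$(vnc_display_from_arg '0.0.0.0:0,password')"
```

For the command line above this prints 5900.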

If you try to connect to VNC for this VM, you get an error:

VNC Failed to connect to server (code: 1006) 

The noVNC server log says:

10.112.1.105 - - [14/Jul/2019 10:36:04] 10.112.1.105: Plain non-SSL (ws://) WebSocket connection
10.112.1.105 - - [14/Jul/2019 10:36:04] 10.112.1.105: Version hybi-13, base64: 'False'
10.112.1.105 - - [14/Jul/2019 10:36:04] 10.112.1.105: Path: '/websockify/?token=3cz1u6wcvf615jjtborr'
10.112.1.105 - - [14/Jul/2019 10:36:04] connecting to: m5c35:
handler exception: Connect mode requires a port

If we check the token in /var/lib/one/sunstone_vnc_tokens/one-7954, it contains:

w22taw6wbzmg3s9m6d84: m5c35:

but it should be:

w22taw6wbzmg3s9m6d84: m5c35:5900

If I manually change it like this, VNC starts working for this single session.


rsmontero commented 4 years ago

This seems to work for me, closing.

kvaps commented 4 years ago

The problem is not solved, I just ran into it again:

    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <PASSWD><![CDATA[62889c9c502e037bda530ca6aa9bc148815257e0aab4a3ccd63f72523ba40531]]></PASSWD>
      <RANDOM_PASSWD><![CDATA[YES]]></RANDOM_PASSWD>
      <TYPE><![CDATA[VNC]]></TYPE>
    </GRAPHICS>

Solved by adding:

      <PORT><![CDATA[5900]]></PORT>

rsmontero commented 4 years ago

Can you paste the deployment file for that VM?

kvaps commented 4 years ago

Sure, here it is:

deployment.18
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    <name>one-8898</name>
    <title>ov3337</title>
    <vcpu><![CDATA[2]]></vcpu>
    <cputune>
        <shares>687</shares>
    </cputune>
    <memory>4194304</memory>
    <os>
        <type arch='x86_64'>hvm</type>
    </os>
    <cpu mode='host-passthrough'>
    </cpu>
    <devices>
        <emulator><![CDATA[/usr/bin/qemu-system-x86_64]]></emulator>
        <disk type='block' device='disk'>
            <source dev='/var/lib/one//datastores/200/8898/disk.0'/>
            <target dev='vda' bus='virtio'/>
            <driver name='qemu' type='raw' cache='writethrough'/>
        </disk>
        <disk type='file' device='cdrom'>
            <source file='/var/lib/one//datastores/200/8898/disk.2'/>
            <target dev='sda'/>
            <readonly/>
            <driver name='qemu' type='raw' cache='writethrough'/>
            <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='file' device='cdrom'>
            <source file='/var/lib/one//datastores/200/8898/disk.1'/>
            <target dev='hda' bus='ide'/>
            <readonly/>
            <driver name='qemu' type='raw'/>
        </disk>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:59:dd:db:d5'/>
            <target dev='one-8898-0'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:ff:1c:31:07'/>
            <target dev='one-8898-1'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:59:dd:db:dc'/>
            <target dev='one-8898-2'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:9e:95:28:12'/>
            <target dev='one-8898-3'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:59:dd:db:dd'/>
            <target dev='one-8898-4'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <interface type='bridge'>
            <source bridge='vmbr0v107'/>
            <mac address='02:00:14:4b:64:65'/>
            <target dev='one-8898-5'/>
            <model type='virtio'/>
            <bandwidth>
                <inbound average='25000' peak='25000' burst='25000'/>
                <outbound average='25000' peak='25000' burst='25000'/>
            </bandwidth>
        </interface>
        <graphics type='vnc' listen='0.0.0.0' passwd='62889c9c502e037bda530ca6aa9bc148815257e0aab4a3ccd63f72523ba40531'/>
    </devices>
    <features>
        <acpi/>
        <hyperv>
<relaxed state='on'/><vapic state='on'/>
        </hyperv>
    </features>
    <devices>
        <channel type='unix'>
            <source mode='bind'/><target type='virtio' name='org.qemu.guest_agent.0'/>
        </channel>
    </devices>
    <os><bootmenu enable='yes' timeout='3000'/></os>
    <metadata>
        <one:vm xmlns:one="http://opennebula.org/xmlns/libvirt/1.0">
            <one:system_datastore><![CDATA[/var/lib/one//datastores/200/8898]]></one:system_datastore>
            <one:name><![CDATA[ov3337]]></one:name>
            <one:uname><![CDATA[ov3337]]></one:uname>
            <one:uid>3458</one:uid>
            <one:gname><![CDATA[ov3337]]></one:gname>
            <one:gid>3256</one:gid>
            <one:opennebula_version>5.10.3</one:opennebula_version>
            <one:stime>1585105401</one:stime>
            <one:deployment_time>1585242204</one:deployment_time>
        </one:vm>
    </metadata>
</domain>

After the fix, it changed like this:

94c94
<       <graphics type='vnc' listen='0.0.0.0' passwd='62889c9c502e037bda530ca6aa9bc148815257e0aab4a3ccd63f72523ba40531'/>
---
>       <graphics type='vnc' listen='0.0.0.0' port='5900' passwd='62889c9c502e037bda530ca6aa9bc148815257e0aab4a3ccd63f72523ba40531'/>
118c118
<           <one:deployment_time>1585242204</one:deployment_time>
---
>           <one:deployment_time>1585555765</one:deployment_time>

In both cases VNC itself was working, but in the first case Sunstone generated a wrong token.
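
One way to spot affected VMs is to check whether the `<graphics>` element of a deployment file carries an explicit `port` attribute. A minimal sketch, assuming the element sits on a single line as in the file above; `graphics_has_port` is a hypothetical helper, and the temporary file only stands in for a real deployment file:

```shell
# Exit 0 when the file has a <graphics> element with an explicit port attribute,
# non-zero when the attribute (or the whole element) is missing.
graphics_has_port() {
    grep '<graphics ' "$1" | grep -Eq "[[:space:]]port=['\"]"
}

# Stand-in for a deployment file without a port (assumption for illustration).
tmp="$(mktemp)"
echo "<graphics type='vnc' listen='0.0.0.0' passwd='secret'/>" > "$tmp"

if graphics_has_port "$tmp"; then
    echo "port set"
else
    echo "port missing"
fi
rm -f "$tmp"
```

Here the check reports "port missing", which matches the first version of the deployment file.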

rsmontero commented 4 years ago

This is quite strange, OpenNebula either assigns a PORT or fails at deployment file. Ports are assigned/reserved per cluster. Not sure what could be the problem, we cannot reproduce this. The normal behavior is not to set PORT and let OpenNebula pick one from the cluster pool of VNC ports.

Could you check oned for any errors when deploying the VM (e.g. a DB problem that prevents updating the VM)? Do you have any reserved/custom START or RESERVED values for the VNC ports in oned.conf? Could you test deploying the VM in another cluster, or is it happening in every cluster? What is the GRAPHICS attribute in the VM before deploying?

kvaps commented 4 years ago

I can't reproduce it either. I just run into it from time to time: some VMs have no working VNC, and all these cases are connected with a missing PORT variable.

My templates always have:

GRAPHICS = [
  LISTEN = "0.0.0.0",
  RANDOM_PASSWD = "YES",
  TYPE = "VNC" ]

But sometimes VMs are instantiated without the PORT variable, e.g.:

GRAPHICS = [
  LISTEN = "0.0.0.0",
  PASSWD = "08e1b7fd947e0eb89ebe54435688329232635e5d",
  RANDOM_PASSWD = "YES",
  TYPE = "VNC" ]

This happens quite rarely, and I'm not sure why. I think it might be caused by some race condition, e.g. while oned is updating PORT, sched could update SCHED_MESSAGE, or whatever else happens at the PENDING --> PROLOG stage?

Anyway, the fact is that PORT is missing and libvirt falls back to the default port, 5900.

Another problem is that Sunstone can't handle a missing PORT variable, so it writes the token like this:

w22taw6wbzmg3s9m6d84: m5c35:

but if it fell back to the default 5900 the same way libvirt does:

w22taw6wbzmg3s9m6d84: m5c35:5900

this would solve the problem.
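
The fallback proposed above can be sketched as follows. This is only an illustration of the suggested behaviour in shell, not Sunstone's actual code; `vnc_token_line` and `DEFAULT_VNC_PORT` are hypothetical names:

```shell
# Default VNC port assumed when the VM has no GRAPHICS/PORT (display :0).
DEFAULT_VNC_PORT=5900

# Build a token file line of the form "token: host:port".
# An empty or missing port falls back to DEFAULT_VNC_PORT.
vnc_token_line() {
    local token="$1" host="$2" port="${3:-$DEFAULT_VNC_PORT}"
    printf '%s: %s:%s\n' "$token" "$host" "$port"
}

# PORT is empty, as in the affected VMs.
vnc_token_line w22taw6wbzmg3s9m6d84 m5c35 ""
```

With an empty port this prints `w22taw6wbzmg3s9m6d84: m5c35:5900`; an explicit port is passed through unchanged.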

kvaps commented 4 years ago

I just got a bunch of VMs with a similar error, and I have steps to reproduce:

One of the OpenNebula hosts failed, and all VMs on it went into the UNKNOWN state. A technician then ran undeploy-hard on all of these VMs, followed by resume. After that, all of them were scheduled to other hosts without PORT set:

    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <PASSWD><![CDATA[9c79f4b8072b7a5ec48da75d8fdd9ef6f35c0041]]></PASSWD>
      <RANDOM_PASSWD><![CDATA[YES]]></RANDOM_PASSWD>
      <TYPE><![CDATA[VNC]]></TYPE>
    </GRAPHICS>

rsmontero commented 3 years ago

We are still not able to reproduce this issue, so we are moving it to another milestone. If there is any additional info, please feel free to add it here.

kvaps commented 3 years ago

@rsmontero we're running into this issue on 5.10.5 quite often; the steps to reproduce are described in https://github.com/OpenNebula/one/issues/3512#issuecomment-611692931

paczerny commented 3 years ago

I tried hard to reproduce the issue, unfortunately without success.

> it might be caused by some race condition, eg. when oned updating PORT, sched could upgrade SCHED_MESSAGE

Agreed, it's probably a race condition. But SCHED_MESSAGE is in USER_TEMPLATE, so updating it can't modify TEMPLATE. Do you have any hooks that might call onevm updateconf?

> Another problem is that Sunstone can't handle missing PORT variable ... but if it would fallback to default 5900 the same way like libvirt

The fallback to the default helps only for the first VM. If more VMs have the same PORT, VNC doesn't work.
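
To see whether several VMs would collide on a port, duplicate host/port pairs can be listed. A minimal sketch, assuming a collision means two VMs sharing both host and port; `find_vnc_port_collisions` is a hypothetical helper and `m5c36` is a made-up host name:

```shell
# Read "host port" pairs on stdin, one per line,
# and print each pair that occurs more than once.
find_vnc_port_collisions() {
    sort | uniq -d
}

# Two VMs on m5c35 both fell back to 5900; the VM on m5c36 is alone on its host.
printf '%s\n' 'm5c35 5900' 'm5c35 5900' 'm5c36 5900' | find_vnc_port_collisions
```

The sample input reports only `m5c35 5900`, since that is the only pair used twice.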

kvaps commented 3 years ago

Hi @paczerny, you're right, we use certain hooks that fire on the runn, poff and done states. I guess this one is the man of the hour:

#!/bin/bash

# NAME      = write_ip
# TYPE      = state
# COMMAND   = write_ip.sh
# ARGUMENTS = "$TEMPLATE"
# ON        = CUSTOM
# RESOURCE  = VM
# STATE     = ACTIVE
# LCM_STATE = RUNNING

UTILS_PATH=/var/lib/one/remotes/datastore
XPATH="${UTILS_PATH}/xpath.rb"

TEMPLATE="$1"
unset i j XPATH_ELEMENTS
while IFS= read -r -d '' element; do
    XPATH_ELEMENTS[i++]="$element"
done < <($XPATH -b $TEMPLATE '/VM/ID')

VM_ID="${XPATH_ELEMENTS[j++]}"

SED_ARGS=

# write ip hook
NIC_XPATHS="$(echo "$TEMPLATE" | base64 -d | xmlstarlet el | grep '^VM/TEMPLATE/CONTEXT/ETH' | grep -v VLAN)"

for NIC_XPATH in $NIC_XPATHS; do
    var=$(echo "$NIC_XPATH" | sed 's|^VM/TEMPLATE/CONTEXT/||g')
    val=$(echo "$TEMPLATE" | base64 -d | xmlstarlet sel -t -v "$NIC_XPATH" -n)
    if [ ! -z "$val" ]; then
        SED_ARGS+=" -e '/^$var=/d' -e '\\\$a${var}=\\\"${val}\\\"'"
    fi
done

if [ -n "$SED_ARGS" ]; then
    eval "EDITOR=\"sed -i $SED_ARGS\" onevm update \"$VM_ID\""
fi

This hook copies all NIC parameters into VM attributes.

paczerny commented 3 years ago

@kvaps Not this one: this script uses onevm update. The graphics->port parameter can only be changed by onevm updateconf.

kvaps commented 3 years ago

> @kvaps Not this one, this script uses onevm update. The graphics->port parameter can be changed only by onevm updateconf

I don't have any hooks that use onevm updateconf. My hooks only use onevm update. One of them also uses onevm resize and onevm disk-resize, but those are called only when the VM enters the poweroff state 🤷