OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

802.1Q post fails on snap-based LXD setups #3596

Closed alitvak69 closed 4 years ago

alitvak69 commented 4 years ago

Description: As I attempt to instantiate an LXD container, it fails to boot with an error stating:

Failed to execute network driver operation: post.

To Reproduce: Steps to reproduce the behavior:

Instantiate a CentOS 7 or Fedora 29/30 LXD container.

Expected behavior

LXD container boots and runs.

Details

Additional context: Here is the full LXD log:

Fri Aug 16 11:45:11 2019 [Z0][VM][I]: New state is ACTIVE
Fri Aug 16 11:45:11 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Aug 16 11:45:13 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Aug 16 11:45:13 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/76/deployment.0
Fri Aug 16 11:45:13 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Aug 16 11:45:14 2019 [Z0][VMM][I]: ExitCode: 0
Fri Aug 16 11:45:14 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Processing disk 0
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-76/rootfs using device /dev/nbd1
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Resizing filesystem ext4 on /dev/nbd1
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd1 at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-76/rootfs
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/101/76/mapper/disk.1 using device /dev/loop0
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop0 at /var/lib/one/datastores/101/76/mapper/disk.1
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: deploy: --- Starting container ---
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: ExitCode: 0
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: Command execution fail: /var/tmp/one/vnm/fw/post PFZNPjxJRD43NjwvSUQ+PERFUExPWV9JRC8+PFRFTVBMQVRFPjxDT05URVhUPjxESVNLX0lEPjwhW0NEQVRBWzFdXT48L0RJU0tfSUQ+PEVUSDBfQ09OVEVYVF9GT1JDRV9JUFY0PjwhW0NEQVRBW11dPjwvRVRIMF9DT05URVhUX0ZPUkNFX0lQVjQ+PEVUSDBfRE5TPjwhW0NEQVRBWzEwLjAuNDguMTEgMTAuMC40OC4xMiAxMC4wLjIxLjYzXV0+PC9FVEgwX0ROUz48RVRIMF9FWFRFUk5BTD48IVtDREFUQVtdXT48L0VUSDBfRVhURVJOQUw+PEVUSDBfR0FURVdBWT48IVtDREFUQVsxMC4wLjcyLjFdXT48L0VUSDBfR0FURVdBWT48RVRIMF9HQVRFV0FZNj48IVtDREFUQVtdXT48L0VUSDBfR0FURVdBWTY+PEVUSDBfSVA+PCFbQ0RBVEFbMTAuMC43NC4xXV0+PC9FVEgwX0lQPjxFVEgwX0lQNj48IVtDREFUQVtdXT48L0VUSDBfSVA2PjxFVEgwX0lQNl9QUkVGSVhfTEVOR1RIPjwhW0NEQVRBW11dPjwvRVRIMF9JUDZfUFJFRklYX0xFTkdUSD48RVRIMF9JUDZfVUxBPjwhW0NEQVRBW11dPjwvRVRIMF9JUDZfVUxBPjxFVEgwX01BQz48IVtDREFUQVswMjowMDowYTowMDo0YTowMV1dPjwvRVRIMF9NQUM+PEVUSDBfTUFTSz48IVtDREFUQVsyNTUuMjU1LjI1Mi4wXV0+PC9FVEgwX01BU0s+PEVUSDBfTVRVPjwhW0NEQVRBW11dPjwvRVRIMF9NVFU+PEVUSDBfTkVUV09SSz48IVtDREFUQVsxMC4wLjcyLjBdXT48L0VUSDBfTkVUV09SSz48RVRIMF9TRUFSQ0hfRE9NQUlOPjwhW0NEQVRBW11dPjwvRVRIMF9TRUFSQ0hfRE9NQUlOPjxFVEgwX1ZMQU5fSUQ+PCFbQ0RBVEFbXV0+PC9FVEgwX1ZMQU5fSUQ+PEVUSDBfVlJPVVRFUl9JUD48IVtDREFUQVtdXT48L0VUSDBfVlJPVVRFUl9JUD48RVRIMF9WUk9VVEVSX0lQNj48IVtDREFUQVtdXT48L0VUSDBfVlJPVVRFUl9JUDY+PEVUSDBfVlJPVVRFUl9NQU5BR0VNRU5UPjwhW0NEQVRBW11dPjwvRVRIMF9WUk9VVEVSX01BTkFHRU1FTlQ+PE5FVFdPUks+PCFbQ0RBVEFbWUVTXV0+PC9ORVRXT1JLPjxTRVRfSE9TVE5BTUU+PCFbQ0RBVEFbYWxleC1jZW50b3M3XV0+PC9TRVRfSE9TVE5BTUU+PFNTSF9QVUJMSUNfS0VZPjwhW0NEQVRBW3NzaC1yc2EgQUFBQUIzTnphQzF5YzJFQUFBQURBUUFCQUFBQkFRREs0R0RCR0s2SnZ2QVpPUG1kQVFtdlYyQlJ6OHlDOXBodWR6ZksveUNESHUrRjNyblN6aG5hbEE2M0NNdTF3enFGRUgwVmE4WHlwZUhuTm1PZHJuU1VPdWF4M0tTWEZHMmQ3SGhSZHdvMFZ0eUh3STY5S3Vza1V6MXMrRWw1akRCV2wveUt4TkJvQ3N2Yk1XTksySFRpU3A5RUxmRC9IRHNwM0ZXU3RlUHlha3Y4R2ZoV3FTYnlvREpyeXpkb3lOeGV4YnpXZmNOQXl1TWlCVXpVQVFpaXQwS0R5RFMzZVNtYlVOV0NBLzExRjY3endlbVhqNE1VWXZ0R0RwNXpLaXg3cnIrQ05tbmZocVkwVEtGSGw4MGF4Mjl5QVpRNjA4ZE9VOFFjTmxYSlpsUURxME9HVk9sdEN4M3dpRVdRUXA0b2xZWXZxQnlBZVBsOFJPd3F4Skc5IGFsZXhsQHB1bWFdXT48L1NTSF9QVUJMSUNfS0VZPjxUQVJHRVQ+PCFbQ0RBVEFbaGRiXV0+PC9UQVJHRVQ+PC9DT05URVhUPjwvVEVNUExBVEU+PFVTRVJfVEVNUExBVEU+PEhZUEVSVklTT1I+PCFbQ0RBVEFbbHhkXV0+PC9IWVBFUlZJU09SPjxJTlBVVFNfT1JERVI+PCFbQ0RBVEFbXV0+PC9JTlBVVFNfT1JERVI+PExPR08+PCFbQ0RBVEFbaW1hZ2VzL2xvZ29zL2NlbnRvcy5wbmddXT48L0xPR08+PExYRF9QUk9GSUxFPjwhW0NEQVRBW11dPjwvTFhEX1BST0ZJTEU+PExYRF9TRUNVUklUWV9ORVNUSU5HPjwhW0NEQVRBW3llc11dPjwvTFhEX1NFQ1VSSVRZX05FU1RJTkc+PExYRF9TRUNVUklUWV9QUklWSUxFR0VEPjwhW0NEQVRBW3llc11dPjwvTFhEX1NFQ1VSSVRZX1BSSVZJTEVHRUQ+PE1FTU9SWV9VTklUX0NPU1Q+PCFbQ0RBVEFbTUJdXT48L01FTU9SWV9VTklUX0NPU1Q+PFNDSEVEX0RTX1JBTks+PCFbQ0RBVEFbRlJFRV9NQl1dPjwvU0NIRURfRFNfUkFOSz48U0NIRURfRFNfUkVRVUlSRU1FTlRTPjwhW0NEQVRBW0lEPSIxMDEiXV0+PC9TQ0hFRF9EU19SRVFVSVJFTUVOVFM+PFNDSEVEX1JBTks+PCFbQ0RBVEFbLVJVTk5JTkdfVk1TXV0+PC9TQ0hFRF9SQU5LPjxTQ0hFRF9SRVFVSVJFTUVOVFM+PCFbQ0RBVEFbQ0xVU1RFUl9JRD0iMTAwIl1dPjwvU0NIRURfUkVRVUlSRU1FTlRTPjwvVVNFUl9URU1QTEFURT48VEVNUExBVEU+PFNFQ1VSSVRZX0dST1VQX1JVTEU+PFBST1RPQ09MPjwhW0NEQVRBW0FMTF1dPjwvUFJPVE9DT0w+PFJVTEVfVFlQRT48IVtDREFUQVtPVVRCT1VORF1dPjwvUlVMRV9UWVBFPjxTRUNVUklUWV9HUk9VUF9JRD48IVtDREFUQVswXV0+PC9TRUNVUklUWV9HUk9VUF9JRD48U0VDVVJJVFlfR1JPVVBfTkFNRT48IVtDREFUQVtkZWZhdWx0XV0+PC9TRUNVUklUWV9HUk9VUF9OQU1FPjwvU0VDVVJJVFlfR1JPVVBfUlVMRT48L1RFTVBMQVRFPjxURU1QTEFURT48U0VDVVJJVFlfR1JPVVBfUlVMRT48UFJPVE9DT0w+PCFbQ0RBVEFbQUxMXV0+PC9QUk9UT0NPTD48UlVMRV9UWVBFPjwhW0NEQVRBW0lOQk9VTkRdXT48L1JVTEVfVFlQRT48U0VDVVJJVFlfR1JPVVBfSUQ+PCFbQ0RBVEFbMF1dPjwvU0VDVVJJVFlfR1JPVVBfSUQ+PFNFQ1VSSVRZX0dST1VQX05BTUU+PCFbQ0RBVEFbZGVmYXVsdF1dPjwvU0VDVVJJVFlfR1JPVVBfTkFNRT48L1NFQ1VSSVRZX0dST1VQX1JVTEU+PC9URU1QTEFURT48SElTVE9SWV9SRUNPUkRTPjxISVNUT1JZPjxIT1NUTkFNRT4xMC4wLjIyLjU8L0hPU1ROQU1FPjwvSElTVE9SWT48L0hJU1RPUllfUkVDT1JEUz48SElTVE9SWV9SRUNPUkRTPjxISVNUT1JZPjxWTV9NQUQ+PCFbQ0RBVEFbbHhkXV0+PC9WTV9NQUQ+PC9ISVNUT1JZPjwvSElTVE9SWV9SRUNPUkRTPjxURU1QTEFURT48TklDPjxBUl9JRD48IVtDREFUQVswXV0+PC9BUl9JRD48QlJJREdFPjwhW0NEQVRBW29wZW5ici5wcml2YXRlXV0+PC9CUklER0U+PEJSSURHRV9UWVBFPjwhW0NEQVRBW2xpbnV4XV0+PC9CUklER0VfVFlQRT48Q0xVU1RFUl9JRD48IVtDREFUQVswLDEwMF1dPjwvQ0xVU1RFUl9JRD48SVA+PCFbQ0RBVEFbMTAuMC43NC4xXV0+PC9JUD48TUFDPjwhW0NEQVRBWzAyOjAwOjBhOjAwOjRhOjAxXV0+PC9NQUM+PE5BTUU+PCFbQ0RBVEFbTklDMF1dPjwvTkFNRT48TkVUV09SSz48IVtDREFUQVtwcml2YXRlXV0+PC9ORVRXT1JLPjxORVRXT1JLX0lEPjwhW0NEQVRBWzBdXT48L05FVFdPUktfSUQ+PE5JQ19JRD48IVtDREFUQVswXV0+PC9OSUNfSUQ+PFNFQ1VSSVRZX0dST1VQUz48IVtDREFUQVswXV0+PC9TRUNVUklUWV9HUk9VUFM+PFRBUkdFVD48IVtDREFUQVtvbmUtNzYtMF1dPjwvVEFSR0VUPjxWTl9NQUQ+PCFbQ0RBVEFbZnddXT48L1ZOX01BRD48L05JQz48L1RFTVBMQVRFPjwvVk0+ 'one-76'
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: iptables v1.6.1: interface name `--physdev-is-bridged' must be shorter than IFNAMSIZ (15)
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: Try `iptables -h' or 'iptables --help' for more information.
Fri Aug 16 11:45:16 2019 [Z0][VMM][E]: post: Command Error: sudo iptables -I opennebula -m physdev --physdev-out --physdev-is-bridged -j one-76-0-i
Fri Aug 16 11:45:16 2019 [Z0][VMM][E]: post: ["/var/tmp/one/vnm/command.rb:62:in `block in run!'", "/var/tmp/one/vnm/command.rb:59:in `each'", "/var/tmp/one/vnm/command.rb:59:in `run!'", "/var/tmp/one/vnm/security_groups_iptables.rb:485:in `nic_pre'", "/var/tmp/one/vnm/sg_driver.rb:92:in `block in activate'", "/var/tmp/one/vnm/vnm_driver.rb:79:in `block in process'", "/var/tmp/one/vnm/vm.rb:64:in `block in each_nic'", "/var/tmp/one/vnm/vm.rb:63:in `each'", "/var/tmp/one/vnm/vm.rb:63:in `each_nic'", "/var/tmp/one/vnm/vnm_driver.rb:82:in `process'", "/var/tmp/one/vnm/sg_driver.rb:84:in `activate'", "/var/tmp/one/vnm/fw/post:32:in `<main>'"]
Fri Aug 16 11:45:16 2019 [Z0][VMM][I]: ExitCode: 1
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: --- Stopping container ---
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Processing disk 0
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Using rbd disk mapper for
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Unmapping disk at /var/snap/lxd/common/lxd/storage-pools/default/containers/one-76/rootfs
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Umounting disk mapped at /dev/nbd1
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Unmapping disk at /var/lib/one/datastores/101/76/mapper/disk.1
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Umounting disk mapped at /dev/loop0
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: ExitCode: 0
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: cancel.
Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: Failed to execute network driver operation: post.
Fri Aug 16 11:45:17 2019 [Z0][VMM][E]: Error deploying virtual machine: fw: -
Fri Aug 16 11:45:17 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

Progress Status

  • [ ] Branch created
  • [ ] Code committed to development branch
  • [ ] Testing - QA
  • [ ] Documentation
  • [ ] Release notes - resolved issues, compatibility, known issues
  • [ ] Code committed to upstream release/hotfix branches
  • [ ] Documentation committed to upstream release/hotfix branches
alitvak69 commented 4 years ago

@dann1 I see you assigned yourself. If any additional information is needed please let me know.

alitvak69 commented 4 years ago

It is 5.8.4. The issue started to manifest in 5.8.3, and updating to 5.8.4 didn't help.

On Mon, Aug 19, 2019 at 12:21 PM Daniel Clavijo Coca <notifications@github.com> wrote:

Hi @alitvak69, could you clarify what version this OpenNebula setup is?

Issue says it's 5.8.0, but this line

Fri Aug 16 11:45:17 2019 [Z0][VMM][I]: shutdown: Processing disk 0

Was added 2 months ago


dann1 commented 4 years ago

Yes, sorry, I read 5.8.0; I don't know what I was looking at. I'm going to try to reproduce it.

alitvak69 commented 4 years ago

@dann1 Any luck reproducing it? None of my LXD containers are starting because of this, and I am not sure what could be causing it.

dann1 commented 4 years ago

Hi @alitvak69, I cannot reproduce the issue. Here is the CentOS 7-based LXD VM template with an fw network:

oneadmin@ubuntu1804-lxd-qcow2-f34f2-0:~/readiness$ onevm show -x 20
<VM>
  <ID>20</ID>
  <UID>0</UID>
  <GID>0</GID>
  <UNAME>oneadmin</UNAME>
  <GNAME>oneadmin</GNAME>
  <NAME>alitvak69-20</NAME>
  <PERMISSIONS>
    <OWNER_U>1</OWNER_U>
    <OWNER_M>1</OWNER_M>
    <OWNER_A>0</OWNER_A>
    <GROUP_U>0</GROUP_U>
    <GROUP_M>0</GROUP_M>
    <GROUP_A>0</GROUP_A>
    <OTHER_U>0</OTHER_U>
    <OTHER_M>0</OTHER_M>
    <OTHER_A>0</OTHER_A>
  </PERMISSIONS>
  <LAST_POLL>1566365074</LAST_POLL>
  <STATE>3</STATE>
  <LCM_STATE>3</LCM_STATE>
  <PREV_STATE>3</PREV_STATE>
  <PREV_LCM_STATE>3</PREV_LCM_STATE>
  <RESCHED>0</RESCHED>
  <STIME>1566364996</STIME>
  <ETIME>0</ETIME>
  <DEPLOY_ID>one-20</DEPLOY_ID>
  <MONITORING>
    <CPU><![CDATA[0.0]]></CPU>
    <MEMORY><![CDATA[0]]></MEMORY>
    <NETRX><![CDATA[726]]></NETRX>
    <NETTX><![CDATA[726]]></NETTX>
    <STATE><![CDATA[a]]></STATE>
  </MONITORING>
  <TEMPLATE>
    <AUTOMATIC_DS_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_DS_REQUIREMENTS>
    <AUTOMATIC_NIC_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_NIC_REQUIREMENTS>
    <AUTOMATIC_REQUIREMENTS><![CDATA[(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES)]]></AUTOMATIC_REQUIREMENTS>
    <CONTEXT>
      <DISK_ID><![CDATA[1]]></DISK_ID>
      <ETH0_CONTEXT_FORCE_IPV4><![CDATA[]]></ETH0_CONTEXT_FORCE_IPV4>
      <ETH0_DNS><![CDATA[]]></ETH0_DNS>
      <ETH0_EXTERNAL><![CDATA[]]></ETH0_EXTERNAL>
      <ETH0_GATEWAY><![CDATA[]]></ETH0_GATEWAY>
      <ETH0_GATEWAY6><![CDATA[]]></ETH0_GATEWAY6>
      <ETH0_IP><![CDATA[192.168.160.100]]></ETH0_IP>
      <ETH0_IP6><![CDATA[]]></ETH0_IP6>
      <ETH0_IP6_PREFIX_LENGTH><![CDATA[]]></ETH0_IP6_PREFIX_LENGTH>
      <ETH0_IP6_ULA><![CDATA[]]></ETH0_IP6_ULA>
      <ETH0_MAC><![CDATA[02:00:c0:a8:a0:64]]></ETH0_MAC>
      <ETH0_MASK><![CDATA[]]></ETH0_MASK>
      <ETH0_MTU><![CDATA[]]></ETH0_MTU>
      <ETH0_NETWORK><![CDATA[]]></ETH0_NETWORK>
      <ETH0_SEARCH_DOMAIN><![CDATA[]]></ETH0_SEARCH_DOMAIN>
      <ETH0_VLAN_ID><![CDATA[]]></ETH0_VLAN_ID>
      <ETH0_VROUTER_IP><![CDATA[]]></ETH0_VROUTER_IP>
      <ETH0_VROUTER_IP6><![CDATA[]]></ETH0_VROUTER_IP6>
      <ETH0_VROUTER_MANAGEMENT><![CDATA[]]></ETH0_VROUTER_MANAGEMENT>
      <NETWORK><![CDATA[YES]]></NETWORK>
      <SET_HOSTNAME><![CDATA[alitvak69-20]]></SET_HOSTNAME>
      <SSH_PUBLIC_KEY><![CDATA[ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCYz+lkZoNyspRhrtXDKFN3cIEwN3w08mz0YGKpVDIiV0+/vgG8dAUQ70Irs3m83W9BHN+vNjKPgKcF+X+sSfxniOtavahxGCRjAhhs1IVm196C5ODbSgXVUWULdtmMHelXbLBJ8X340h/UO+eQ6eRLaRfslXUsgRqremVcvCCPz4LIuRiliGWiELAmqYcY+1zJLeg3QV2Pgn5vschM9e/A4AseKO+HnbGB/I5tnoeZT/Gc3FGfUZLNFVB2XsVGAEEzkqO8VI2msB7MCAZBHffIK6WfLIYgGP6Ha2JT1NWJU7Ncj9Xuql0ElF01VwWMDWzqc0DOiVSsTL89ugJKU6+h one]]></SSH_PUBLIC_KEY>
      <TARGET><![CDATA[hdb]]></TARGET>
    </CONTEXT>
    <CPU><![CDATA[1]]></CPU>
    <DISK>
      <ALLOW_ORPHANS><![CDATA[NO]]></ALLOW_ORPHANS>
      <CLONE><![CDATA[YES]]></CLONE>
      <CLONE_TARGET><![CDATA[SYSTEM]]></CLONE_TARGET>
      <CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
      <DATASTORE><![CDATA[default]]></DATASTORE>
      <DATASTORE_ID><![CDATA[1]]></DATASTORE_ID>
      <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
      <DISK_ID><![CDATA[0]]></DISK_ID>
      <DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE>
      <DISK_TYPE><![CDATA[FILE]]></DISK_TYPE>
      <DRIVER><![CDATA[qcow2]]></DRIVER>
      <IMAGE><![CDATA[centos_7 - LXD]]></IMAGE>
      <IMAGE_ID><![CDATA[4]]></IMAGE_ID>
      <IMAGE_STATE><![CDATA[9]]></IMAGE_STATE>
      <LN_TARGET><![CDATA[NONE]]></LN_TARGET>
      <ORIGINAL_SIZE><![CDATA[1024]]></ORIGINAL_SIZE>
      <READONLY><![CDATA[NO]]></READONLY>
      <SAVE><![CDATA[NO]]></SAVE>
      <SIZE><![CDATA[1024]]></SIZE>
      <SOURCE><![CDATA[/var/lib/one//datastores/1/8813f27f47e4396104b474b395c9776c]]></SOURCE>
      <TARGET><![CDATA[hda]]></TARGET>
      <TM_MAD><![CDATA[qcow2]]></TM_MAD>
      <TYPE><![CDATA[FILE]]></TYPE>
    </DISK>
    <GRAPHICS>
      <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
      <PORT><![CDATA[5920]]></PORT>
      <TYPE><![CDATA[VNC]]></TYPE>
    </GRAPHICS>
    <MEMORY><![CDATA[768]]></MEMORY>
    <NIC>
      <AR_ID><![CDATA[0]]></AR_ID>
      <BRIDGE><![CDATA[br0]]></BRIDGE>
      <BRIDGE_TYPE><![CDATA[linux]]></BRIDGE_TYPE>
      <CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
      <FILTER_IP_SPOOFING><![CDATA[YES]]></FILTER_IP_SPOOFING>
      <FILTER_MAC_SPOOFING><![CDATA[YES]]></FILTER_MAC_SPOOFING>
      <IP><![CDATA[192.168.160.100]]></IP>
      <MAC><![CDATA[02:00:c0:a8:a0:64]]></MAC>
      <NAME><![CDATA[NIC0]]></NAME>
      <NETWORK><![CDATA[sg-1]]></NETWORK>
      <NETWORK_ID><![CDATA[1]]></NETWORK_ID>
      <NIC_ID><![CDATA[0]]></NIC_ID>
      <SECURITY_GROUPS><![CDATA[0]]></SECURITY_GROUPS>
      <TARGET><![CDATA[one-20-0]]></TARGET>
      <VN_MAD><![CDATA[fw]]></VN_MAD>
    </NIC>
    <OS>
      <BOOT><![CDATA[]]></BOOT>
    </OS>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[OUTBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <SECURITY_GROUP_RULE>
      <PROTOCOL><![CDATA[ALL]]></PROTOCOL>
      <RULE_TYPE><![CDATA[INBOUND]]></RULE_TYPE>
      <SECURITY_GROUP_ID><![CDATA[0]]></SECURITY_GROUP_ID>
      <SECURITY_GROUP_NAME><![CDATA[default]]></SECURITY_GROUP_NAME>
    </SECURITY_GROUP_RULE>
    <TEMPLATE_ID><![CDATA[6]]></TEMPLATE_ID>
    <TM_MAD_SYSTEM><![CDATA[qcow2]]></TM_MAD_SYSTEM>
    <VMID><![CDATA[20]]></VMID>
  </TEMPLATE>
  <USER_TEMPLATE>
    <HYPERVISOR><![CDATA[lxd]]></HYPERVISOR>
    <INPUTS_ORDER><![CDATA[]]></INPUTS_ORDER>
    <LXD_PROFILE><![CDATA[]]></LXD_PROFILE>
    <LXD_SECURITY_NESTING><![CDATA[no]]></LXD_SECURITY_NESTING>
    <LXD_SECURITY_PRIVILEGED><![CDATA[]]></LXD_SECURITY_PRIVILEGED>
    <MEMORY_UNIT_COST><![CDATA[MB]]></MEMORY_UNIT_COST>
  </USER_TEMPLATE>
  <HISTORY_RECORDS>
    <HISTORY>
      <OID>20</OID>
      <SEQ>0</SEQ>
      <HOSTNAME>ubuntu1804-lxd-qcow2-f34f2-2.test</HOSTNAME>
      <HID>1</HID>
      <CID>0</CID>
      <STIME>1566365022</STIME>
      <ETIME>0</ETIME>
      <VM_MAD><![CDATA[lxd]]></VM_MAD>
      <TM_MAD><![CDATA[qcow2]]></TM_MAD>
      <DS_ID>0</DS_ID>
      <PSTIME>1566365022</PSTIME>
      <PETIME>1566365025</PETIME>
      <RSTIME>1566365025</RSTIME>
      <RETIME>0</RETIME>
      <ESTIME>0</ESTIME>
      <EETIME>0</EETIME>
      <ACTION>0</ACTION>
      <UID>-1</UID>
      <GID>-1</GID>
      <REQUEST_ID>-1</REQUEST_ID>
    </HISTORY>
  </HISTORY_RECORDS>
</VM>

The VM runs

oneadmin@ubuntu1804-lxd-qcow2-f34f2-0:~/readiness$ onevm list
    ID USER     GROUP    NAME            STAT UCPU    UMEM HOST             TIME
    20 oneadmin oneadmin alitvak69-20    runn  0.0      0K ubuntu1804   0d 00h02

The part of the driver that must be failing on your setup is the one that resolves, for a container NIC, the host-side interface name of its veth pair.

https://github.com/OpenNebula/one/blob/0c1550a90ef979155d9877b6b674eea5dbe85607/src/vnm_mad/remotes/lib/nic.rb#L107-L147

Maybe you can debug something over there. Are you using a snap-based or an apt-based setup?
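
For reference, a quick way to check the value the driver is after directly against LXD. This is only an illustrative sketch: /snap/bin is where the snap exposes the CLI, and volatile.<device>.host_name (device name eth0 assumed here) is where recent LXD releases record the host-side veth name when it is not set explicitly:

# On the LXD node, for the failing container one-76:
/snap/bin/lxc config show one-76 | grep -i host_name

# Or read a single device key (device name and key are assumptions):
/snap/bin/lxc config get one-76 volatile.eth0.host_name

If this comes back empty, it matches the failure above: with no interface name, the driver ends up running iptables with --physdev-out immediately followed by --physdev-is-bridged, so iptables parses the latter as the interface name and rejects it for exceeding IFNAMSIZ.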

alitvak69 commented 4 years ago

I am using a snap-based setup, but the problem started after apt-get upgrade was run. I updated ceph-common to 14.2.2 during the cluster update, which caused rbd-nbd to die with a segfault. Thinking something else was not updated, I ran apt-get update && apt-get upgrade. rbd-nbd started running fine, but this problem appeared. Do you think that update overwrote something and perhaps I need to reinstall LXD with snap or apt-get? The thing is, I already removed and reinstalled all relevant OpenNebula debs during the 5.8.3 to 5.8.4 update.

alitvak69 commented 4 years ago

In any case, how do you enable debugging so it prints more than the log already has?

dann1 commented 4 years ago

I don't think Ceph has any role in this issue. The issue here is reading the host-side value of the NIC veth pair from the container configuration. Long ago we had an issue with this particular situation, where the value couldn't be read because the snap-based setup required sudo to run the command.

Update the file /var/tmp/one/vnm/nic.rb on the LXD host with the changes seen here.

alitvak69 commented 4 years ago

I guess my story got too long. While Ceph is not the issue, a general system-wide package update could break something. In any case, I will add those lines for debugging and report back.

dann1 commented 4 years ago

That is possible. Also keep in mind that we recommend avoiding snap-based setups if possible.

alitvak69 commented 4 years ago

Should I do it via apt-get? Is there a document for that?

dann1 commented 4 years ago

I've updated the changes; the 2nd e was wrong.

dann1 commented 4 years ago

http://docs.opennebula.org/5.8/deployment/node_installation/lxd_node_installation.html#installing-on-ubuntu
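
For the record, switching a node from the snap to the distro packages on Ubuntu 18.04 looks roughly like the following; treat it as an outline under the linked guide rather than a tested procedure, since removing the snap deletes the containers and storage under /var/snap/lxd, which would have to be migrated or recreated first:

# Remove the snap and install the stock bionic packages instead
sudo snap remove lxd
sudo apt-get update && sudo apt-get install lxd lxd-client

# Let oneadmin talk to the deb-based LXD daemon's socket
sudo usermod -aG lxd oneadmin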

dann1 commented 4 years ago

Confirmed with a snap-based setup. When reporting LXD-related bugs, it is very important to state whether the environment is snap-based or apt-based, as well as the OS.

dann1 commented 4 years ago

The snapd package got updated and the output it gives changed; the code relied on that output. I've updated it. Check https://github.com/OpenNebula/one/pull/3605/files if you want to apply the patch.
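
If you want to test it quickly on the affected node, one rough way is sketched below. Assumptions: the PR only touches nic.rb, the .patch suffix on a GitHub PR URL returns the raw patch, and -p5 strips the a/src/vnm_mad/remotes/lib/ prefix so the diff applies to the node's copy:

# On the LXD node: back up the driver file, then apply the PR as a patch
cp /var/tmp/one/vnm/nic.rb /var/tmp/one/vnm/nic.rb.orig
curl -L https://github.com/OpenNebula/one/pull/3605.patch -o /tmp/3605.patch
patch -d /var/tmp/one/vnm -p5 < /tmp/3605.patch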

alitvak69 commented 4 years ago

Your patch has worked; I have LXD running again. However, and I apologize for going off-topic, the snap change probably affected the noVNC console as well. If you want me to open a separate issue I will; logs are below. The console pops open with just a black screen, and then crashes.

10.10.10.50 - - [21/Aug/2019 20:11:53] 10.10.10.50: Plain non-SSL (ws://) WebSocket connection
10.10.10.50 - - [21/Aug/2019 20:11:53] 10.10.10.50: Version hybi-13, base64: 'False'
10.10.10.50 - - [21/Aug/2019 20:11:53] 10.10.10.50: Path: '/?host=virt3n1-la.xcastlabs.net&port=29876&token=j73hpfow0sk1dljh8hk3&encrypt=no&title=alex-lxd-centos7&password=null'
10.10.10.50 - - [21/Aug/2019 20:11:53] connecting to: 10.0.22.5:5979
10.10.10.50 - - [21/Aug/2019 20:11:58] code 400, message Bad HTTP/0.9 request type ('\x82\x88-')
10.10.10.50 - - [21/Aug/2019 20:12:03] 10.10.10.50: Plain non-SSL (ws://) WebSocket connection
10.10.10.50 - - [21/Aug/2019 20:12:03] 10.10.10.50: Version hybi-13, base64: 'False'
10.10.10.50 - - [21/Aug/2019 20:12:03] 10.10.10.50: Path: '/?host=virt3n1-la.xcastlabs.net&port=29876&token=j73hpfow0sk1dljh8hk3&encrypt=no'
10.10.10.50 - - [21/Aug/2019 20:12:03] connecting to: 10.0.22.5:5979
10.10.10.50 - - [21/Aug/2019 20:12:11] code 400, message Bad request syntax ('\x88\x8f\\xcb\x8aC_#\xde".\xac\xef7|\xa8\xe6,/\xae\xee')
10.10.10.50 - - [21/Aug/2019 20:26:25] 10.10.10.50: Plain non-SSL (ws://) WebSocket connection
10.10.10.50 - - [21/Aug/2019 20:26:25] 10.10.10.50: Version hybi-13, base64: 'False'
10.10.10.50 - - [21/Aug/2019 20:26:25] 10.10.10.50: Path: '/?host=virt3n1-la.xcastlabs.net&port=29876&token=vdgmbljvtxjgc8pfprcv&encrypt=no&title=alex-lxd-centos7&password=null'
10.10.10.50 - - [21/Aug/2019 20:26:25] connecting to: 10.0.22.5:5979
10.10.10.50 - - [21/Aug/2019 20:26:33] code 400, message Bad request syntax ("\x88\x8f'\x88\x18\x03$`LbU\xef}w\x07\xebtlT\xed|")
10.10.10.50 - - [21/Aug/2019 20:28:15] 10.10.10.50: Plain non-SSL (ws://) WebSocket connection
10.10.10.50 - - [21/Aug/2019 20:28:15] 10.10.10.50: Version hybi-13, base64: 'False'
10.10.10.50 - - [21/Aug/2019 20:28:15] 10.10.10.50: Path: '/?host=virt3n1-la.xcastlabs.net&port=29876&token=tuc81dvh73kcxhqkl57y&encrypt=no&title=alex-lxd-centos7&password=null'
10.10.10.50 - - [21/Aug/2019 20:28:15] connecting to: 10.0.22.5:5980
10.10.10.50 - - [21/Aug/2019 20:28:20] code 400, message Bad request syntax ('\x88\x8f2[2k1\xb3f')

dann1 commented 4 years ago

Yes, feel free to open another issue. However, although svncterm execution is somewhat dependent on snap, this situation with the output changing shouldn't affect it, because that check is based on the LXD socket location and not on the CLI output.

alitvak69 commented 4 years ago

Could your patch affect KVM interface naming? I had VMs operating with the default eth0 and eth1 virtual NICs. After installing the patch and redeploying, my instances come up with ens3 and ens4 interfaces. It seems that the virtio NIC is no longer the default and virtio_net is not loaded on boot.

alitvak69 commented 4 years ago

Never mind my last message. The upgrade overwrote my custom vmm_exec_kvm file, making the network cards RTC-Link instead of virtio by default.
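
For anyone hitting the same side effect: the default NIC model for KVM deployments comes from the driver defaults file on the frontend, so a custom setting has to be re-applied after an upgrade that overwrote it. A minimal sketch, assuming the stock path /etc/one/vmm_exec/vmm_exec_kvm.conf:

# /etc/one/vmm_exec/vmm_exec_kvm.conf (frontend) -- restore virtio as the default NIC model
NIC = [ model = "virtio" ]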

dann1 commented 4 years ago

Remember to update the source file on the frontend so that when you add new hosts they will get the patch as well. The file is located at /var/lib/one/remotes/vnm/nic.rb on the frontend.
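
A short sketch of that workflow, assuming the fix is edited into the frontend copy (onehost sync pushes /var/lib/one/remotes out to each node's /var/tmp/one):

# On the frontend, as oneadmin: update the master copy ...
$EDITOR /var/lib/one/remotes/vnm/nic.rb
# ... then push the updated remotes to all hosts
onehost sync --force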