OpenNebula / one

Failed to deploy containers on ceph storage #2998

Closed tkald closed 5 years ago

tkald commented 5 years ago

Description: When I try to import an LXD image from the LinuxContainers.org marketplace, disk creation in the Ceph datastore fails.

Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: cp: Copying local image "lxd://https://images.linuxcontainers.org/images/ubuntu/xenial/amd64/default/./20190227_08:06$
Wed Feb 27 15:33:20 2019 [Z0][ImM][E]: cp: Command "    set -e -o pipefail
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]:
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: FORMAT=$(qemu-img info /var/tmp/b857d240deee8b8b421b172755a19321 | grep "^file format:" | awk '{print $3}' || :)
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]:
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: if [ "$FORMAT" != "raw" ]; then
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: qemu-img convert -O raw /var/tmp/b857d240deee8b8b421b172755a19321 /var/tmp/b857d240deee8b8b421b172755a19321.raw
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: mv /var/tmp/b857d240deee8b8b421b172755a19321.raw /var/tmp/b857d240deee8b8b421b172755a19321
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: fi
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]:
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: rbd --id libvirt import --image-format 2 /var/tmp/b857d240deee8b8b421b172755a19321 one/one-597
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]:
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: # remove original
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: rm -f /var/tmp/b857d240deee8b8b421b172755a19321" failed: rbd: unable to get size of file/block device
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: Importing image: 0% complete...failed.
Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: rbd: import failed: (25) Inappropriate ioctl for device
Wed Feb 27 15:33:20 2019 [Z0][ImM][E]: Error registering one/one-597 in r620-4
Wed Feb 27 15:33:20 2019 [Z0][ImM][E]: Error copying image in the datastore: Error registering one/one-597 in r620-4

To Reproduce: Set up an LXD node and the LinuxContainers marketplace, then try to import a container to a Ceph datastore.

Expected behavior: The container image is created in the Ceph datastore.

Details

Additional context: I installed the LXD node package on an existing KVM node with a Ceph backend and re-imported the host under an alternative name as an LXD node:

echo "deb http://archive.ubuntu.com/ubuntu $(lsb_release -cs)-backports main restricted universe multiverse" >> /etc/apt/sources.list
apt update
apt install lxd=3.0.3-0ubuntu1~16.04.1 liblxc1=3.0.3-0ubuntu1~16.04.1 lxcfs=3.0.3-0ubuntu1~16.04.1 lxd-client=3.0.3-0ubuntu1~16.04.1 -y
apt-get install opennebula-node-lxd rbd-nbd -y

Ceph datastore info:

onedatastore show 101
DATASTORE 101 INFORMATION
ID             : 101
NAME           : cephds
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : IMAGE
DS_MAD         : ceph
TM_MAD         : ceph
BASE PATH      : /var/lib/one//datastores/101
DISK_TYPE      : RBD
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : 11.4T
FREE:          : 5.1T
USED:          : 6.3T
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
BRIDGE_LIST="r620-5 r620-4"
CEPH_HOST="192.168.30.71:6789 192.168.30.72:6789 192.168.30.73:6789"
CEPH_SECRET="********"
CEPH_USER="libvirt"
CLONE_TARGET="SELF"
DATASTORE_CAPACITY_CHECK="YES"
DISK_TYPE="RBD"
DRIVER="raw"
DS_MAD="ceph"
LABELS="Ceph,HDD"
LN_TARGET="NONE"
POOL_NAME="one"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp /tmp /mnt/ceph/appmarket"
TM_MAD="ceph"
TYPE="IMAGE_DS"


tkald commented 5 years ago

Also, vmm_exec/vmm_exec_lxd.conf referenced in oned.conf is not present in any Ubuntu Xenial package.

dann1 commented 5 years ago

Hello, I'm unable to reproduce the issue; the marketplace works fine for me. Here is the Ceph datastore info:

DATASTORE 1 INFORMATION                                                         
ID             : 1                   
NAME           : default             
USER           : oneadmin            
GROUP          : oneadmin            
CLUSTERS       : 0                   
TYPE           : IMAGE               
DS_MAD         : ceph                
TM_MAD         : ceph                
BASE PATH      : /var/lib/one//datastores/1
DISK_TYPE      : RBD                 
STATE          : READY               

DATASTORE CAPACITY                                                              
TOTAL:         : 36.9G               
FREE:          : 36G                 
USED:          : 997M                
LIMIT:         : -                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : u--                 
OTHER          : ---                 

DATASTORE TEMPLATE                                                              
ALLOW_ORPHANS="mixed"
BRIDGE_LIST="ubuntu1604-lxd-ceph-luminous-19a69-1.test ubuntu1604-lxd-ceph-luminous-19a69-2.test"
CEPH_HOST="ubuntu1604-lxd-ceph-luminous-19a69-0.test"
CEPH_SECRET="7ebb2445-e96e-44c6-b7c7-07dc7a50f311"
CEPH_USER="oneadmin"
CLONE_TARGET="SELF"
CLONE_TARGET_SHARED="SELF"
CLONE_TARGET_SSH="SYSTEM"
DISK_TYPE="RBD"
DISK_TYPE_SHARED="RBD"
DISK_TYPE_SSH="FILE"
DRIVER="raw"
DS_MAD="ceph"
LN_TARGET="NONE"
LN_TARGET_SHARED="NONE"
LN_TARGET_SSH="SYSTEM"
POOL_NAME="one"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp /tmp"
TM_MAD="ceph"
TM_MAD_SYSTEM="ssh,shared"
TYPE="IMAGE_DS"

IMAGES         
0              
1              
2

Image 2 is from the LXD marketplace:

oneadmin@ubuntu1604-lxd-ceph-luminous-19a69-0:/root$ oneimage show 2
IMAGE 2 INFORMATION                                                             
ID             : 2                   
NAME           : ubuntu_xenial - LXD 
USER           : oneadmin            
GROUP          : oneadmin            
LOCK           : None                
DATASTORE      : default             
TYPE           : OS                  
REGISTER TIME  : 02/27 15:36:50      
PERSISTENT     : No                  
SOURCE         : one/one-2           
PATH           : "lxd://https://images.linuxcontainers.org/images/ubuntu/xenial/amd64/default/./20190227_08:06/rootfs.tar.xz?size=1024&filesystem=ext4&format=raw"
SIZE           : 1024M               
STATE          : rdy                 
RUNNING_VMS    : 0                   

PERMISSIONS                                                                     
OWNER          : um-                 
GROUP          : ---                 
OTHER          : ---                 

IMAGE TEMPLATE                                                                  
DEV_PREFIX="hd"
FORMAT="raw"
FROM_APP="65"
FROM_APP_MD5=""
FROM_APP_NAME="ubuntu_xenial - LXD"

VIRTUAL MACHINES

The marketplace works regardless of the datastore and the OpenNebula nodes.

Also, vmm_exec/vmm_exec_lxd.conf referenced in oned.conf is not present in any Ubuntu Xenial package.

It is commented out by default.

dann1 commented 5 years ago

Wed Feb 27 15:33:20 2019 [Z0][ImM][I]: rm -f /var/tmp/b857d240deee8b8b421b172755a19321" failed: rbd: unable to get size of file/block device

Can you check if you have enough space in /var/tmp?
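
For example:

df -h /var/tmp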

tkald commented 5 years ago

Can you check if you have enough space in /var/tmp?

Yes, plenty of space. One more thing I discovered that might be related: when I download a container template to my own PC from the LinuxContainers marketplace, I get a zero-byte file.

tkald commented 5 years ago

The 1st line in the error log: Copying local image "lxd://https://images.linuxcontainers.org/images/ubuntu/xenial/amd64/default/./20190227_08:06..."

Just wondering: does this URL really need to start with the lxd:// prefix?

Also, the saved image is a zero-byte file:

ls -al /var/tmp/b857d240deee8b8b421b172755a19321
-rw-rw-r-- 1 oneadmin oneadmin 0 Feb 27 15:33 /var/tmp/b857d240deee8b8b421b172755a19321

That's the reason rbd can't import it into the datastore.

tkald commented 5 years ago

Something more seems to be broken in my cloud; I can't boot a custom LXD image either:

Thu Feb 28 01:54:31 2019 [Z0][VM][I]: New state is ACTIVE
Thu Feb 28 01:54:31 2019 [Z0][VM][I]: New LCM state is PROLOG
Thu Feb 28 01:54:34 2019 [Z0][VM][I]: New LCM state is BOOT
Thu Feb 28 01:54:34 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3463/deployment.0
Thu Feb 28 01:54:35 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Thu Feb 28 01:54:35 2019 [Z0][VMM][I]: ExitCode: 0
Thu Feb 28 01:54:35 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3463/deployment.0' 'vml-vm1-lxd' 3463 vml-vm1-lxd
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Thu Feb 28 01:54:36 2019 [Z0][VMM][E]: deploy: do_map: rbd-nbd: unknown command: --id
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-3463/rootfs using device
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/deploy:72:in `<main>': failed to setup container storage (RuntimeError)
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: ExitCode: 1
Thu Feb 28 01:54:36 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Thu Feb 28 01:54:36 2019 [Z0][VMM][E]: Error deploying virtual machine
Thu Feb 28 01:54:36 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

There's no /var/lib/one//datastores/100/3463/deployment.0 file on the LXD node, only the disk.1 context file in that folder.

dann1 commented 5 years ago

The LXD driver's rbd backend uses the rbd-nbd command; the Ceph image is mapped on the OpenNebula node by issuing:

sudo rbd-nbd --id $CEPH_USER map $SOURCE

These variables are located under the DISK section in the deployment xml. Can you run the command manually and check that it works fine?
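
For example (substituting the CEPH_USER from your datastore template and the SOURCE of one of the images above; the device to unmap is whatever the map command prints):

sudo rbd-nbd --id libvirt map one/one-597   # should print a /dev/nbdX device
rbd-nbd list-mapped                         # verify the mapping
sudo rbd-nbd unmap /dev/nbd0                # clean up, using the device printed above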

This is the line that is failing; it seems to be recognizing --id as a command instead of an argument. The command is loaded from here.

There's no /var/lib/one//datastores/100/3463/deployment.0 file on the LXD node, only the disk.1 context file in that folder

Yes, the deployment xml is saved instead in a user.xml key under the container config definition.
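
You can also print just that key (container name taken from the example below):

lxc config get one-0 user.xml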

root@ubuntu1604-lxd-ceph-luminous-19a69-2:~# lxc config show one-0 
architecture: x86_64
config:
  limits.cpu.allowance: 100%
  limits.memory: 768MB
  security.nesting: "false"
  security.privileged: "true"
  user.xml: |
    <VM><ID>0</ID><UID>0</UID><GID>0</GID><UNAME>oneadmin</UNAME><GNAME>oneadmin</GNAME><NAME>ubuntu_xenial - LXD-0</NAME><PERMISSIONS><OWNER_U>1</OWNER_U><OWNER_M>1</OWNER_M><OWNER_A>0</OWNER_A><GROUP_U>0</GROUP_U><GROUP_M>0</GROUP_M><GROUP_A>0</GROUP_A><OTHER_U>0</OTHER_U><OTHER_M>0</OTHER_M><OTHER_A>0</OTHER_A></PERMISSIONS><LAST_POLL>0</LAST_POLL><STATE>3</STATE><LCM_STATE>2</LCM_STATE><PREV_STATE>3</PREV_STATE><PREV_LCM_STATE>2</PREV_LCM_STATE><RESCHED>0</RESCHED><STIME>1551316052</STIME><ETIME>0</ETIME><DEPLOY_ID></DEPLOY_ID><MONITORING></MONITORING><TEMPLATE><AUTOMATIC_DS_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_DS_REQUIREMENTS><AUTOMATIC_NIC_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_NIC_REQUIREMENTS><AUTOMATIC_REQUIREMENTS><![CDATA[(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES)]]></AUTOMATIC_REQUIREMENTS><CONTEXT><DISK_ID><![CDATA[1]]></DISK_ID><NETWORK><![CDATA[YES]]></NETWORK><SET_HOSTNAME><![CDATA[ubuntu_xenial - LXD-0]]></SET_HOSTNAME><SSH_PUBLIC_KEY><![CDATA[ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCYz+lkZoNyspRhrtXDKFN3cIEwN3w08mz0YGKpVDIiV0+/vgG8dAUQ70Irs3m83W9BHN+vNjKPgKcF+X+sSfxniOtavahxGCRjAhhs1IVm196C5ODbSgXVUWULdtmMHelXbLBJ8X340h/UO+eQ6eRLaRfslXUsgRqremVcvCCPz4LIuRiliGWiELAmqYcY+1zJLeg3QV2Pgn5vschM9e/A4AseKO+HnbGB/I5tnoeZT/Gc3FGfUZLNFVB2XsVGAEEzkqO8VI2msB7MCAZBHffIK6WfLIYgGP6Ha2JT1NWJU7Ncj9Xuql0ElF01VwWMDWzqc0DOiVSsTL89ugJKU6+h one]]></SSH_PUBLIC_KEY><TARGET><![CDATA[hdb]]></TARGET></CONTEXT><CPU><![CDATA[1]]></CPU><DISK><ALLOW_ORPHANS><![CDATA[mixed]]></ALLOW_ORPHANS><CEPH_HOST><![CDATA[ubuntu1604-lxd-ceph-luminous-19a69-0.test]]></CEPH_HOST><CEPH_SECRET><![CDATA[7ebb2445-e96e-44c6-b7c7-07dc7a50f311]]></CEPH_SECRET><CEPH_USER><![CDATA[oneadmin]]></CEPH_USER><CLONE><![CDATA[YES]]></CLONE><CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET><CLUSTER_ID><![CDATA[0]]></CLUSTER_ID><DATASTORE><![CDATA[default]]></DATASTORE><DATASTORE_ID><![CDATA[1]]></DATASTORE_ID><DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX><DISK_ID><![CDATA[0]]></DISK_ID><DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE><DISK_TYPE><![CDATA[RBD]]></DISK_TYPE><DRIVER><![CDATA[raw]]></DRIVER><IMAGE><![CDATA[ubuntu_xenial - LXD]]></IMAGE><IMAGE_ID><![CDATA[2]]></IMAGE_ID><IMAGE_STATE><![CDATA[2]]></IMAGE_STATE><LN_TARGET><![CDATA[NONE]]></LN_TARGET><ORIGINAL_SIZE><![CDATA[1024]]></ORIGINAL_SIZE><POOL_NAME><![CDATA[one]]></POOL_NAME><READONLY><![CDATA[NO]]></READONLY><SAVE><![CDATA[NO]]></SAVE><SIZE><![CDATA[1024]]></SIZE><SOURCE><![CDATA[one/one-2]]></SOURCE><TARGET><![CDATA[hda]]></TARGET><TM_MAD><![CDATA[ceph]]></TM_MAD><TYPE><![CDATA[RBD]]></TYPE></DISK><GRAPHICS><LISTEN><![CDATA[0.0.0.0]]></LISTEN><PORT><![CDATA[5900]]></PORT><TYPE><![CDATA[vnc]]></TYPE></GRAPHICS><MEMORY><![CDATA[768]]></MEMORY><TEMPLATE_ID><![CDATA[2]]></TEMPLATE_ID><TM_MAD_SYSTEM><![CDATA[ceph]]></TM_MAD_SYSTEM><VMID><![CDATA[0]]></VMID></TEMPLATE><USER_TEMPLATE><LXD_SECURITY_PRIVILEGED><![CDATA[true]]></LXD_SECURITY_PRIVILEGED></USER_TEMPLATE><HISTORY_RECORDS><HISTORY><OID>0</OID><SEQ>0</SEQ><HOSTNAME>ubuntu1604-lxd-ceph-luminous-19a69-2.test</HOSTNAME><HID>1</HID><CID>0</CID><STIME>1551316053</STIME><ETIME>0</ETIME><VM_MAD><![CDATA[lxd]]></VM_MAD><TM_MAD><![CDATA[ceph]]></TM_MAD><DS_ID>0</DS_ID><PSTIME>1551316053</PSTIME><PETIME>1551316056</PETIME><RSTIME>1551316056</RSTIME><RETIME>0</RETIME><ESTIME>0</ESTIME><EETIME>0</EETIME><ACTION>0</ACTION><UID>-1</UID><GID>-1</GID><REQUEST_ID>-1</REQUEST_ID></HISTORY></HISTORY_RECORDS></VM>
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices:
  context:
    path: /context
    source: /var/lib/one/datastores/0/0/mapper/disk.1
    type: disk
  root:
    path: /
    pool: default
    readonly: "false"
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
sergiojvg commented 5 years ago

The URL starting with the lxd:// prefix is for internal handling in downloader.sh; it is completely fine. Could you try to download an app to a filesystem datastore? Please share your results.
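
If it helps, a rough way to exercise the download path in isolation (a sketch only, assuming the standard remotes location /var/lib/one/remotes on the frontend, not the exact driver invocation) is to run downloader.sh by hand with the lxd:// URL from the image PATH and check the size of the result:

sudo -u oneadmin /var/lib/one/remotes/datastore/downloader.sh \
  "lxd://https://images.linuxcontainers.org/images/ubuntu/xenial/amd64/default/./20190227_08:06/rootfs.tar.xz?size=1024&filesystem=ext4&format=raw" \
  /var/tmp/lxd-test.img
ls -l /var/tmp/lxd-test.img   # should not be zero bytes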

tkald commented 5 years ago

Could you try to download an app to a filesystem datastore? Please share your results.

I don't have any filesystem datastores at the moment.

tkald commented 5 years ago

Tried to map the Ceph image manually:

rbd-nbd --id libvirt map one/one-602
rbd-nbd: unknown command: --id

Seems like rbd-nbd is broken in the Ceph Jewel repo; there is also a similar issue here: https://github.com/kubernetes/kubernetes/issues/73020 I'll try to replace it and report back.
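
For reference, what I mean by replacing it is roughly the following (repository URL and exact steps are my assumption, adjust as needed):

wget -qO- https://download.ceph.com/keys/release.asc | apt-key add -
echo "deb https://download.ceph.com/debian-luminous/ xenial main" > /etc/apt/sources.list.d/ceph-luminous.list
apt update
apt-get install --only-upgrade rbd-nbd -y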

Also, I don't have the user.xml key under the container config:

lxc config show one-3466
architecture: x86_64
config:
  limits.cpu.allowance: 100%
  limits.memory: 768MB
  security.nesting: "no"
  security.privileged: "false"
  volatile.apply_template: create
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
devices:
  context:
    path: /context
    source: /var/lib/one/datastores/100/3466/mapper/disk.1
    type: disk
  eth0:
    host_name: one-3466-0
    hwaddr: 02:00:c0:a8:29:15
    name: eth0
    nictype: bridged
    parent: vmbr0
    type: nic
  root:
    path: /
    pool: default
    readonly: "false"
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
tkald commented 5 years ago

Upgraded rbd-nbd on the LXD node to the Luminous branch and got a bit further with my custom LXD image:

Thu Feb 28 12:37:23 2019 [Z0][VM][I]: New state is ACTIVE
Thu Feb 28 12:37:23 2019 [Z0][VM][I]: New LCM state is PROLOG
Thu Feb 28 12:37:26 2019 [Z0][VM][I]: New LCM state is BOOT
Thu Feb 28 12:37:26 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3469/deployment.0
Thu Feb 28 12:37:27 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Thu Feb 28 12:37:28 2019 [Z0][VMM][I]: ExitCode: 0
Thu Feb 28 12:37:28 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3469/deployment.0' 'vml-vm1-lxd' 3469 vml-vm1-lxd
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-3469/rootfs using device /dev/nbd3
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd3 at /var/lib/lxd/storage-pools/default/containers/one-3469/rootfs
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/100/3469/mapper/disk.1 using device /dev/loop1
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop1 at /var/lib/one/datastores/100/3469/mapper/disk.1
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/lxd/storage-pools/default/containers/one-3469/rootfs
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Umounting disk mapped at /dev/nbd3
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/one/datastores/100/3469/mapper/disk.1
Thu Feb 28 12:37:33 2019 [Z0][VMM][E]: deploy: Failed to detect block device from /var/lib/one/datastores/100/3469/mapper/disk.1
Thu Feb 28 12:37:33 2019 [Z0][VMM][E]: deploy: failed to dismantle container storage
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:145:in `initialize': undefined method `[]' for #<LXDError:0x00000001b0adc8> (NoMethodError)
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `exception'
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `raise'
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `rescue in <main>'
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:74:in `<main>'
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: ExitCode: 1
Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Thu Feb 28 12:37:33 2019 [Z0][VMM][E]: Error deploying virtual machine
Thu Feb 28 12:37:33 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

With the new version of rbd-nbd I can map the Ceph image manually:

rbd-nbd --id libvirt map one/one-602
/dev/nbd0
rbd-nbd list-mapped
pid  pool image   snap device
6501 one  one-602 -    /dev/nbd0
dann1 commented 5 years ago

Also, I don't have the user.xml key under the container config:

It is only written if everything went smoothly.

Your container failed to start. When that happens, the storage maps are reverted (the same process that runs when shutting down), but in this case that also failed while dealing with the contextualization disk.

Check the log in /var/log/lxd/one-2/lxc.log to learn more about the start errors.

tkald commented 5 years ago

Seems like a kernel / AppArmor issue; I'm running the 4.4.176 kernel on Ubuntu Xenial.

cat lxc.log
lxc one-3472 20190301082225.647 WARN     conf - conf.c:lxc_setup_devpts:1616 - Invalid argument - Failed to unmount old devpts instance
lxc one-3472 20190301082225.647 WARN     apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - Incomplete AppArmor support in your kernel
lxc one-3472 20190301082225.647 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:223 - If you really want to start this container, set
lxc one-3472 20190301082225.647 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:224 - lxc.apparmor.allow_incomplete = 1
lxc one-3472 20190301082225.647 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:225 - in your container configuration file
lxc one-3472 20190301082225.647 ERROR    sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
lxc one-3472 20190301082225.647 WARN     network - network.c:lxc_delete_network_priv:2589 - Operation not permitted - Failed to remove interface "eth0" with index 12
lxc one-3472 20190301082225.647 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state "ABORTING" instead of "RUNNING"
lxc one-3472 20190301082225.648 ERROR    start - start.c:__lxc_start:1939 - Failed to spawn container "one-3472"
lxc 20190301082225.657 WARN     commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command "get_state"

I'll upgrade the kernel...
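
In the meantime, the setting that lxc.log asks for could presumably be forwarded through LXD's raw.lxc key; an untested sketch for this container:

# untested workaround sketch: forward the LXC key mentioned in lxc.log via LXD's raw.lxc
lxc config set one-3472 raw.lxc "lxc.apparmor.allow_incomplete=1"
lxc start one-3472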

dann1 commented 5 years ago

lxc one-3472 20190301082225.647 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state "ABORTING" instead of "RUNNING"

This is weird. Anyway, I'll try to improve the log a bit, what do you think @rsmontero?

Maybe even output the lxc.log file when things fail, and also fix:

Thu Feb 28 12:37:33 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:145:in `initialize': undefined method `[]' for #<LXDError:0x00000001b0adc8> (NoMethodError)

dann1 commented 5 years ago

Have you tried again importing images from the marketplace to ceph?

tkald commented 5 years ago

Have you tried again importing images from the marketplace to ceph?

No change with the import.

tkald commented 5 years ago

It seems I also had an SSH communication failure between the storage bridge nodes and the LXD host. Now LXD containers remain in the "boot" state:

Fri Mar 1 18:00:56 2019 [Z0][VM][I]: New state is ACTIVE
Fri Mar 1 18:00:56 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Mar 1 18:00:58 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Mar 1 18:00:58 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3489/deployment.0
Fri Mar 1 18:00:59 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Mar 1 18:01:00 2019 [Z0][VMM][I]: ExitCode: 0
Fri Mar 1 18:01:00 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.

Zero-byte lxd.log on the LXD node. Can't find any errors in the logs... I'll keep digging.

dann1 commented 5 years ago

When it's stuck in BOOT during the deploy phase, the deploy script might be stuck somewhere. Can you check if the container was at least created with the config?

tkald commented 5 years ago

No container config on the LXD node, just a zero-byte log.

ls -al /var/log/lxd/one-3495
total 8
drwxr-xr-x  2 root root 4096 Mar  1 19:33 .
drwxr-xr-x 11 root root 4096 Mar  1 19:33 ..
-rw-r-----  1 root root    0 Mar  1 19:33 lxc.log

On the LXD node, the deploy script is still in the process list:

ps wax | grep 3495
25187 ?        Sl     0:00 /usr/bin/ruby /var/tmp/one/vmm/lxd/deploy /var/lib/one//datastores/100/3495/deployment.0 vml-vm2-lxd 3495 vml-vm2-lxd
dann1 commented 5 years ago

Try lxc config show one-3495, then you can kill the process and see where it got stuck in the VM log.

tkald commented 5 years ago

There's no /var/lib/one//datastores/100/3495/deployment.0 file; that's probably where deploy gets stuck.

lxc config show one-3495
architecture: x86_64
config:
  limits.cpu.allowance: 100%
  limits.memory: 768MB
  security.nesting: "no"
  security.privileged: "YES"
  volatile.apply_template: create
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
devices:
  context:
    path: /context
    source: /var/lib/one/datastores/100/3495/mapper/disk.1
    type: disk
  eth0:
    host_name: one-3495-0
    hwaddr: 02:00:c0:a8:29:15
    name: eth0
    nictype: bridged
    parent: vmbr0
    type: nic
  root:
    path: /
    pool: default
    readonly: "false"
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

VM log of the next LXD container I booted, after killing the stuck deploy script:

Fri Mar 1 19:47:50 2019 [Z0][VM][I]: New state is ACTIVE
Fri Mar 1 19:47:50 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Mar 1 19:47:53 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Mar 1 19:47:53 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3497/deployment.0
Fri Mar 1 19:47:54 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Mar 1 19:47:55 2019 [Z0][VMM][I]: ExitCode: 0
Fri Mar 1 19:47:55 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3497/deployment.0' 'vml-vm2-lxd' 3497 vml-vm2-lxd
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/command.rb:60:in `unlock': undefined method `close' for nil:NilClass (NoMethodError)
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: Did you mean? clone
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/command.rb:37:in `ensure in execute'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/command.rb:37:in `execute'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/rbd.rb:36:in `do_map'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/mapper.rb:115:in `map'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:359:in `public_send'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:359:in `setup_disk'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:241:in `block in setup_storage'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:234:in `each'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:234:in `setup_storage'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:71:in `<main>'
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: ExitCode: 1
Fri Mar 1 19:49:40 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Mar 1 19:49:40 2019 [Z0][VMM][E]: Error deploying virtual machine
Fri Mar 1 19:49:40 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE
dann1 commented 5 years ago

There's no /var/lib/one//datastores/100/3495/deployment.0 file; that's probably where deploy gets stuck.

This is the normal behaviour for the LXD drivers; the deployment xml is saved in the container config.

Can you check if this file exists: /tmp/onelxd-lock? If it does, remove it and try to deploy again.
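
For example:

ls -l /tmp/onelxd-lock
rm -f /tmp/onelxd-lock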

tkald commented 5 years ago

Yes, there was a lock file - removed it.

ps wax | grep 3498
22928 ?        Sl     0:00 /usr/bin/ruby /var/tmp/one/vmm/lxd/deploy /var/lib/one//datastores/100/3498/deployment.0 vml-vm2-lxd 3498 vml-vm2-lxd
22946 ?        Sl     0:00 rbd-nbd --id libvirt map one/one-602-3498-0
22993 ?        S      0:00 sudo mount /dev/nbd0 /var/lib/lxd/storage-pools/default/containers/one-3498/rootfs
22997 ?        D      0:00 mount /dev/nbd0 /var/lib/lxd/storage-pools/default/containers/one-3498/rootfs

And the disk seems to be mapped, at least:

rbd-nbd list-mapped
pid   pool image          snap device
22946 one  one-602-3498-0 -    /dev/nbd0

But the nbd0 device isn't mounted to the folder.

A bit more info after killing the deploy script:

Fri Mar 1 20:08:05 2019 [Z0][VM][I]: New state is ACTIVE
Fri Mar 1 20:08:05 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Mar 1 20:08:07 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Mar 1 20:08:07 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3498/deployment.0
Fri Mar 1 20:08:08 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Mar 1 20:08:09 2019 [Z0][VMM][I]: ExitCode: 0
Fri Mar 1 20:08:09 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3498/deployment.0' 'vml-vm2-lxd' 3498 vml-vm2-lxd
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-3498/rootfs using device /dev/nbd0
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd0 at /var/lib/lxd/storage-pools/default/containers/one-3498/rootfs
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: bash: line 6: 22927 Done cat <<EOT
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: <VM><ID>3498</ID><UID>0</UID><GID>0</GID><UNAME>oneadmin</UNAME><GNAME>oneadmin</GNAME><NAME>ubuntu_xenial_LXD-3498</NAME><PERMISSIONS><OWNER_U>1</OWNER_U><OWNER_M>1</OWNER_M><OWNER_A>0</OWNER_A><GROUP_U>0</GROUP_U><GROUP_M>0</GROUP_M><GROUP_A>0</GROUP_A><OTHER_U>0</OTHER_U><OTHER_M>0</OTHER_M><OTHER_A>0</OTHER_A></PERMISSIONS><LAST_POLL>0</LAST_POLL><STATE>3</STATE><LCM_STATE>2</LCM_STATE><PREV_STATE>3</PREV_STATE><PREV_LCM_STATE>2</PREV_LCM_STATE><RESCHED>0</RESCHED><STIME>1551463683</STIME><ETIME>0</ETIME><DEPLOY_ID></DEPLOY_ID><MONITORING></MONITORING><TEMPLATE><AUTOMATIC_DS_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_DS_REQUIREMENTS><AUTOMATIC_NIC_REQUIREMENTS><![CDATA[("CLUSTERS/ID" @> 0)]]></AUTOMATIC_NIC_REQUIREMENTS><AUTOMATIC_REQUIREMENTS><![CDATA[(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES)]]></AUTOMATIC_REQUIREMENTS><CONTEXT><DISK_ID><![CDATA[1]]></DISK_ID><ETH0_CONTEXT_FORCE_IPV4><![CDATA[]]></ETH0_CONTEXT_FORCE_IPV4><ETH0_DNS><![CDATA[192.168.40.175 192.168.40.176]]></ETH0_DNS><ETH0_EXTERNAL><![CDATA[]]></ETH0_EXTERNAL><ETH0_GATEWAY><![CDATA[192.168.40.1]]></ETH0_GATEWAY><ETH0_GATEWAY6><![CDATA[]]></ETH0_GATEWAY6><ETH0_IP><![CDATA[192.168.41.21]]></ETH0_IP><ETH0_IP6><![CDATA[]]></ETH0_IP6><ETH0_IP6_PREFIX_LENGTH><![CDATA[]]></ETH0_IP6_PREFIX_LENGTH><ETH0_IP6_ULA><![CDATA[]]></ETH0_IP6_ULA><ETH0_MAC><![CDATA[02:00:c0:a8:29:15]]></ETH0_MAC><ETH0_MASK><![CDATA[255.255.254.0]]></ETH0_MASK><ETH0_MTU><![CDATA[]]></ETH0_MTU><ETH0_NETWORK><![CDATA[]]></ETH0_NETWORK><ETH0_SEARCH_DOMAIN><![CDATA[]]></ETH0_SEARCH_DOMAIN><ETH0_VLAN_ID><![CDATA[]]></ETH0_VLAN_ID><ETH0_VROUTER_IP><![CDATA[]]></ETH0_VROUTER_IP><ETH0_VROUTER_IP6><![CDATA[]]></ETH0_VROUTER_IP6><ETH0_VROUTER_MANAGEMENT><![CDATA[]]></ETH0_VROUTER_MANAGEMENT><NETWORK><![CDATA[YES]]></NETWORK><SET_HOSTNAME><![CDATA[ubuntu_xenial_LXD-3498]]></SET_HOSTNAME><SSH_PUBLIC_KEY><![CDATA[ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCz+oeXVWGv6IW0oRt6Sxqz8OaPdvWy5R6cLTp52Egf3JY7xm6KrkOCkTupN3jTff/YsNX44N6ByuamJIksQyVfvq9CmHJbG0xtwmr1xwDf69AF/nkxJqxAQkW+I+grOa3PdryegCCriRlGBWsRmiG8a9OoNrmoD8wcVLcC7I4xaV4NvOi/xaLQq2Br+62Fct4g4bXjKP/hx2B5l5+LzaShq7BZcpWMUuvoY7lvIDKpNfzGHCn/YW8yB0/N5o3pYplnrckBeD2cQQM31GJMItm7eq56ExP0rzL1atFC74mZwZXArYeBxatY1mU7nltee3i7Qzh/DERkJFZQ39awXa2QT+PSP8Ee+rOnsB9XQFBvpeV1USQAWdxxgm/UZzr43tByU3yMaPgQkGLIjUcZoL80RSSuKvtVkrXfkbuz5AQjsIxsMiOpnaiWAtGdjTAiEYIrNnxgB3VrXjFqZZm6MWFrA9haCYGnrbtiUgLArOqV1FJb+r3/+ifM7DakMFQhL1IHwjiY8xJ40Mz41cW0qV4aBJSFwFzniEbiq9rc7DwLIveR/hRnn+XphucOh2N0AvssegsxQDvcKjGB3HjYOb75geWrx3uwRWyfpC0aCswQkIxcjy8wfcsVzpsyqvUbh/qig0DeGoOMO+YxvVYCXuS4iu6SuQI8McZmjiFP+BhT5Q== taavi@laevakompanii.ee]]></SSH_PUBLIC_KEY><TARGET><![CDATA[hda]]></TARGET></CONTEXT><CPU><![CDATA[1]]></CPU><DISK><ALLOW_ORPHANS><![CDATA[YES]]></ALLOW_ORPHANS><CEPH_HOST><![CDATA[192.168.30.71:6789 192.168.30.72:6789 
192.168.30.73:6789]]></CEPH_HOST><CEPH_SECRET><![CDATA[7695da24-d486-44ab-85e6-14255f26f394]]></CEPH_SECRET><CEPH_USER><![CDATA[libvirt]]></CEPH_USER><CLONE><![CDATA[YES]]></CLONE><CLONE_TARGET><![CDATA[SELF]]></CLONE_TARGET><CLUSTER_ID><![CDATA[0]]></CLUSTER_ID><DATASTORE><![CDATA[cephds]]></DATASTORE><DATASTORE_ID><![CDATA[101]]></DATASTORE_ID><DEV_PREFIX><![CDATA[vd]]></DEV_PREFIX><DISK_ID><![CDATA[0]]></DISK_ID><DISK_SNAPSHOT_TOTAL_SIZE><![CDATA[0]]></DISK_SNAPSHOT_TOTAL_SIZE><DISK_TYPE><![CDATA[RBD]]></DISK_TYPE><DRIVER><![CDATA[raw]]></DRIVER><IMAGE><![CDATA[xenial-lxd-img-slk-20190228-015109]]></IMAGE><IMAGE_ID><![CDATA[602]]></IMAGE_ID><IMAGE_STATE><![CDATA[2]]></IMAGE_STATE><IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME><LN_TARGET><![CDATA[NONE]]></LN_TARGET><ORIGINAL_SIZE><![CDATA[10240]]></ORIGINAL_SIZE><POOL_NAME><![CDATA[one]]></POOL_NAME><READONLY><![CDATA[NO]]></READONLY><SAVE><![CDATA[NO]]></SAVE><SIZE><![CDATA[10240]]></SIZE><SOURCE><![CDATA[one/one-602]]></SOURCE><TARGET><![CDATA[vda]]></TARGET><TM_MAD><![CDATA[ceph]]></TM_MAD><TYPE><![CDATA[RBD]]></TYPE></DISK><GRAPHICS><LISTEN><![CDATA[0.0.0.0]]></LISTEN><PORT><![CDATA[19398]]></PORT><TYPE><![CDATA[VNC]]></TYPE></GRAPHICS><MEMORY><![CDATA[768]]></MEMORY><NIC><AR_ID><![CDATA[0]]></AR_ID><BRIDGE><![CDATA[vmbr0]]></BRIDGE><BRIDGE_TYPE><![CDATA[linux]]></BRIDGE_TYPE><CLUSTER_ID><![CDATA[0]]></CLUSTER_ID><FILTER_IP_SPOOFING><![CDATA[YES]]></FILTER_IP_SPOOFING><FILTER_MAC_SPOOFING><![CDATA[YES]]></FILTER_MAC_SPOOFING><IP><![CDATA[192.168.41.21]]></IP><MAC><![CDATA[02:00:c0:a8:29:15]]></MAC><NAME><![CDATA[NIC0]]></NAME><NETWORK><![CDATA[SLK vlan 40]]></NETWORK><NETWORK_ID><![CDATA[7]]></NETWORK_ID><NIC_ID><![CDATA[0]]></NIC_ID><SECURITY_GROUPS><![CDATA[0,102]]></SECURITY_GROUPS><TARGET><![CDATA[one-3498-0]]></TARGET><VN_MAD><![CDATA[fw]]></VN_MAD></NIC><OS><BOOT><![CDATA[]]></BOOT></OS><SECURITY_GROUP_RULE><PROTOCOL><![CDATA[ALL]]></PROTOCOL><RULE_TYPE><![CDATA[OUTBOUND]]></RULE_TYPE><SECURITY_GROUP_ID><![CDATA[102]]></SECURITY_GROUP_ID><SECURITY_GROUP_NAME><![CDATA[ANY to ANY]]></SECURITY_GROUP_NAME></SECURITY_GROUP_RULE><SECURITY_GROUP_RULE><PROTOCOL><![CDATA[ALL]]></PROTOCOL><RULE_TYPE><![CDATA[INBOUND]]></RULE_TYPE><SECURITY_GROUP_ID><![CDATA[102]]></SECURITY_GROUP_ID><SECURITY_GROUP_NAME><![CDATA[ANY to ANY]]></SECURITY_GROUP_NAME></SECURITY_GROUP_RULE><TEMPLATE_ID><![CDATA[163]]></TEMPLATE_ID><TM_MAD_SYSTEM><![CDATA[ceph]]></TM_MAD_SYSTEM><VMID><![CDATA[3498]]></VMID></TEMPLATE><USER_TEMPLATE><HYPERVISOR><![CDATA[lxd]]></HYPERVISOR><INPUTS_ORDER><![CDATA[]]></INPUTS_ORDER><LXD_PROFILE><![CDATA[default]]></LXD_PROFILE><LXD_SECURITY_NESTING><![CDATA[no]]></LXD_SECURITY_NESTING><LXD_SECURITY_PRIVILEGED><![CDATA[YES]]></LXD_SECURITY_PRIVILEGED><MEMORY_UNIT_COST><![CDATA[MB]]></MEMORY_UNIT_COST><SCHED_REQUIREMENTS><![CDATA[ID="9"]]></SCHED_REQUIREMENTS></USER_TEMPLATE><HISTORY_RECORDS><HISTORY><OID>3498</OID><SEQ>0</SEQ><HOSTNAME>vml-vm2-lxd</HOSTNAME><HID>9</HID><CID>0</CID><STIME>1551463685</STIME><ETIME>0</ETIME><VM_MAD><![CDATA[lxd]]></VM_MAD><TM_MAD><![CDATA[ceph]]></TM_MAD><DS_ID>100</DS_ID><PSTIME>1551463685</PSTIME><PETIME>1551463687</PETIME><RSTIME>1551463687</RSTIME><RETIME>0</RETIME><ESTIME>0</ESTIME><EETIME>0</EETIME><ACTION>0</ACTION><UID>-1</UID><GID>-1</GID><REQUEST_ID>-1</REQUEST_ID></HISTORY></HISTORY_RECORDS></VM>
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: EOT
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: 
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: 22928 Terminated | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3498/deployment.0' 'vml-vm2-lxd' 3498 vml-vm2-lxd
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: ExitCode: 143
Fri Mar 1 20:27:25 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Mar 1 20:27:25 2019 [Z0][VMM][E]: Error deploying virtual machine
Fri Mar 1 20:27:25 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

There were multiple mount requests hung on the same nbd0 device from previous failures. Rebooted the LXD node and got a bit further:

Fri Mar 1 20:43:28 2019 [Z0][VM][I]: New state is ACTIVE
Fri Mar 1 20:43:28 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Mar 1 20:43:30 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Mar 1 20:43:30 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3499/deployment.0
Fri Mar 1 20:43:31 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Mar 1 20:43:32 2019 [Z0][VMM][I]: ExitCode: 0
Fri Mar 1 20:43:32 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3499/deployment.0' 'vml-vm2-lxd' 3499 vml-vm2-lxd
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-3499/rootfs using device /dev/nbd0
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd0 at /var/lib/lxd/storage-pools/default/containers/one-3499/rootfs
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/100/3499/mapper/disk.1 using device /dev/loop0
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop0 at /var/lib/one/datastores/100/3499/mapper/disk.1
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/lxd/storage-pools/default/containers/one-3499/rootfs
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Umounting disk mapped at /dev/nbd0
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/one/datastores/100/3499/mapper/disk.1
Fri Mar 1 20:43:35 2019 [Z0][VMM][E]: deploy: Failed to detect block device from /var/lib/one/datastores/100/3499/mapper/disk.1
Fri Mar 1 20:43:35 2019 [Z0][VMM][E]: deploy: failed to dismantle container storage
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:145:in `initialize': undefined method `[]' for #<LXDError:0x0000000277ee60> (NoMethodError)
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `exception'
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `raise'
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `rescue in <main>'
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:74:in `<main>'
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: ExitCode: 1
Fri Mar 1 20:43:35 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Mar 1 20:43:35 2019 [Z0][VMM][E]: Error deploying virtual machine
Fri Mar 1 20:43:35 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE

In the LXD node syslog, there's an nbd module crash:

Mar  1 20:43:35 vml-vm2 libvirtd[4581]: Failed to open file '/sys/class/net/vethO9Q1XY/operstate': No such file or directory
Mar  1 20:43:35 vml-vm2 libvirtd[4581]: unable to read: /sys/class/net/vethO9Q1XY/operstate: No such file or directory
Mar  1 20:43:35 vml-vm2 systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 7920 (lxd)
Mar  1 20:43:35 vml-vm2 systemd[1]: Mounting Arbitrary Executable File Formats File System...
Mar  1 20:43:35 vml-vm2 systemd[1]: Mounted Arbitrary Executable File Formats File System.
Mar  1 20:43:35 vml-vm2 lxd[5282]: t=2019-03-01T20:43:35+0200 lvl=eror msg="Failed starting container" action=start created=2019-03-01T20:43:32+0200 ephemeral=false name=one-3499 stateful=false used=1970-01-01T03:00:00+0300
Mar  1 20:43:35 vml-vm2 systemd[1]: Started Session 14 of user oneadmin.
Mar  1 20:43:35 vml-vm2 kernel: [  196.574941] block nbd0: NBD_DISCONNECT
Mar  1 20:43:35 vml-vm2 kernel: [  196.679424] block nbd0: Receive control failed (result -32)
Mar  1 20:43:35 vml-vm2 kernel: [  196.680717] block nbd0: shutting down socket
Mar  1 20:43:36 vml-vm2 kernel: [  197.027686] block nbd0: Attempted send on closed socket
Mar  1 20:43:36 vml-vm2 kernel: [  197.028868] blk_update_request: I/O error, dev nbd0, sector 0
Mar  1 20:43:36 vml-vm2 kernel: [  197.030381] ------------[ cut here ]------------
Mar  1 20:43:37 vml-vm2 kernel: [  197.031416] kernel BUG at /home/kernel/COD/linux/fs/buffer.c:3005!
Mar  1 20:43:37 vml-vm2 kernel: [  197.032791] invalid opcode: 0000 [#1] SMP

I'll try to find a suitable kernel for that node...

tkald commented 5 years ago

No luck booting the container; it seems to me there's an issue with the network interfaces. Syslog from the LXD host while the container tries to boot:

Mar  1 23:03:34 vml-vm2 lxd[5780]: t=2019-03-01T23:03:34+0200 lvl=warn msg="Unable to update backup.yaml at this time" name=one-3510
Mar  1 23:03:36 vml-vm2 kernel: [ 7255.808575] EXT4-fs (nbd4): mounted filesystem with ordered data mode. Opts: (null)
Mar  1 23:03:36 vml-vm2 systemd[1]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop1 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 systemd[5425]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop0 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 systemd[8474]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop0 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 systemd[5425]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop0 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 systemd[1]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop1 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 systemd[8474]: dev-disk-by\x2dlabel-CONTEXT.device: Dev dev-disk-by\x2dlabel-CONTEXT.device appeared twice with different sysfs paths /sys/devices/virtual/block/loop0 and /sys/devices/virtual/block/loop4
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.013100] ISO 9660 Extensions: Microsoft Joliet Level 3
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.053116] ISO 9660 Extensions: Microsoft Joliet Level 3
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.054636] ISO 9660 Extensions: RRIP_1991A
Mar  1 23:03:36 vml-vm2 systemd-udevd[9989]: Could not generate persistent MAC address for vethCFFXJA: No such file or directory
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.226738] audit: type=1400 audit(1551474216.558:22): apparmor="STATUS" operation="profile_load" name="lxd-one-3510_</var/lib/lxd>" pid=9988 comm="apparmor_parser"
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.236729] vmbr0: port 2(one-3510-0) entered blocking state
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.236732] vmbr0: port 2(one-3510-0) entered disabled state
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.236875] device one-3510-0 entered promiscuous mode
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.237011] IPv6: ADDRCONF(NETDEV_UP): one-3510-0: link is not ready
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.298427] eth0: renamed from vethCFFXJA
Mar  1 23:03:36 vml-vm2 libvirtd[5012]: ethtool ioctl error: No such device
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.336960] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.338381] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.339645] IPv6: ADDRCONF(NETDEV_CHANGE): one-3510-0: link becomes ready
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.340992] vmbr0: port 2(one-3510-0) entered blocking state
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.342279] vmbr0: port 2(one-3510-0) entered forwarding state
Mar  1 23:03:36 vml-vm2 libvirtd[5012]: message repeated 4 times: [ ethtool ioctl error: No such device]
Mar  1 23:03:36 vml-vm2 lxd[5780]: t=2019-03-01T23:03:36+0200 lvl=eror msg="Failed starting container" action=start created=2019-03-01T23:03:34+0200 ephemeral=false name=one-3510 stateful=false used=1970-01-01T03:00:00+0300
Mar  1 23:03:36 vml-vm2 libvirtd[5012]: ethtool ioctl error: No such device
Mar  1 23:03:36 vml-vm2 libvirtd[5012]: ethtool ioctl error: No such device
Mar  1 23:03:36 vml-vm2 libvirtd[5012]: ethtool ioctl error: No such device
Mar  1 23:03:36 vml-vm2 kernel: [ 7256.605447] block nbd4: NBD_DISCONNECT
Mar  1 23:03:37 vml-vm2 kernel: [ 7256.717047] block nbd4: Receive control failed (result -32)
Mar  1 23:03:37 vml-vm2 kernel: [ 7256.718268] block nbd4: shutting down socket
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.037248] block nbd4: Attempted send on closed socket
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.038444] blk_update_request: I/O error, dev nbd4, sector 0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.039831] ------------[ cut here ]------------
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.041002] kernel BUG at /home/kernel/COD/linux/fs/buffer.c:3060!
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.042114] invalid opcode: 0000 [#5] SMP
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.043204] Modules linked in: binfmt_misc veth nls_utf8 isofs ipmi_si ebtable_filter ebtables ip6table_filter ip6_tables mpt3sas raid_class scsi_transport_sas mptctl mptbase iptable_filter ip_tables x_tables dell_rbu bridge stp llc bonding ipmi_ssif ipmi_devintf gpio_ich dcdbas intel_powerclamp coretemp intel_cstate kvm_intel serio_raw i7core_edac kvm shpchp ipmi_msghandler edac_core ioatdma irqbypass lpc_ich acpi_power_meter mac_hid nbd ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mgag200 ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel igb syscopyarea aesni_intel sysfillrect dca sysimgblt
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.050829]  aes_x86_64 fb_sys_fops lrw ptp glue_helper uas ablk_helper pps_core pata_acpi cryptd psmouse drm usb_storage megaraid_sas i2c_algo_bit bnx2 wmi fjes [last unloaded: ipmi_si]
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.053597] CPU: 4 PID: 9977 Comm: lxd Tainted: G      D   I     4.9.161-0409161-generic #201902271031
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.055007] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.4.0 07/23/2013
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.056427] task: ffff96a5d5a9c080 task.stack: ffffb74e20c1c000
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.057850] RIP: 0010:[<ffffffff9b27fe31>]  [<ffffffff9b27fe31>] submit_bh_wbc+0x191/0x1f0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.059277] RSP: 0018:ffffb74e20c1fcc0  EFLAGS: 00010246
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.060669] RAX: 0000000004000005 RBX: ffff96a9a905a340 RCX: 0000000000000000
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.062061] RDX: ffff96a9a905a340 RSI: 0000000000000348 RDI: 0000000000000001
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.063428] RBP: ffffb74e20c1fcf0 R08: 0000000000000000 R09: 0000000000000020
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.064754] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96a5d5bfd800
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.066084] R13: ffff96a5d5bfd824 R14: 0000000000000348 R15: ffff96a9b2020000
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.067415] FS:  00007efc39ad5840(0000) GS:ffff96a9efc80000(0000) knlGS:0000000000000000
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.068761] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.070101] CR2: 00007fec7e65beb8 CR3: 00000004154e6000 CR4: 0000000000020670
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.071449] Stack:
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.072781]  ffffffff9b8b2524 ffff96a9a905a340 ffff96a5d5bfd800 ffff96a5d5bfd824
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.074154]  0000000000000348 ffff96a9b2020000 ffffb74e20c1fd00 ffffffff9b280543
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.075519]  ffffb74e20c1fd58 ffffffff9b334d50 ffff96a5db699800 0000000000000000
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.076875] Call Trace:
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.078229]  [<ffffffff9b8b2524>] ? __switch_to_asm+0x34/0x70
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.079610]  [<ffffffff9b280543>] submit_bh+0x13/0x20
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.080955]  [<ffffffff9b334d50>] jbd2_write_superblock+0x100/0x260
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.082275]  [<ffffffff9b334f78>] jbd2_mark_journal_empty+0x78/0xa0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.083584]  [<ffffffff9b335205>] jbd2_journal_destroy+0x265/0x2a0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.084905]  [<ffffffff9b0ce870>] ? wake_bit_function+0x60/0x60
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.086221]  [<ffffffff9b2fcf02>] ext4_put_super+0x72/0x3a0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.087529]  [<ffffffff9b2473a2>] generic_shutdown_super+0x72/0x100
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.088830]  [<ffffffff9b247727>] kill_block_super+0x27/0x70
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.090123]  [<ffffffff9b247868>] deactivate_locked_super+0x48/0x80
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.091421]  [<ffffffff9b247d86>] deactivate_super+0x46/0x60
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.091456] audit: type=1400 audit(1551474217.422:23): apparmor="STATUS" operation="profile_remove" name="lxd-one-3510_</var/lib/lxd>" pid=10255 comm="apparmor_parser"
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.095388]  [<ffffffff9b26791f>] cleanup_mnt+0x3f/0x90
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.096739]  [<ffffffff9b2679b2>] __cleanup_mnt+0x12/0x20
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.098090]  [<ffffffff9b0a7e0c>] task_work_run+0x7c/0xa0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.099438]  [<ffffffff9b003765>] exit_to_usermode_loop+0xa5/0xb0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.100788]  [<ffffffff9b003bd7>] do_syscall_64+0xc7/0xe0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.102129]  [<ffffffff9b8b240e>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.103474] Code: fd 4c 89 e7 45 89 6c 24 14 e8 cc cf 17 00 48 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 f0 80 62 01 f7 e9 e1 fe ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 80 3d 2a 9d cc 00 00 75 a5 be 1b
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.106373] RIP  [<ffffffff9b27fe31>] submit_bh_wbc+0x191/0x1f0
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.107754]  RSP <ffffb74e20c1fcc0>
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.113877] ---[ end trace 36f1a7dd0110b6fb ]---
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.180804] vmbr0: port 2(one-3510-0) entered disabled state
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.182510] device one-3510-0 left promiscuous mode
Mar  1 23:03:37 vml-vm2 kernel: [ 7257.183834] vmbr0: port 2(one-3510-0) entered disabled state
lxc config show one-3510
architecture: x86_64
config:
  limits.cpu.allowance: 100%
  limits.memory: 768MB
  security.nesting: "no"
  security.privileged: "false"
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
devices:
  context:
    path: /context
    source: /var/lib/one/datastores/100/3510/mapper/disk.1
    type: disk
  eth0:
    host_name: one-3510-0
    hwaddr: 02:00:c0:a8:29:17
    name: eth0
    nictype: bridged
    parent: vmbr0
    type: nic
  root:
    path: /
    pool: default
    readonly: "false"
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

And the container's log on the LXD host:

/var/log/lxd/one-3510# cat lxc.log
lxc one-3510 20190301210336.798 WARN     conf - conf.c:lxc_setup_devpts:1616 - Invalid argument - Failed to unmount old devpts instance
lxc one-3510 20190301210336.798 WARN     apparmor - lsm/apparmor.c:apparmor_process_label_set:221 - Incomplete AppArmor support in your kernel
lxc one-3510 20190301210336.798 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:223 - If you really want to start this container, set
lxc one-3510 20190301210336.798 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:224 - lxc.apparmor.allow_incomplete = 1
lxc one-3510 20190301210336.798 ERROR    apparmor - lsm/apparmor.c:apparmor_process_label_set:225 - in your container configuration file
lxc one-3510 20190301210336.798 ERROR    sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
lxc one-3510 20190301210336.798 WARN     network - network.c:lxc_delete_network_priv:2589 - Operation not permitted - Failed to remove interface "eth0" with index 20
lxc one-3510 20190301210336.798 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state "ABORTING" instead of "RUNNING"
lxc one-3510 20190301210336.799 ERROR    start - start.c:__lxc_start:1939 - Failed to spawn container "one-3510"
lxc 20190301210336.809 WARN     commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command "get_state"

And the VM log in Sunstone:

Fri Mar 1 23:03:06 2019 [Z0][VM][I]: New state is ACTIVE
Fri Mar 1 23:03:06 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Mar 1 23:03:08 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Mar 1 23:03:08 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/3510/deployment.0
Fri Mar 1 23:03:33 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Mar 1 23:03:34 2019 [Z0][VMM][I]: ExitCode: 0
Fri Mar 1 23:03:34 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/lxd/deploy '/var/lib/one//datastores/100/3510/deployment.0' 'vml-vm2-lxd' 3510 vml-vm2-lxd
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-3510/rootfs using device /dev/nbd4
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd4 at /var/lib/lxd/storage-pools/default/containers/one-3510/rootfs
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/100/3510/mapper/disk.1 using device /dev/loop4
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop4 at /var/lib/one/datastores/100/3510/mapper/disk.1
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Using rbd disk mapper for
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/lxd/storage-pools/default/containers/one-3510/rootfs
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Umounting disk mapped at /dev/nbd4
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: deploy: Unmapping disk at /var/lib/one/datastores/100/3510/mapper/disk.1
Fri Mar 1 23:03:37 2019 [Z0][VMM][E]: deploy: Failed to detect block device from /var/lib/one/datastores/100/3510/mapper/disk.1
Fri Mar 1 23:03:37 2019 [Z0][VMM][E]: deploy: failed to dismantle container storage
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:145:in `initialize': undefined method `[]' for #<LXDError:0x00000000e5c310> (NoMethodError)
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `exception'
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `raise'
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:86:in `rescue in <main>'
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/deploy:74:in `<main>'
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: ExitCode: 1
Fri Mar 1 23:03:37 2019 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Fri Mar 1 23:03:37 2019 [Z0][VMM][E]: Error deploying virtual machine
Fri Mar 1 23:03:37 2019 [Z0][VM][I]: New LCM state is BOOT_FAILURE
dann1 commented 5 years ago

Try using kernel 4.15

tkald commented 5 years ago

And we have a boot with 4.15 kernel :)

dann1 commented 5 years ago

Glad it booted. So, can you tell me which kernels you tried?

Also, check the marketplace import again.

tkald commented 5 years ago

The issues were with the Ubuntu 4.4.176 and 4.9.161 kernels, and it booted with the 4.15.0-45-generic kernel. Still no luck with the marketplace import. Does the import also use an nbd device?

dann1 commented 5 years ago

It doesn't map into nbd devices

dann1 commented 5 years ago

I've added the kernel and Luminous requirements to the docs. If the marketplace import issue persists, feel free to open another issue. Thanks for the feedback.