Open josephineSei opened 7 months ago
I have discussed and outlined the spec with @markus-hentsch and currently writing it.
On the user side we'd need to be able to create LUKS-encrypted files. Nova devs made us aware of the QEMU tooling being able to do so allegedly.
Here is the Nova implementation:
Here's a shortened example:
echo "muchsecretsuchwow" > secret_file.key
qemu-img convert -f raw -O luks --object secret,id=sec,file=secret_file.key -o key-secret=sec \
-o cipher-alg=aes-256 -o cipher-mode=xts -o hash-alg=sha256 \
-o ivgen-alg=plain64 -o ivgen-hash-alg=sha256 \
$INPUT_FILE $OUTPUT_LUKS_FILE
It works within a Docker container running Ubuntu LTS but interestingly it doesn't on my NixOS setup, failing with:
qemu-img: output.luks: error while converting luks: Unsupported cipher mode xts
Versions of qemu-img are 6.2.0 (Ubuntu) and 8.1.5 (NixOS) respectively. I wonder if there is another system dependency in play here or if it's the version difference. We should investigate because this impacts the ability to create images on the client side.
EDIT:
Tried it on ubuntu:noble
image which ships qemu-img 8.2.1 and it still works. So no deprecation problem here in regards to the qemu-img version. I think we can safely ignore it as a NixOS-specific issue then.
As I was adding restore instructions to the user data backup guide in https://github.com/SovereignCloudStack/docs/pull/176 I reproduced the process of creating volumes from previously encrypted (LUKS) images originating from Cinder itself on my DevStack and noticed that Cinder does not verify that the target volume actually has an encrypted type:
$ file image.raw
image.raw: LUKS encrypted file, ver 1 [aes, xts-plain64, sha256]
$ file image.key
image.key: data
$ openstack secret store --algorithm aes --bit-length 256 --mode cbc \
--secret-type symmetric --file image.key --name restored-image-key
$ openstack secret list -f value -c "Secret href" -c "Name"
http://10.0.1.116/key-manager/v1/secrets/6ea6b0a8-de50-45b8-90b7-9470c4dd201a restored-image-key
$ export SECRET_ID=6ea6b0a8-de50-45b8-90b7-9470c4dd201a
$ openstack image create --file image.raw \
--property cinder_encryption_key_id=$SECRET_ID \
--property cinder_encryption_key_deletion_policy=on_image_deletion \
restored-image
$ openstack volume create --size 1 \
--image restored-image volume-restored-notype
$ openstack volume show -f value -c type volume-restored-notype
lvmdriver-1 # <- this is an *unencrypted* volume type! (which contains LUKS blocks now)
$ openstack volume create --size 1 \
--image restored-image --type lvmdriver-1-LUKS \
volume-restored-lukstype
$ openstack volume show -f value -c type volume-restored-lukstype
lvmdriver-1-LUKS
$ openstack server create \
--volume volume-restored-notype \
... server-from-untyped-volume
$ openstack server create \
--volume volume-restored-lukstype \
... server-from-luks-volume
This results in the server server-from-untyped-volume
being stuck at "No bootable device found" on the virtual console. Only server-from-luks-volume
where a LUKS volume type was explicitly chosen while restoring the LUKS image was bootable.
It seems that this is an oversight in Cinder in its current implementation?
I tested this with a simple encrypted volume to encrypted image to volume:
stack@devstack:~/devstack$ openstack image show encrypted-volume-image
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| checksum | 4f0538390cb7728f6e02d6a58a824e11 |
| container_format | bare |
| created_at | 2024-04-11T13:09:08Z |
| disk_format | raw |
| file | /v2/images/49febf2e-4f20-47f0-8230-660c2dcb19dc/file |
| id | 49febf2e-4f20-47f0-8230-660c2dcb19dc |
| min_disk | 0 |
| min_ram | 0 |
| name | encrypted-volume-image |
| owner | 15f2ab0eaa5b4372b759bde609e86224 |
| properties | cinder_encryption_key_deletion_policy='on_image_deletion', cinder_encryption_key_id='286339bc-5484-4a27-b24a-816f89a2968c', hw_rng_model='virtio', locations='[{'url': |
| | 'rbd://adbc1d67-adf7-4a13-94b5-2a2570d41ed9/images/49febf2e-4f20-47f0-8230-660c2dcb19dc/snap', 'metadata': {}}]', os_hash_algo='sha512', |
| | os_hash_value='7160b7451962496108194179757221dbbb5af7654b205a63cf6284a70dd1caed8ee289b98a56f0cb0d37e363adc18cc334d0f1bf6be937e1e9af9036009b0856', os_hidden='False', |
| | owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros-0.6.2-x86_64-disk', owner_specified.openstack.sha256='', signature_verified='False' |
| protected | False |
| schema | /v2/schemas/image |
| size | 3221225472 |
| status | active |
| tags | |
| updated_at | 2024-04-11T13:11:31Z |
| virtual_size | 3221225472 |
| visibility | shared |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
stack@devstack:~/devstack$ openstack volume create --size 3 --image encrypted-volume-image volume-from-LUKS-image
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2024-04-12T11:23:52.246101 |
| description | None |
| encrypted | False |
| id | f790ecd4-eea6-4a8d-8cf6-d1a038c91d60 |
| migration_status | None |
| multiattach | False |
| name | volume-from-LUKS-image |
| properties | |
| replication_status | None |
| size | 3 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| type | ceph |
| updated_at | None |
| user_id | 6cf194afebb6469e8423f50500b5c3fc |
+---------------------+--------------------------------------+
A seemingly unencrypted volume is created from the encrypted image. But as far as we know, there is no decrypting mechanism implemented in Cinder to go from such a Cinder-specific LUKS encryption in the image to an unencrypted volume.
We should definitely file a bug in Cinder for this one.
When creating a server from a volume, like my above case, it will be shown as active even though this cannot be true
stack@devstack:~/devstack$ openstack server list
+-----------------+-----------------+--------+-----------------+------------------+----------+
| ID | Name | Status | Networks | Image | Flavor |
+-----------------+-----------------+--------+-----------------+------------------+----------+
| 8ba1b529-d1d9- | will-this- | ACTIVE | private=10.0.0. | N/A (booted from | m1.small |
| 4f5b-a97d- | server-boot? | | 47, fd13:d046:e | volume) | |
| 6756f61451e0 | | | 727:0:f816:3eff | | |
| | | | :feae:b8e1 | | |
| 66f8f821-ec26- | my-new-server | ACTIVE | private=10.0.0. | N/A (booted from | m1.small |
| 4264-807e- | | | 41, fd13:d046:e | volume) | |
| 36ec016d51f9 | | | 727:0:f816:3eff | | |
| | | | :fe98:3e70 | | |
+-----------------+-----------------+--------+-----------------+------------------+----------+
To show the boot log of a server the following command can be used:
stack@devstack:~/devstack$ openstack console log show my-new-server > other-server.log
The log will then look something like:
[ 0.000000] Linux version 5.15.0-71-generic (buildd@lcy02-amd64-044) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0>
[ 0.000000] Command line: LABEL=cirros-rootfs ro console=tty1 console=ttyS0
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] signal: max sigframe size: 1440
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdcfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007ffdd000-0x000000007fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
.....
But if created from a server with an unencrypted volume from an encrypted image the log file remains empty:
stack@devstack:~/devstack$ openstack console log show 02d13a4c-a6a0-4b40-850c-9b7ddd13d4bb
<- log output should be here
stack@devstack:~/devstack$
I uploaded the new spec to gerrit: https://review.opendev.org/c/openstack/glance-specs/+/915726
I reported the bug uncovered in https://github.com/SovereignCloudStack/standards/issues/560#issuecomment-2051563285 at https://bugs.launchpad.net/cinder/+bug/2061154
I reported the bug uncovered in #560 (comment) at https://bugs.launchpad.net/cinder/+bug/2061154
I raised this in the IRC, the following was the answer:
<Luzi> eharney, while trying some things with the LUKS-image we found this behavior: https://bugs.launchpad.net/cinder/+bug/2061154
<Luzi> is this known?
<eharney> Luzi: i think this is done on purpose because we didn't want to have users accidentally decrypting volume images, but it might still be useful to let them copy the data around (rather than failing)
<eharney> it would make sense to document this behavior because it is kind of surprising in the standard use of volumes/images
<eharney> generally, people should make sure they are using an encrypted volume type
<Luzi> or maybe add a check for the a volume type with encryption type?
<eharney> i think we didn't want it to fail because there are situations (copying data around to different places rather than booting instances) where it's useful to allow this
We currently evaluate whether we will also need a spec for Cinder. What we definitely need is a blueprint for Cinder to track all development. I have written a blueprint and tried to include all possible points, where there will be implementation needed in Cinder: https://blueprints.launchpad.net/cinder/+spec/standardize-image-encryption-metadata From my opinion there should be a spec, to at least show all use cases where encrypted images need to be considered.
We got a review for the Spec (I am currently updating it) and there is one last question to discuss:
We wanted to introduce a new container format "encrypted" - so it would be easily visible to anyone, that this image is encrypted. To identify what the underlying format is, we introduced a property: "os_decrypt_container_format".
Now Nova and Cinder both would like to have the container format showing up in the original property and Cinder additionally would need some parameter to know whether the encrypted image is compressed or not.
The thing here is, that we could check whether encrypted images are always qcow or raw when the container_format and decrypt_container_format is set. The metadata could be set after creating an image in the upload step. So it may be allowed to create an encrypted image with a format neither cinder nor nova could use. We would like to avoid such bad user experience.
So we want to ask the following questions in the Glance meeting this thursday:
From the Cinder meeting today I got the wish to also create a small Cinder spec: https://etherpad.opendev.org/p/cinder-dalmatian-meetings
I created the Cidner spec today: https://review.opendev.org/c/openstack/cinder-specs/+/919499
We got reviews on the spec, and Markus and I are going through them and answering questions.
I adjusted the Glance spec and removed the container_format 'encrypted', because this was one of the most discussed parts of the spec: https://review.opendev.org/c/openstack/glance-specs/+/915726
I got feedback on the Glance patch with the question about what to do when the image conversion plugin is activated. It needs among other things this config adjustment: https://github.com/openstack/glance/blob/master/etc/glance-image-import.conf.sample#L26
Image conversion as I understand it, is not something that can be triggered by a CLI command. Instead when the plugin is activated ALL images that are created will be automatically converted to the ( through the config ) specified format after uploading and before storing it.
This creates a few questions regarding encrypted images:
As this optional feature of Glance may already have problems with 1, and we would need to at least forbid uploading encrypted images when the target format is vmdk (maybe also when the target format is qcow2) this would be a lot more of implementation work. So we will for now render this out of scope for the spec. Maybe this can be done after the image encryption is in place and if it is needed by operators and users.
I got feedback on the Glance patch with the question about what to do when the image conversion plugin is activated. It needs among other things this config adjustment: https://github.com/openstack/glance/blob/master/etc/glance-image-import.conf.sample#L26
Image conversion as I understand it, is not something that can be triggered by a CLI command. Instead when the plugin is activated ALL images that are created will be automatically converted to the ( through the config ) specified format after uploading and before storing it.
This creates a few questions regarding encrypted images:
1. Is this behavior also triggered when Nova or Cinder upload an image? If yes, then it will currently result in unusable images or worse: strange Errors. If not, this feature is not very well implemented, because in this case disk-formats will differ and not all be the same (as assumed by the user or the plugin). 2. The currently supported target formats are qcow2, raw and vmdk. We will only allow qcow2 and raw images to be encrypted and uploaded, because other ones cannot be used by Cinder and Nova. Converting qcow2 to raw is possible, vice-versa needs testing. But we do not support conversion to vmdk, because we don't know the behavior and whether transforming from encrypted qcow2/raw to encrypted vmdk would be possible with qemu AND neither Cinder nor Nova use the encrypted vmdk format. So we would have to convert again in Nova and Cinder.
As this optional feature of Glance may already have problems with 1, and we would need to at least forbid uploading encrypted images when the target format is vmdk (maybe also when the target format is qcow2) this would be a lot more of implementation work. So we will for now render this out of scope for the spec. Maybe this can be done after the image encryption is in place and if it is needed by operators and users.
Seems like this is part of the "Interoperable Image Import"^1. Which references the following methods^2:
This makes me wonder if this is applicable to images uploaded to Glance by Nova or Cinder at all. This is limited to images from external sources I think.
Nonetheless, at the latest when the user is initiating such an image upload we need to make sure that we account for this concerning the image encryption. I agree that we should make it out of scope for the initial contribution in any case. Improving compatibility with advanced features later on after getting the base work done is always an option imho.
I am looking into the Cinder Side a little bit more because, we got a comment from Dan Smith regarding image conversion and Glance checking a few Image parameters:
Again, you either need to call out that image conversion will not be possible on encrypted images, or say that glance will do those things if it has the key_id (and access to it). Also note that without access to the key, glance won't be able to do any sort of image inspection (for virtual_size, format confirmation, backing file rejection, etc). I think it's worth mentioning those drawbacks here.
The image conversion does only seem to be triggered when a user is uploading an image, not if Cinder or Nova upload one. Forbidding that would put more responsibility on the user side, but we also do that with the image encryption.
The check for the virtual size can be omitted, because we only allow 2 types of encrypted images: qcow2 and raw and we want to introduce the "os_decrypt_size" parameter that should discribe the size of the unencrypted image. We may even mandate using this parameter. The format for encrypted raw images is only checkable, when decrypting the image.
So while this is a valid point to discuss, as Glance will reject images, that do not have the format, they say they have, the Glance team did not wanted to have the power to encrypt or decrypt images back when we discussed gpg-based image encryption. I doubt that this has changed and i also do not see a good reason, Glance should be able to do this. Rather we should discuss, if these checks need to take place, and whether they could be moved to either the Client-side or better to Cinder and Nova.
I edited the glance spec to forbid image conversion in a more explicit way and included some more comments on the glance spec.
I adjusted the key-managing part of the spec to clarify the behavior of deleting keys and added a part to put images into ERROR state, when they are encrypted and image conversion is enabled and need to be done.
I attended the Glance meeting and asked for more reviews to the spec, so it can be merged.
I adjusted the glance spec as there were a few open questions: https://review.opendev.org/c/openstack/glance-specs/+/915726
There were a few more questions on a spec. One thing is, that with the standardization we would also OPTIONALLY allow users to set a parameter that advises glance to delete a key on image deletion. This might be something to discuss in a Glance meeting.
I alsoe edited the work items and added some more for tests, documentation and a migration script for old cinder-specific parameters.
qemu-img convert
Since Nova is using qcow2+LUKS for ephemeral storage and Cinder uses plain LUKS for volumes, we will have encrypted images in different formats originating from either services.
To achieve the interoperability in the image encryption, we need tooling to convert between them without root-level permissions. qemu-img convert
is the way to go here as recommended by the Nova team.
Thankfully, Nova's implementation^1 is a good blueprint on how to use qemu-img convert
with encrypted source files (e.g. qcow2+LUKS).
# prepare example image
wget https://download.cirros-cloud.net/0.6.2/cirros-0.6.2-x86_64-disk.img
qemu-img convert -O raw cirros-0.6.2-x86_64-disk.img cirros.raw
# create QCOW2+LUKS encrypted image
echo "muchsecretsuchwow" > secret_file.key
export INPUT_FILE="cirros.raw"
qemu-img convert -f raw -O qcow2 \
--object secret,id=sec,file=secret_file.key \
-o encrypt.format=luks \
-o encrypt.key-secret=sec \
-o encrypt.cipher-alg=aes-256 \
-o encrypt.cipher-mode=xts \
-o encrypt.hash-alg=sha256 \
-o encrypt.ivgen-alg=plain64 \
-o encrypt.ivgen-hash-alg=sha256 \
$INPUT_FILE "$INPUT_FILE.luks.qcow2"
# convert QCOW2+LUKS to plain LUKS and write to block device (/dev/sda)
# NOTE1: instead of providing `-f qcow` and the input filename as positional
# argument, the `--image-opts` is used which also supports the
# necessary encryption parameters for source file encryption
# NOTE2: the `-o` parameters refer to the target LUKS encryption instead
# NOTE3: if encryption keys should differ, two `--object secret,id=...`
# arguments with different ids referenced as key-secret are required
qemu-img convert -O luks \
--object secret,id=sec,file=secret_file.key \
--image-opts "driver=qcow2,file.driver=file,file.filename=$INPUT_FILE.luks.qcow2,encrypt.key-secret=sec" \
-o key-secret=sec \
-o cipher-alg=aes-256 \
-o cipher-mode=xts \
-o hash-alg=sha256 \
-o ivgen-alg=plain64 \
-o ivgen-hash-alg=sha256 \
/dev/sda # this could also be a file path
# mount the CirrOS partition to verify
cryptsetup luksOpen --key-file secret_file.key /dev/sda mapping-sda
# the resulting /dev/mapper/mapping-sda will be the CirrOS disk which has
# multiple volumes; use `fdisk -l` and multiply the start offset by 512
fdisk -l /dev/mapper/mapping-sda
# redirect the partition from the calculated offset (512 times the start block)
# to a loop device for mounting
losetup -o 9437184 /dev/loop0 /dev/mapper/mapping-sda
mount /dev/loop0 /mnt
Further notes:
-o
and --image-opts
:
key-secret=sec
encrypt.
prefix, e.g., encrypt.key-secret=sec
--image-opts
, for target encryption use -o
-f qcow2
, they need --image-opts
instead--image-opts
is used, the source file path positional argument is omitted and the file path is specified in --image-opts
directly as a driver parameter; this is mandatory--object secret,id=...
for each keySince special qcow2 functionalities are the source of the recent OSSA-2024-001 vulnerability, we will also be affected if we opt to use qemu-img convert
as illustrated above.
The mitigation added by OpenStack is implemented as an image format inspection in all relevant services (Glance, Cinder, Nova). We can re-use this.
As an example, consider the "cirros.raw.luks.qcow2" qcow2+LUKS file created above and the format_inspector.py
as implemented in Cinder:
>>> from format_inspector import QcowInspector
>>> i = QcowInspector.from_file("cirros.raw.luks.qcow2")
>>> i
<format_inspector.QcowInspector object at 0x7068ea532900>
>>> i.has_header
True
>>> i.has_backing_file
False
>>> i.has_unknown_features
False
>>> i.safety_check()
True
The qcow2+LUKS file created by qemu-img convert
only has the image payload encrypted, the qcow2 header is in plaintext. This enables analysis by the format inspection. Hence we can re-use the inspector implementation as-is and check the image in a step before we attempt the image re-encryption when converting from qcow2+LUKS to LUKS in our implementation.
I was able to discuss this also with the Glance team in the Glance channel and meeting today, and they and fungi do not see a problem, when the metadata is not encrypted. We can continue with our effort on standardizing the image encryption upstream
Glance spec is finally merged now: https://review.opendev.org/c/openstack/glance-specs/+/915726
The cinder Spec was also merged: https://review.opendev.org/c/openstack/cinder-specs/+/919499.
The following steps need to be implemented: Glance:
Cinder:
I spoke with Markus, about all steps that need to be implemented or written (Glance documentation and security guide). We added some steps. Markus may start implementing, when I am on vacation.
I started implementing the corresponding patchsets.
TODO:
cinder_key_*
) to new (os_encrypt_*
)TODO:
cinder_encryption_*
to os_encrypt_*
binascii.hexlify()
into os-brickos_encrypt_format
and os_encrypt_cipher
to encrypted volume images (os-volume_upload_image
action)ExtractVolumeRequestTask
flow that triggers when a source image has a key idvolume_utils.clone_encryption_key
?)TODO:
binascii.hexlify()
for native LUKS attachment in libvirt driver by call to os-brick librarynova/nova/privsep/libvirt.py:dmcrypt_create_volume()
and its use of binascii.hexlify()
TODO:
binascii.hexlify()
useSo far I've been able to verify the following functionality of the implementation:
qemu-img convert
and openstack image create
as well as --property
s with the new metadata namingopenstack volume create --image
request if the image is encrypted but the volume type is notopenstack image create --volume
and store additional format/cipher attributes based on the volume type cipher settingsUsing a non-passphrase type secret works with the changes when making sure to hexlify the secret befor using qemu-img convert
to create the LUKS image. This way Cinder can consume it just like it would one of its own volume images.
This is however very finicky for users like shown below.
echo -n "muchsecretsuchwow" > secret_file.key
# it is important that the file does not end with a newline!
python3 -c "import binascii; print(binascii.hexlify('$(cat secret_file.key)'.encode('utf-8')).decode('utf-8'), end='')" > secret_file.hex
# secret_file.key <- plaintext passphrase
# secret_file.hex <- hexlified passphrase
# store the secret in plaintext
openstack secret store --file secret_file.key --secret-type symmetric ...
# encrypt the image file using the hexlified passphrase
qemu-img convert -f raw -O luks --object secret,id=sec,file=secret_file.hex ... IMG_FILE
openstack image create --file IMG_FILE --property os_encrypt_key_id= ...
Cinder will later retrieve the Barbican secret and hexlify it to use it as a passphrase. This works already.
It is much more convenient for users to skip the hexlify step and use --secret_type passphrase
which will trigger Cinder to skip the hexlify process and use the secret as passphrase directly.
This is what I intended with the patchset drafted above but it turns out that we also need to adjust os-brick^1 here because when the volume is later attached by Nova it uses os-brick and os-brick also implements implicit hexlify of secrets.
os-brick needs to be patched to differentiate between secret types in the same fashion as Cinder.
UPDATE: Nova also needs to be patched, because it does the hexlify itself in the libvirt driver in case of native LUKS attachment through QEMU: https://github.com/openstack/nova/blob/bb2d7f9cad577f3a32cb9523e2b1d9a6d6db3407/nova/virt/libvirt/driver.py#L2193-L2196
Since we now need the same code (differentiating between secret types and hexlifying on demand) at three places (Cinder, Nova, os-brick) I think it would be best to de-duplicate the code into os-brick and import it from os-brick in all other services.
EDIT: Updated https://github.com/SovereignCloudStack/standards/issues/560#issuecomment-2263033722 with corresponding patchset drafts for Nova and os-brick.
Using my Cinder patchset draft as implemented so far, I tested how and where qcow2+LUKS images would be handled using two test cases, one for non-hexlify and one for hexlify:
Both test cases currently exhibit the same error pattern as quoted below:
...
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 1127, in _create_from_image
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server model_update = self._create_from_image_cache_or_download(
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 1006, in _create_from_image_cache_or_download
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server model_update = self._create_from_image_download(
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 805, in _create_from_image_download
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server volume_utils.copy_image_to_volume(self.driver, context, volume,
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/volume/volume_utils.py", line 1226, in copy_image_to_volume
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server raise exception.ImageCopyFailure(reason=ex.stderr)
Aug 12 10:17:43 devstack cinder-volume[616075]: ERROR oslo_messaging.rpc.server cinder.exception.ImageCopyFailure: Failed to copy image to volume: qemu-img: Could not open ... Parameter 'encrypt.key-secret' is required for cipher
Their conversion is attempted by volume_utils.copy_image_to_volume()
.
The implementation currently seems to correctly identify the qcow2 image and attempts to convert it but is unable to process it because it contains the LUKS data for which no key is provided by the function call.
Problem with this function is that it calls driver.copy_image_to_volume()
which is driver-specific in its implementation.
Taking the rbd driver as an example, this would lead to the following call chain:
rbd.RBDDriver.copy_image_to_volume()
rbd.RBDDriver._copy_image_to_volume()
image_utils.fetch_to_raw()
image_utils.fetch_to_volume_format()
image_utils.convert_image()
Judging from this example, the best place for implementing the flattening/re-encryption of qcow2+LUKS images seems to be image_utils.convert_image()
.
However, we need to make sure that this is handled equally by all possible Cinder drivers.
Most Cinder backend drivers seem to call image_utils.fetch_to_*
in their copy_image_to_volume()
code path.
The only drivers that don't are:
fcd.VMwareVStorageObjectDriver
vmdk.VMwareVcVmdkDriver
We'll either need to take care of them individually or insert safeguards that prevent them from consuming qcow2+LUKS images in case we are unable to provide a suitable implementation for them.
UPDATE: I adjusted image_utils.fetch_to_volume_format()
in a way that triggers image_utils.convert_image()
to convert qcow2+LUKS to LUKS. Fortunately, it turned out that support for this was already existing in image_utils.convert_image()
and I only needed to craft its call accordingly by passing the keys.
Added to https://github.com/SovereignCloudStack/standards/issues/560#issuecomment-2263033722
I uploaded all patchsets upstream now effectively deprecating the drafts in https://github.com/SovereignCloudStack/standards/issues/560#issuecomment-2263033722:
The gerrit topic label is LUKS-image-encryption
: https://review.opendev.org/q/topic:%22LUKS-image-encryption%22
Work should continue there.
@josephineSei the Glance spec patchset at https://review.opendev.org/c/openstack/glance-specs/+/915726 seems to have been added to the old "image-encryption" topic label. Can you change it to "LUKS-image-encryption"? That's where the new Cinder Spec and the above patchsets now reside for the image encryption rework.
binascii.hexlify()
outside of that; maybe handled by os-brick encryptors?)os_encrypt_*
os_decrypt_size
and os_decrypt_container_format
on images where applicableAs discussed with @josephineSei today, there is an open question to be discussed with the OpenStack community on how to handle the transition from the old Glance image metadata naming (cinder_encryption_*
) to the new one (os_encrypt_*
).
In the current patchset I simply replaced the old names. However, this strictly requires Glance and Cinder to be in sync in terms of version for this to work properly. If the components are deployed in different release versions for an arbitrary time frame, the image encryption used by Cinder might be impacted:
*_key_deletion_policy
is used to delete Barbican secrets; if Glance and Cinder use different namings here, secret is not properly deletedcinder_encryption_*
, secret consumers are not registered as the image is not detected as encrypted by GlanceIt may be advisable to introduce a deprecation period and make Glance and Cinder still accept the old metadata names when parsing image metadata.
I wrote a mail to the OpenStack discuss ML to invite Cinder and Glance people to a discussion on this topic next monday on irc.
Metadata naming deprecation/transitioning
As discussed with @josephineSei today, there is an open question to be discussed with the OpenStack community on how to handle the transition from the old Glance image metadata naming (
cinder_encryption_*
) to the new one (os_encrypt_*
).In the current patchset I simply replaced the old names. However, this strictly requires Glance and Cinder to be in sync in terms of version for this to work properly. If the components are deployed in different release versions for an arbitrary time frame, the image encryption used by Cinder might be impacted:
* `*_key_deletion_policy` is used to delete Barbican secrets; if Glance and Cinder use different namings here, secret is not properly deleted * if Cinder still uses `cinder_encryption_*`, secret consumers are not registered as the image is not detected as encrypted by Glance
It may be advisable to introduce a deprecation period and make Glance and Cinder still accept the old metadata names when parsing image metadata.
We discussed the topic today with upstream (Logs are here: https://meetings.opendev.org/meetings/image_encryption/2024/image_encryption.2024-08-19-13.00.log.html) and will do the following:
cinder_encryption_*
metadata names for one cycle in parallel to the new names and deprecate themcinder_encryption_*
metadata names to the new os_encrypt_*
namesI will also write another mail to the ML like rosmaita suggested. To ask ops whether they are okay with the upgrading workflow of:
cinder_encryption_*
when those appear in their deploymentsI am writing said extension, but currently having problems testing it with my devstack:
stack@devstack:~/devstack$ glance-manage db version
CRITICAL glance [-] Unhandled error: sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.plugins:dbcounter
This Error does not occur with cinder:
stack@devstack:~/devstack$ cinder-manage db version
INFO dbcounter [-] Registered counter for database cinder
DEBUG oslo_db.sqlalchemy.engines [-] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_ENGINE_SUBSTITUTION {{(pid=304079) _check_effective_sql_mode /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/sqlalchemy/engines.py:342}}
DEBUG dbcounter [-] [304079] Writer thread running {{(pid=304079) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:102}}
INFO alembic.runtime.migration [-] Context impl MySQLImpl.
INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
a8189e5afd9a
This seems to be a devstack specific bug, as the dbcounter
is a devstack internal plugin (https://lists.openstack.org/pipermail/openstack-discuss/2023-May/033734.html)
I am currently debugging this - and my need help from the upstream glance team.
Or I need another test-deployment, that is NOT a devstack to test my changes. BUT zuul from opendev also uses devstack to test everything (tox, tempest, etc... is all running on devstack instances with your code patch applied.) - so it would be better to investigate this, I think.
I asked in IRC and on the ML, I hope I get an answer. (https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/JOCR5V26IPW6IJIUJIAVXPFXJEPXOPFU/)
We discussed the topic today with upstream (Logs are here: https://meetings.opendev.org/meetings/image_encryption/2024/image_encryption.2024-08-19-13.00.log.html) and will do the following:
- update the patches to allow
cinder_encryption_*
metadata names for one cycle in parallel to the new names and deprecate them
I revisited the patchsets for Glance and Cinder and added support for cinder_encryption_*
metadata attributes back in, serving as a fallback to the new values:
I added deprecation notes in the source code (or naming of the unit test methods) accordingly so that all places where stuff needs to be removed after the deprecation period can be easily identified.
I got an answer to my problem on the mailing list and can now continue implementation and testing.
We attended the Cinder meeting in IRC today and got attention on the os-brick patch. os_brick is a library that will have its own release in one or two days. Having people reviewing the patch, will give us the chance to target the image encryption for the 2024.2 release. https://etherpad.opendev.org/p/cinder-dalmatian-meetings
I adjusted the patchsets to fix failing tests for os-brick and Glance; os-brick is now verified by Zuul, Glance tests are still pending, Glance tests now also succeed.
The os-brick patch now got marked as release-critical for 2024.2 after we attended the Cinder IRC meeting today.
I somehow still have problems with the migration script : it does not seem to be triggerd by glance-manage. I still put my code in an upstream patch as WIP: https://review.opendev.org/c/openstack/glance/+/926905
I somehow still have problems with the migration script : it does not seem to be triggerd by glance-manage. I still put my code in an upstream patch as WIP: https://review.opendev.org/c/openstack/glance/+/926905
I tried another few things, but at this point I would either need another deployment to test or some help from upstream - I already asked in the glance channel and the patch got review priority.
I raised attention again for all the patches, in the irc glance channel and in the pop-up team meeting.
Cinder and Glance patchsets have been updated to only deprecate cinder_encryption_key_id
and cinder_encryption_key_deletion_policy
but still support them in addition to the new names for the deprecation period as requested by upstream.
I fixed some errors in the glance db data-migration patch. And looked through the Zuul logs for the Nova and Cidner patches. All remaining tox-errors are already solved through the merge of the os-brick patch. Maybe this will also solve the tempest errors. I rechecked the nova patch, and we can do the same with the Cinder patch, if this solves the issue.
Upstream, the Cinder Tempest tests in Zuul currently fail during volume cloning of encrypted volumes^1:
cinder_tempest_plugin.scenario.test_volume_encrypted.TestEncryptedCinderVolumes.test_boot_cloned_encrypted_volume[compute,id-5bb622ab-5060-48a8-8840-d589a548b7e4,image,volume]
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
...
File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/cinder_tempest_plugin/scenario/test_volume_encrypted.py", line 162, in test_boot_cloned_encrypted_volume
waiters.wait_for_volume_resource_status(
File "/opt/stack/tempest/tempest/common/waiters.py", line 372, in wait_for_volume_resource_status
raise exceptions.VolumeResourceBuildErrorException(
tempest.exceptions.VolumeResourceBuildErrorException: volume 4bda115c-45e1-4785-a4f9-61f0a09cbe25 failed to build and is in ERROR status
Source code of the Tempest test: https://github.com/openstack/cinder-tempest-plugin/blob/1aa0a56fca85ce23343950431e28f41c4fa811c5/cinder_tempest_plugin/scenario/test_volume_encrypted.py#L150-L157
I started debugging this on my DevStack but so far haven't been able to reproduce a volume in ERROR state when cloning an existing encrypted one.
I happened to notice one difference though:
None
I don't think cloning the secret name without any suffix (1) is desired at all because this is confusing for users when their image secret appears multiple times in Barbican by name. I think we should change this.
This also seems to hint at differences of secret clone handling between (1) and (2).
I will keep investigating.
The recheck for the Nova Patch led to a green zuul pipeline. I get two Errors in tempest in the Glance db migration patch.
==============================
Failed 2 tests - output below:
==============================
tempest.api.image.v2.test_images_formats.ImagesFormatTest.test_accept_reject_formats_import[id-7c7c2f16-2e97-4dce-8cb4-bc10be031c85](vmdk-monolithicFlat-leak)
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "/opt/stack/tempest/tempest/api/image/v2/test_images_formats.py", line 148, in test_accept_reject_formats_import
self.client.delete_image(image['id'])
.....
tempest.api.image.v2.test_images_formats.ImagesFormatTest.test_accept_reject_formats_import[id-7c7c2f16-2e97-4dce-8cb4-bc10be031c85](vmdk-sparse-with-footer)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "/opt/stack/tempest/tempest/api/image/v2/test_images_formats.py", line 148, in test_accept_reject_formats_import
self.client.delete_image(image['id'])
....
Both should not be related to the migration patch. I still wait for feedback from the Glance team, because the migration is not triggered on devstack.
I don't think cloning the secret name without any suffix (1) is desired at all because this is confusing for users when their image secret appears multiple times in Barbican by name. I think we should change this.
I adjusted the Cinder patchset to remove the name when cloning a secret.
We got a review on the Glance patchset requesting the following changes:
castellan_exception.Forbidden
branch of secret consumer registrationValueError
branch of secret consumer registrationI'll look into those.
Currently there is the possibility in Cinder to encrypt volumes and in Nova to use qcow2 encrypted images (still under development). Both can lead to and use LUKS-encrpyted images, but those are different and not aligned:
@markus-hentsch also found out in https://github.com/SovereignCloudStack/standards/issues/541 that uploading a LUKS-encrypted image (that was created from a volume) to another cloud in combination with setting a few parameteres (cinder_encryption_key, etc...) will result in an image that can be used to create an encrpyted and functional volume.
As a user it would be good to have a streamlined operation to use encrypted images in openstack for both volumes and ephemeral storage and to also allow interoperability between clouds. Therefore we need and will propose standardized parameters to describe and detect an encrypted image, which might be similar to the parameters described here: https://specs.openstack.org/openstack/cinder-specs/specs/zed/image-encryption.html But will use the LUKS encryption. So those encrypted images could be natively mounted in Nova or just formed into a volume (raw LUKS images can be directly used, qcow images need to be flattened).
With such a way encrypted backup images can be easily downloaded and transferred to another cloud.
This is a result from a lengthy discussion at the PTG with Nova, Cinder and Glance ( https://etherpad.opendev.org/p/dalmatian-ptg-cinder#L376 )
Followup tasks may be to implement re-encryption to fully change keys for LUKS volumes and images.