Closed Apokleos closed 5 months ago
@elezar @zvonkok @bart0sh
I need to read this carefully and check if this would also have some implications for the KEP: https://github.com/kubernetes/enhancements/pull/4113
@zvonkok Thx for your reply. I am looking forward to discussing this with you.
cc @pohly
There's also https://github.com/kubernetes/enhancements/pull/2893 which didn't get merged.
There's also kubernetes/enhancements#2893 which didn't get merged.
Thx @pohly I would like to confirm the relationship between KEP-2893 and the issue I raised. And I am concerned that the KEP-2893 does not seem to include support for kata direct-volume. Is my understanding correct?
I don't know (or remember).
@elezar @pohly @zvonkok I would appreciate your feedback on supporting the modification of kata direct volume types through CDI. Do you have any suggestions for optimizing and improving this proposal?
For clarification @Apokleos. Is the use of the device name to encode data a requirement, or is this just how you are implementing this in our current setup?
One issue with the code implemented above is that an unresolvable device is not treated as an error. This breaks assumptions around such devices. How would you propose missing / incorrect device specifications be passed to the user if this is not considered an error?
Thx @elezar for your feedback.
For clarification @Apokleos. Is the use of the device name to encode data a requirement, or is this just how you are implementing this in our current setup?
In this scenario, the Mount HostPath will be used as the device name. However, the HostPath may contain special characters, which are not allowed in the CDI naming specification. To comply with the CDI device name specification, the HostPath was encrypted using base64, even though this may result in special characters after encryption. This was done to maintain consistency with the encryption method used for volume paths in kata direct volumes.
One issue with the code implemented above is that an unresolvable device is not treated as an error. This breaks assumptions around such devices. How would you propose missing / incorrect device specifications be passed to the user if this is not considered an error?
Yes, this is indeed a problem. However, I have a different understanding of
unresolved devices
after deep diving the code. Do devices not found in the CDI cache belong to unresolved devices? (As the code ofInjectDevices
indicates that devices not found in the CDI cache belong to unresolved devices. ) IMO, devices not found in the CDI cache are filtered devices, while devices hit cache but failed to be applied are the unresolved devices.
I propose to split InjectDevices
into two steps: filtering
and injecting
.
FilterDevices
. In this step, if devices hit the CDI Cache, they will be the ones we want. filtering
step will be passed to the injecting
method InjectDevices
. If a device fails to be applied, it will be considered as an unresolved device.And the FilterDevices
can be placed in RegistryResolver
as below:
type RegistryResolver interface {
InjectDevices(spec *oci.Spec, device ...string) (unresolved []string, err error)
+ FilterDevices(device ...string) (mached []string, err error)
}
Hi @elezar I'm interested in hearing your thoughts on the updated proposal. I'd like to discuss it further with you and look forward to hearing from you.
Thx.
Hi @klihub, Could I hear your suggestions for this proposal? Thx
Hi @klihub, Could I hear your suggestions for this proposal? Thx
My biggest problem with this, in its current form, is that we end up treating unresolvable CDI devices during injection as something to be merely logged instead of flagged as a fatal error. I don't think that is something we'd find an acceptable compromise.
Looking at your (very nice and detailed) description, I gather that the basic problem is roughly
And this proposal (and I guess your earlier hack attempt to the NRI device injector sample plugin to modify mount options) tries to come up with a way to 'patch up' the incorrectly/insufficiently created mount for the container after the fact.
@Apokleos Is this summary correct ?
Thank you @klihub for your valuable time. I appreciate your understanding of my proposal. I would like to discuss the unresolvable CDI devices with you further.
Hi @klihub, Could I hear your suggestions for this proposal? Thx
My biggest problem with this, in its current form, is that we end up treating unresolvable CDI devices during injection as something to be merely logged instead of flagged as a fatal error. I don't think that is something we'd find an acceptable compromise.
@elezar has the same concern about the unresolvable CDI devices in this proposal, and I updated the proposal What do you think of the new updates?
Looking at your (very nice and detailed) description, I gather that the basic problem is roughly
- there is a CSI driver that tries to create/serve/handle storage volumes for Kata containers
- naturally we can't inject storage into a Kata container by simply bind-mounting a directory
- yet, with the current infrastructure (K8s, CRI, runtime) there is no way to convey enough information, so we end up with a simple bind mount
And this proposal (and I guess your earlier hack attempt to the NRI device injector sample plugin to modify mount options)
Yes, you're right. I try to do "patch up" the Mount's type with NRI annotation when I do some tests for kata/direct-volume with NRI. And
tries to come up with a way to 'patch up' the incorrectly/insufficiently created mount for the container after the fact.
@Apokleos Is this summary correct ?
Thx @elezar @klihub I appreciate your time to discuss this proposal with me. I have revised it based on your concerns and submitted a PR to clarify it further.
To avoid violating the original design while supporting Mount, I added a dedicated method UpdateMounts
to handle Mounts. This allows for unresolved behavior to be allowed in Mount scenarios without errors.
@Apokleos, @marquiz and I are working on this KEP: https://github.com/kubernetes/enhancements/pull/4113, especially for the Kata use-case. To summarize your issue, we need additional meta information passed down the CRI. Rather then "hacking" something into CDI I suggest we tackle the issue at the source if possible. @Apokleos Can you please review the KEP and add a comment about your use-case and what is missing?
@Apokleos, @marquiz and I are working on this KEP: kubernetes/enhancements#4113, especially for the Kata use-case. To summarize your issue, we need additional meta information passed down the CRI. Rather then "hacking" something into CDI I suggest we tackle the issue at the source if possible. @Apokleos Can you please review the KEP and add a comment about your use-case and what is missing?
Hi @zvonkok Thx for your suggestion for the KEP-4113. I have added some comments about my concern. IMO, the best way to fix this is to provide a way to pass down the
Mount Type
corresponding to the direct volume, used to build the oci Spec Mount, instead of the default specified type bind.
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity.
In the Kubernetes/CSI scenario, users need to use CSI to provide storage volumes to regular containers. After CSI completes the relevant operations, kubelet provides a HostPath and ContainerPath, but does not include a Mount Type. In other words, the Container Mount information that Containerd can retrieve through CRI does not include a type. When converting to the OCI Spec, the Mount type is set to "bind" by default.
Unfortunately,this solution does not work properly in the Kata Containers 3.0/runtime-rs with Direct Volume scenario. When using a direct volume, Kata Pod needs to determine the type of DirectVolume based on Mount.Type after obtaining mount information. Then, it needs to choose the appropriate way to insert the backend device into the Guest, such as directvol, vfiovol, or spdkvol. Finally, the mount operation is completed in the Guest.
Kata3.0/runtime-rs using direct volume with containerd ctr, more details can be seen how-to-run-kata-containers-with-kinds-of-Block-Volumes An example as below:
However, there are certain differences between ordinary containers and Kata Containers when using CSI to access storage. The Mount Type in the OCI Spec provided by K8S/Containerd is by default the bind type. Ordinary containers will use the bind type by default, but Kata/DirectVolume requires a specific type, such as "directvol" (or other types like vfiovol or spdkvol). Therefore, in the K8S/CSI scenario, Kata Containers wants the Mount in its OCI Spec to contain a specific volume type instead of the bind type when creating a container that uses a Direct Volume. In other words, it is necessary to change the default bind type to the specific volume type at the right time, and then pass it to the Kata runtime-rs.
Although CDI can edit the OCI Spec of Container third-party devices in K8S/DevicePlugin using Container Annotation/CDIDevice information, it can also be extended to CSI to increase the editing ability of Container Mount.
In the K8S/CSI/Kata-runtime-rs scenario using Direct Volume, we hope that CDI can automatically edit a Mount in the OCI Spec according to the Direct Volume type, that is, to change the original "bind" to the "directvol" of the corresponding volume_type in the mountInfo.json instance.
According to the idea, I have already made the initial design and implementation of it, the scheme is as follows.
direct.volume/direct-volume
. (1) Add the parameter CDIMounts []*runtime.Mount to the WithCDI function. (2) Use the base64-encoded string of Mount.HostPath as the deviceName to build a mount device that conforms to the CDI specification.The cdi config for Mount as below:
// WithCDI updates OCI spec with CDI content
func WithCDI(annotations map[string]string, CDIDevices []*runtime.CDIDevice) oci.SpecOpts {
func WithCDI(annotations map[string]string, CDIDevices []runtime.CDIDevice, CDIMounts []runtime.Mount) oci.SpecOpts { return func(ctx context.Context, client oci.Client, c containers.Container, s oci.Spec) error { seen := make(map[string]bool) // Add devices from CDIDevices CRI field var devices []string var err error ... +for _, mount := range CDIMounts {
// encoded mount path
encodedMntpath := DoEncodeBase64(mount.HostPath)
mntDevice := "direct.volume/direct-volume=" + encodedMntpath
if seen[mntDevice] {
log.G(ctx).Debugf("Skipping duplicated CDI device %s", mntDevice)
continue
}
devices = append(devices, mntDevice)
seen[mntDevice] = true +} ...
return oci.WithCDIDevices(devices...)(ctx, client, c, s) } }
CDI
(1) To better support using the base64-URLencoded string of the Mount HostPath as the DeviceName, it is necessary to modify the CDI DeviceName naming check rules to support special symbol
=
.(2) As the runtime.Mount object in containerd does not have any additional information to help accurately filter out which HostPath is associated with the direct volume, all Mount.Hostpath objects are allowed to be used to construct DeviceNames and passed to CDI. Then it uses DeviceNames to match cdi Cache database devices. However, the final result is that there is only one Mount that needs to be modified for the direct volume. Therefore, the CDI InjectDevice function needs to be modified to not return an error if there is an unresolved device name.