Closed AObuchow closed 3 weeks ago
@AObuchow I have the same issue but that's not related to $HOME
not being writable, in my case, the second container is distroless so it doesn't have sh
or true
. Any postStart event will fail in that container. But the init-ssh-agent-command
is injected in both containers.
Some ideas:
mountSources: false
there should be no need to add the init-ssh-agent-command-...
postStart eventThat's my DevWorkspace with 2 containers:
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
name: outyet-dw
spec:
contributions:
- components:
- container:
env:
- name: CODE_HOST
value: 0.0.0.0
name: che-code-runtime-description
name: che-code
uri: https://eclipse-che.github.io/che-plugin-registry/main/v3/plugins/che-incubator/che-code/latest/devfile.yaml
started: true
template:
components:
- container:
image: ghcr.io/l0rd/outyet-dev:latest
memoryRequest: 2G
memoryLimit: 10G
cpuRequest: '1'
cpuLimit: '4'
mountSources: true
name: dev
- container:
cpuLimit: 500m
image: ghcr.io/l0rd/outyet@sha256:b83e158687d6cb3d7ae46382be1e4fbb8eb3572f3423a9c3c9beae6cd55cc0e8
memoryLimit: 128Mi
mountSources: false
name: outyet
projects:
- git:
remotes:
origin: https://github.com/l0rd/outyet.git
name: outyet
sourceType: Git
@vinokurig, your PR does not fix the use case above. I would appreciate it if you could address that as well.
@l0rd
- There should be a documented attribute to opt-out from automatically injected post start events
I think this is not related to the current issue and can be done separately.
- If there are multiple components the event should be added for the first component only
We can not be sure that the dev component will always be first in the list.
- If mountSources: false there should be no need to add the init-ssh-agent-command-... postStart event
Why not to handle the ssh passphrase for such components?
@l0rd Thank you for the feedback on this issue, it's always appreciated to see how you're using DevWorkspace Operator :)
In general, I think your issue is outside the scope of the current issue (as it seems more related to postStart events in general, rather than the DWO-injected SSH agent initialization postStart event). I'll go ahead and create a new DWO issue for your topic.
I agree with @vinokurig's responses to your suggestions. An opt-out attribute for automatically injected postStart events would probably be best for container images where running postStart events is impossible.
I inspected the ghcr.io/l0rd/outyet:latest
image (which is based on golang:onbuild
) and saw that it does have sh
but it's located in /usr/bin/sh
instead of /bin/sh
which is what DWO expects.
This suggests another improvement to DWO's postStart events: maybe we can use command
(which is POSIX-compliant) to determine how to invoke sh
if is available on $PATH?
Edit: Nevermind... I don't think it's feasible to use command -v sh
to test whether sh
is available before executing the rest of the postStart event. However, maybe we can just invoke sh
and have it be found on $PATH instead of hard-coding the location of sh
?
Unfortunately, if someone uses a scratch
image that does not even have command
, the postStart event will still cause the container to fail. In this case, the best solution would probably be to allow users to opt-out of the DWO-injected postStart events.
We can not be sure that the dev component will always be first in the list.
@vinokurig The convention is to consider the first component the dev component. We contribute the editor in the first component, not in every component. There should be an attribute to specify a component that is not the first one as the dev component.
Why not to handle the ssh passphrase for such components?
@vinokurig isn't the SSH passphrase mainly used for git authentication? If that's not you can ignore my comment
I inspected the
ghcr.io/l0rd/outyet:latest
image
@AObuchow you have inspected the wrong image. In the meantime I have changed ghcr.io/l0rd/outyet:latest
base image to make my scenario work (it used to work and that's what I am going to demo at KubeCon in a couple of weeks so I fixed it with a bigger but working image). But I had updated the example above with the distro-less image ghcr.io/l0rd/outyet@sha256:b83e158687d6cb3d7ae46382be1e4fbb8eb3572f3423a9c3c9beae6cd55cc0e8
.
In general I think that #1307 was a bad idea because there is a high risk to break existing workspaces (we just found 2 container images where the command failed, there may be more). And no matter the value of the new feature implemented, avoiding breaking existing workspaces should be the priority. Also we should avoid, at the DWO level, to modify a DW applied by a user. The dashboard does an automatic generation of DW, so why not adding the post start event there?
@l0rd got a question about https://github.com/devfile/devworkspace-operator/issues/1337#issuecomment-2450601646 is it reproducible for you even without SSH being configured?
@ibuziuk I am reproducing it on developer sandbox, without any specific config on my side.
@l0rd @ibuziuk @AObuchow To summarize: from the discussion I see next issues:
$HOME
directory is not writable, the related pull request addresses the particular issue.sh
might not be available. This might affect all workspaces regardless the presence of the ssh key configmap.init-ssh-agent
post start event to all containers except the first one. I believe that [2] and [3] issues should be addresses in a separate pull requests. I am going to create an issue for [2] but I am not sure if we need one for [3]?
@vinokurig sure, I am perfectly fine with a separate PR, of course.
@l0rd @ibuziuk @AObuchow To summarize: from the discussion I see next issues:
1. `$HOME` directory is not writable, the related pull request addresses the particular issue. 2. `sh` might not be available. This might affect all workspaces regardless the presence of the ssh key configmap. 3. Do not add the `init-ssh-agent` post start event to all containers except the first one.
I believe that [2] and [3] issues should be addresses in a separate pull requests. I am going to create an issue for [2] but I am not sure if we need one for [3]?
@vinokurig Thank you for summarizing the current state of things.
I am going to create an issue for [2]
I'm happy to create an issue for this since it's past your working hours in your current timezone. I will also start investigating potential solutions to solve this issue.
Regarding [3]:
Do not add the init-ssh-agent post start event to all containers except the first one.
Though I agree with @l0rd's comment that the convention is to consider the first component as the main tooling component:
The convention is to consider the first component the dev component. We contribute the editor in the first component, not in every component. There should be an attribute to specify a component that is not the first one as the dev component.
We had an issue CRW-6614 where on the DevSandbox the first component in the devfile was not always the tooling component. I'm still confused as to why this was the case, as no reproducer devfile was left in that Jira and there's only screenshots that show this repo which does have the tooling container as the first component.
However: when an SSH key & passphrase is provided via the automount configmap/secret mechanism, it is mounted to all container components (not just the tooling component). Thus, it seems natural to also allow for automatic SSH passphrase provisioning with the ssh-agent in all container components.
So I do not think we should only provide the ssh-agent postStart event to the first container component.
We had an issue CRW-6614 where on the DevSandbox the first component in the devfile was not always the tooling component. I'm still confused as to why this was the case, as no reproducer devfile was left in that Jira and there's only screenshots that show this repo which does have the tooling container as the first component.
Using the attribute controller.devfile.io/merge-contribution: true/false
should allow you to control in which component the editor is contributed.
Thus, it seems natural to also allow for automatic SSH passphrase provisioning with the ssh-agent in all container components.
Ideally, that may be right. But considering that it can generate a workspace startup error, making using Che / Dev Spaces impossible, I would be very cautious. You should find a way to ignore any error in the passphrase provisioning or execute only when it's necessary and providing an option to opt out.
I understand that's not an easy task 🤷 and I don't have any magical solution. However, considering the maturity of Che / Dev Spaces, any new feature that introduces risks of breaking existing workspaces (with no workaround) should be avoided or reverted in my opinion.
According to the kubernetes documentation there is no way to ignore the post start event failure:
If either a PostStart or PreStop hook fails, it kills the Container.
Although we can increase the stability of the workspace start with next solutions:
However: when an SSH key & passphrase is provided via the automount configmap/secret mechanism, it is mounted to all container components (not just the tooling component). Thus, it seems natural to also allow for automatic SSH passphrase provisioning with the ssh-agent in all container components.
So I do not think we should only provide the ssh-agent postStart event to the first container component.
init-ssh-agent
post start event.
Ideally, that may be right. But considering that it can generate a workspace startup error, making using Che / Dev Spaces impossible, I would be very cautious. You should find a way to ignore any error in the passphrase provisioning or execute only when it's necessary and providing an option to opt out.
I understand that's not an easy task 🤷 and I don't have any magical solution. However, considering the maturity of Che / Dev Spaces, any new feature that introduces risks of breaking existing workspaces (with no workaround) should be avoided or reverted in my opinion.
+1 Agreed. Allowing this feature to modify the DevWorkspace unconditionally (and potentially break existing workspaces) was something I should have avoided.
Do not add the post start event if the ssh secret is not defined
- The ssh key secret that maps the keys to the container filesystem, may have another from conventional name, so we can not detect them.
- If the ssh key has a passphrase set up, the workspace start will continue to fail due to a vulnerable container.
Agreed, unfortunately the workaround/fix I currently have provided only resolves the issue for Che/DevSpaces specifically.
Have an additional configuration to opt-out from all the
init-ssh-agent
post start event.
- This overcomplicates the flow and would be not very obvious for users.
Instead of opt-out, I think an opt-in to enable the SSH agent feature (as a devfile & devworkspace attribute) might work. This is something I propose in the long term solution idea of this issue. The Che dashboard would be responsible for setting this attribute. For non-Che users, they would unfortunately have to consult the DWO documentation to discover this attribute, but at least their workspaces would not potentially fail from having this feature enabled by default.
Currently, @ibuziuk and I are thinking of guarding the injection of the SSH agent postStart event under the DWOC's config.enableExperimentalFeatures
field until we can sort out how to handle this feature in the most graceful way.
given that the feature is risky I believe it would be a really good move to enable it based on the DWOC config. on UI we can also articulate that this is experimental - https://github.com/eclipse-che/che-dashboard/pull/1249/files @l0rd @vinokurig @dmytro-ndp thoughts?
Description
In container images where the $HOME directory is not writable (such as the go-toolset image, which has
$HOME=/opt/app-root/src/
) theinit-ssh-agent-command-...
postStart event will fail.This is due to the fact we assume
$HOME/ssh-environment
is writable when doingssh-agent | sed 's/^echo/#echo/' > $SSH_ENV_PATH
, however this is not always the case:Maybe we should wrap the entire ssh-agent intialization command with a
(...) || true
so that regardless of wether a specific step of the ssh-agent initialization fails, the workspace will start up. This is the approach taken for theinit-persistent-home
preStart event.How To Reproduce
oc get devworkspace -n $NAMESPACE
Expected behavior
The workspace should succceed to start up. Whether the automatic SSH passphrase provisioning functionality works is another topic (maybe we should set
SSH_ENV_PATH=/tmp/ssh-environment
instead ofSSH_ENV_PATH=$HOME/ssh-environment
?)Additional context
Upstream Che Issue