This also would help in solving issues with e.g. Gluster being too slow for some operations (`npm install`).
Another option mentioned by @gorkem is to leverage ephemeral containers, introduced in Kubernetes 1.16. That would allow us to avoid using rsync.
Hello,
here are some notes:
about data sync pod:
- it is a per-user pod, so the projects data are kept in the user namespace
- the `/projects` folder could be a client of the data sync pod as well
optimization:
One of the goals is to be able to start the workspace as fast as possible. For a new workspace (no previous state), that means the `data-sync-pod` service should reply as soon as possible that there is no existing data for this given workspace (even before trying to mount the PVC). By mounting the PVC 'asynchronously', the workspace can boot quickly, projects can be cloned, and persistence of the workspace data is done later, when the PVC is up (the IDE will notify, e.g. with a status bar message, that it started with ephemeral storage and that backup/persistence became available afterwards). If there is previous data, the IDE needs to wait for the projects to be restored before displaying the full layout.
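As a minimal sketch (not an agreed API), this is roughly what such a non-blocking status check could look like on the data-sync side; the `/status` endpoint, the `workspaceId` query parameter and the `/data-sync` mount path are hypothetical names chosen for illustration:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
	"path/filepath"
)

// syncRoot would be the mount point of the per-user PVC inside the
// data-sync pod; the path is an assumption for this sketch.
const syncRoot = "/data-sync"

type status struct {
	WorkspaceID string `json:"workspaceId"`
	HasBackup   bool   `json:"hasBackup"`
	PVCMounted  bool   `json:"pvcMounted"`
}

// handleStatus replies immediately: if the PVC is not mounted yet it
// reports "no backup", so the workspace can boot on ephemeral storage
// and be backed up later, once the volume is actually available.
func handleStatus(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("workspaceId")
	s := status{WorkspaceID: id}

	if _, err := os.Stat(syncRoot); err == nil {
		s.PVCMounted = true
		if _, err := os.Stat(filepath.Join(syncRoot, id)); err == nil {
			s.HasBackup = true
		}
	}
	json.NewEncoder(w).Encode(s)
}

func main() {
	http.HandleFunc("/status", handleStatus)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```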
Storage synchronization:
optimization: it could clean up the 'unpacked' folder and keep only the zip files if the files have not been used for a long time.
theia enhancements:
Another optimization: for now, the import/clone of the source code is performed when we're entering the IDE (this is useful if a 'private' repository is accessed, as we may need the GitHub token, OAuth, etc.), but in the case of a public repository, if the project is cloned as soon as possible, we could enter Theia with the project already cloned, or being cloned in parallel (see the sketch below). That might speed up the process even more. --> needs another Epic just for this specific item.
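A rough sketch of what "clone as early as possible, in parallel" could look like; the repository URL, the `/projects/che` destination and the `startIDE` placeholder are illustrative only, and private repositories that need an OAuth token are deliberately ignored here:

```go
package main

import (
	"log"
	"os/exec"
)

// cloneAsync starts a git clone in the background so the IDE can keep
// booting; the returned channel delivers the result when the clone ends.
func cloneAsync(repoURL, dest string) <-chan error {
	done := make(chan error, 1)
	go func() {
		done <- exec.Command("git", "clone", "--depth", "1", repoURL, dest).Run()
	}()
	return done
}

func startIDE() {
	// Placeholder for booting Theia / the workspace endpoints.
	log.Println("IDE starting while the project is cloned in parallel")
}

func main() {
	// Public repository: no token or OAuth flow needed, so the clone can
	// start before the user even reaches the IDE.
	cloneDone := cloneAsync("https://github.com/eclipse/che.git", "/projects/che")

	startIDE()

	if err := <-cloneDone; err != nil {
		log.Printf("background clone failed, falling back to cloning from the IDE: %v", err)
	}
}
```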
Just a couple of notes:
Warning: Ephemeral containers are in early alpha state and are not suitable for production clusters.
`yarn`, for example).
about 1. we read the docs as well but thx :-)
No doubt, but the fact that we're relying on unproven technology might merit a bit of discussion, no? Is this tech ready for our customers?
@tsmaeder we're not considering it for now, as it should work on all OpenShift/Kubernetes instances.
Do we really need to have per-workspace PV attachment?
Can't we have a single Data Sync service/deployment used by all the workspaces instead?
@gazarenkov it's per user namespace (all workspaces of a user) first.
@benoitf Ok, thanks, then do we really need it per-namespace? :)
@gazarenkov at first because, for example, on che.openshift.io you won't be able to mount a "super big" PV to store all workspaces' data (and then how do you manage per-user quota as today), plus there are cross-cluster concerns, etc. By using per-namespace storage at first (while still thinking about allowing one service for all users, etc.), it remains within the same K8s architecture.
@benoitf What are the limitations for mounting a single big PV in che.openshift.io?
@gazarenkov that's trickier because the service that does the sync needs to deal with files of different users. That can be implemented as a second iteration though. But let's keep this first iteration simple and implement a PV per user.
@l0rd Looks like we are on the same page regarding direction. If so, I'd suggest reconsidering the strategy and estimating a move to a single service at once, because:
So, I'd definitely suggest evaluating a single data service as an option before we go to implementation.
Just some thoughts about this issue, the ongoing work on the Workspace CRD, and cloud shell.
According to this EPIC https://github.com/eclipse/che/issues/15425, there will be, at some point, the ability to start Che 7 workspaces in a lightweight, standalone, and embeddable way, without requiring the presence of the Che master (already demoed as a POC).
One important point mentioned in this EPIC, is the big scalability gain that would be brought, in this envisioned K8S-native architecture, by:
In the light of this, I would prefer starting this work with the option that is, as much as possible, compatible with both use-cases:
So it seems to plead for a per-user-namespace solution first. Of course this should not prevent us from extending this solution to use a central service in a second step. But requiring an additional central server to be able to start workspaces seems contrary to the architectural direction we've taken with the DevWorkspace CRD and the cloud-shell.
@davidfestal Could you please elaborate on your vision of a layer which persists project code between user sessions (i.e. temporary), in light of workspace management decentralization? I.e. if in our next system we replace the Che server with a CRD/controller and Postgres with etcd, what does that mean for the physical storage of projects? How exactly is it related? What are we going to replace the single (distributed) filesystem (based on Gluster/Ceph/EBS/something else) with?
@gazarenkov
Physical storage for workspace data is already per-user (if not per-workspace), through namespaced PVs, and not centralized and common to all the users. I don't see what should change here with the Workspace CRD architecture: workspace data physical storage is already decentralized. I don't see why it would be required to change the existing way and now store workspace data in a PV common to all users.
But even without going into all technical details here, my point was to say that requiring an additional centralized service in an architecture that finally should be compatible with workspace management decentralization, seems strange to me.
Afaict, the initial proposal from @benoitf with per-user-namespace storage, would fit the existing and future structure of the Workspace CRD POC.
But sure, a centralized workspace storage service could, at some point, be an optimization option for some use-cases.
Wouldn't a single big PV require ReadWriteMany access mode?
@gorkem I would guess RWO will work fine for a single Data Store Pod; if a second (or more) pod spins up, it depends on whether the scheduler puts it on the same node (should work) or a different one (will not). https://github.com/kubernetes/kubernetes/issues/26567
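For reference, this is roughly the claim a single data-store deployment would use, written with the k8s.io/api Go types of that era (newer releases renamed the type behind the Resources field); the claim name, namespace parameter and 10Gi request are placeholders, and the access mode is the only field that really matters for this question:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// dataSyncClaim sketches the claim a single data-sync deployment could use.
// With ReadWriteOnce every pod using the claim must land on the node where
// the volume is attached; sharing one big PV across pods on different nodes
// would need ReadWriteMany, which many storage classes do not offer.
func dataSyncClaim(namespace string) *corev1.PersistentVolumeClaim {
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "che-data-sync", // hypothetical claim name
			Namespace: namespace,
		},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{
				corev1.ReadWriteOnce, // or corev1.ReadWriteMany for multi-node access
			},
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("10Gi"),
				},
			},
		},
	}
}
```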
@gazarenkov why do you think one central service is simpler? In a centralized service we have to build a secured-to-the-bone mechanism that matches users with folders. And we need to consider scalability as well. A problem with that service and users won't be able to access their data, or even worse, will have access to the data of other users. I don't want to deal with those problems right now.
For the reuse of existing code, that's an implementation detail. I would let the team that will work on the code decide.
@l0rd my guess is that a single service may turn out to be simpler, based on the fact that we have experience with and a working system that used this approach. The only potential problem is if we run into PV/K8s infra-specific limitations which will not allow us to use it (such as PV size, access mode, etc.).
I do not think the user should have direct access to this data (which is a hot backup of the projects), only via the Data Sync service, which supposedly can scale its Pods the same way as a usual K8s Deployment?
I think it may even work without this service, exactly the same as it does with ephemeral storage now, i.e. the user has access to the instance storage only, and syncing this data is an exclusively internal mechanism. That's why I do not think this storage should even know who the owner of a particular workspace is; it may just deal with folders identified by workspaceId.
An additional bonus of this approach may be zero PV attaching/mounting time (like ephemeral again; see the emptyDir sketch after this comment).
So, to me, it looks like an option to consider before coding, no?
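If the workspace volume is an emptyDir (which is what gives the zero attach/mount time mentioned above), a sizeLimit at least bounds how much node-local storage one workspace can take; a minimal sketch with the core/v1 Go types, where the volume name and the 2Gi figure are arbitrary:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// projectsVolume sketches the ephemeral volume a workspace pod would use
// while the data-sync mechanism persists /projects in the background.
// emptyDir attaches instantly (no PV provisioning or mounting), but it is
// node-local storage shared with whatever else runs on the node, so a
// sizeLimit keeps one workspace from exhausting it.
func projectsVolume() corev1.Volume {
	limit := resource.MustParse("2Gi")
	return corev1.Volume{
		Name: "projects", // hypothetical volume name
		VolumeSource: corev1.VolumeSource{
			EmptyDir: &corev1.EmptyDirVolumeSource{
				SizeLimit: &limit,
			},
		},
	}
}
```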
About the Central Service
Some other considerations
`emptyDir` is not infinite and it is shared with an unknown number of services of varying nature that can be scheduled to the same node. So we can say that this solution is going to have a non-deterministic character and will require better error handling.
@ibuziuk was there something left here, or can we close the epic?
Is your enhancement related to a problem?
No matter how fast we get to bootstrap a Che workspace, no matter how many external resources we are able to pre-pull (images, extensions, examples source code), we will always need to wait 20+s for a PV to be attached and mounted on the workspace pod.
Describe the solution you'd like
New Workspace lifecycle:
Workspace components in Read-only mode
In the "Startup data sync phase" the user will already be able to use the editor and plugins but those should behave in a read-only mode until all the data has been synced to the ephemeral volume. That means that Che editors (for example theia) should be able to work on read only mode (initially this can be done by showing a progress bar that shows the data sync and not allowing the user to access theia).
rsync protocol
Rsync is mentioned as the remote file synchronization protocol, but that's just an example. If there is a better alternative, let's use it.
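Purely as a point of reference (the protocol is explicitly left open above), this is one way a sync sidecar could drive rsync; the service host, rsync module name and the set of flags are placeholders, not a settled design:

```go
package main

import (
	"log"
	"os/exec"
)

// syncProjects pushes the ephemeral /projects folder to a data-sync
// endpoint. rsync only transfers changed files, so repeated runs (on a
// timer or on save events) stay cheap. Host and module are hypothetical.
func syncProjects() error {
	cmd := exec.Command("rsync",
		"--archive",  // preserve permissions, timestamps, symlinks
		"--delete",   // propagate file deletions to the backup
		"--compress", // reduce traffic between nodes
		"/projects/",
		"rsync://data-sync.che.svc:873/workspace-backup/")
	cmd.Stdout = log.Writer()
	cmd.Stderr = log.Writer()
	return cmd.Run()
}

func main() {
	if err := syncProjects(); err != nil {
		log.Fatalf("sync failed: %v", err)
	}
}
```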
Ideas to improve performance (even more)
Florent's edit:
Tasks