kylos101 closed this issue 2 years ago.
Overall we are trying to assess if there is a viable workspace resource size below g1-standard
that could support development of what type of projects. The resources described above seem like a great place to start.
In addition, how would such a resource profile affect pod scheduling, resource utilization and noisy neighbor conditions?
Some extensions are resource intense. @akosyakov , can you think of any extensions that are common, but hungry, that we should include as part of this test?
Java, Go, TypeScript. Keep in mind that "hungry" depends on the size and complexity of the project, not the extension itself.
Tested this with the gitpod-io/website repository. Running tests and making code changes works well, but trying to build the website leads to OOM kills or network disconnects because there is not enough memory available.
@mbrevoort good questions!
> In addition... how would such a resource profile affect pod scheduling
The pods would be scheduled to a dedicated node pool for the small workspace class, so they wouldn't impact pods being scheduled to other node pools (like standard and large).
Initially we'd control density via memory requests, but we may want to consider CPU requests too. For example, memory requests would be 2Gi, and CPU might be 1 core, 0.5 cores, etc.
I do wonder how many small workspaces we could fit on a node. But first we need to find a size that works for one workspace, and then we can talk about achieving a desired density. For example, if we run more than 18 small workspaces, perhaps 36, we'd have to reduce the related disk IO bandwidth.
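The density question can be sketched with some back-of-the-envelope math. The node's allocatable memory below is an assumption for illustration, not a measured value from our node pools:

```python
# Rough ceiling on small-workspace density from memory requests alone.
# The allocatable figure is an assumed example, not a real node measurement.
node_allocatable_mib = 60 * 1024   # assume ~60Gi allocatable on a node
request_mib = 2 * 1024             # 2Gi memory request per small workspace

max_workspaces = node_allocatable_mib // request_mib
print(max_workspaces)  # → 30
```

In practice the real ceiling would be lower, since system pods and daemonsets also consume allocatable memory, and disk IO bandwidth per workspace shrinks as density rises.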
> resource utilization
The way this is written now, you'd get 1 CPU, and be able to burst to 2 (controlled by ws-daemon), and you wouldn't be able to use more than 4Gi of memory (controlled by Kubernetes).
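CPU limiting of this kind is typically enforced through cgroup CFS bandwidth; a minimal sketch of the arithmetic, assuming the kernel-default 100ms period (the exact mapping to ws-daemon's implementation is an assumption here):

```python
# Sketch of how a core count maps to cgroup CFS bandwidth values.
# 100ms is the default cfs_period_us; whether ws-daemon uses exactly
# this mechanism for bursting is an assumption for illustration.
CFS_PERIOD_US = 100_000

def cfs_quota_us(cpus: float) -> int:
    """Microseconds of CPU time allowed per period for `cpus` cores."""
    return int(cpus * CFS_PERIOD_US)

print(cfs_quota_us(1))  # → 100000, steady-state limit of 1 core
print(cfs_quota_us(2))  # → 200000, burst limit of 2 cores
```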
> and noisy neighbor conditions?
@Furisto can you think of anything new that we might have from a risk perspective, by having a higher density of workspaces?
> Tested this with the gitpod-io/website repository. Running tests and making code changes works well, but trying to build the website leads to OOM kills or network disconnects because there is not enough memory available.
Okay, good to know, @Furisto ! Bummed to hear, though. :wink:
> @Furisto can you think of anything new that we might have from a risk perspective, by having a higher density of workspaces?
Higher chance of WorkspaceStuckInStopping alerts, because more workspaces mean more backups and we have a concurrency limit for backups. On the other hand, the backups should be smaller, so they will go faster.
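The queueing effect described here can be sketched with a semaphore: when many workspaces stop at once, only a capped number of backups run concurrently and the rest wait. The cap of 3 below is an invented number for illustration, not the real limit:

```python
# Sketch: a backup concurrency limit queues work as density rises.
# 12 workspaces stop at once, but only `3` backups (assumed cap) run
# concurrently; the rest block on the semaphore until a slot frees up.
import threading

backup_limit = threading.Semaphore(3)  # assumed concurrency cap
completed = []

def backup(workspace_id: str) -> None:
    with backup_limit:              # workspaces beyond the cap wait here
        completed.append(workspace_id)

threads = [threading.Thread(target=backup, args=(f"ws-{i}",)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(completed))  # → 12, all backups finish, just not all at once
```

The longer that wait grows, the more likely a workspace trips the stuck-in-stopping alert threshold.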
@Furisto can you share a link to the branch where you made these related changes? It would be good to get a second pair of eyes 👀 on the related configuration for the experiment.
@mbrevoort would you like any further investigation into this resource permutation, or other resource permutations? The small configuration we originally socialized does not look promising.
> Tested this with the gitpod-io/website repository. Running tests and making code changes works well, but trying to build the website leads to OOM kills or network disconnects because there is not enough memory available.
@Furisto @kylos101 - Would you be able to share more details about which processes are consuming more memory and causing problems when building the website?
👋 @Furisto in hindsight, in talking with @aledbf, let's change the test as follows: share access to new-workspace-cluster with @mbrevoort and @jldec, so they can experience the related performance/behavior.

@kylos101 @jldec Tested this on an ephemeral cluster and the performance looks better there (presumably due to swap). See https://www.loom.com/share/fc6fb07dd05b4621841d90f5a0f41dc8. I have given you access to the ephemeral cluster, @jldec.
@jldec let us know what you think? 🙏
Wow, nice, @Furisto ! As a next step, I recommend you:
Depending on output from you both, we'll need to create some issues:
CC: @atduarte for awareness
Initial (single workspace) evaluation with the gitpod-io/website repo did not reveal any issues.
I was able to `npm run build` and edit content while watching the dev server live-reload.
Memory and CPU both crept up toward 100% during the build, but I did not observe any errors.
This is the configuration that I am using. I just edited the configmap of ws-manager directly:

```json
"g1-standard": {
    "name": "",
    "container": {
        "requests": {
            "cpu": "1m",
            "memory": "3328Mi",
            "ephemeral-storage": "5Gi"
        },
        "limits": {
            "cpu": {
                "min": "1",
                "burst": "2"
            },
            "memory": "4Gi",
            "ephemeral-storage": "5Gi",
            "storage": "15Gi"
        }
    },
    "templates": {
        "defaultPath": "/workspace-templates/g1-standard-default.yaml",
        "regularPath": "/workspace-templates/g1-standard-regular.yaml",
        "prebuildPath": "/workspace-templates/g1-standard-prebuild.yaml",
        "imagebuildPath": "/workspace-templates/g1-standard-imagebuild.yaml"
    },
    "pvc": {
        "size": "15Gi",
        "storageClass": "csi-gce-pd-g1-standard",
        "snapshotClass": "csi-gce-pd-snapshot-class"
    }
},
```
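A quick sanity check on that class config can be done by parsing the container resources and confirming requests don't exceed limits (values copied from the comment above; the check itself is just an illustrative sketch, not part of ws-manager):

```python
import json

# Container resources from the g1-standard class config above, embedded
# as JSON so the check can run standalone (copied values, not authoritative).
container = json.loads("""
{
  "requests": {"cpu": "1m", "memory": "3328Mi", "ephemeral-storage": "5Gi"},
  "limits": {
    "cpu": {"min": "1", "burst": "2"},
    "memory": "4Gi",
    "ephemeral-storage": "5Gi",
    "storage": "15Gi"
  }
}
""")

def to_mib(quantity: str) -> int:
    """Parse a Kubernetes memory quantity (Mi/Gi only, for this sketch)."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported quantity: {quantity}")

# Kubernetes rejects pod specs where a request exceeds its limit.
assert to_mib(container["requests"]["memory"]) <= to_mib(container["limits"]["memory"])
print(to_mib(container["requests"]["memory"]), to_mib(container["limits"]["memory"]))  # → 3328 4096
```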
Here is a video of me using the workspace while the load test is running. I am building the project while editing the code and just navigating around. You can see that code completion is not working very well, but otherwise it is fine. Once the build was complete, code completion was ok again.
@jldec @mbrevoort @atduarte Given the above information, would you like to proceed with adding a smaller workspace class?
@Furisto thank you for this bit of info:
> You can see that code completion is not working very well, but otherwise it is fine. Once the build was complete, code completion was ok again.
I think that is acceptable given the small workspace class. Appreciate you sharing the result! :+1: Let's wait to get feedback from @jldec @mbrevoort and @atduarte before proceeding.
Thanks for your insight and effort, @Furisto ! 💪
@jldec @atduarte we'll close this for now. Let us know if you'd like a small workspace class to be created. As a heads up, we'd have to schedule and ship changes in ~3 repos to have a small workspace class.
@kylos101 - I did not see the same long delays as in the video when I tested the website repo with the small workspaces myself. Maybe I had a better connection to the environment (I was in Dublin) or maybe the editor caches in my workspace were more warmed up.
I think another test would be helpful to better understand the behavior, but in general I would not hold up the introduction of small workspaces for this reason alone.
**Is your feature request related to a problem? Please describe**
From a harvester preview VM or workspace-preview, we want to see if a small workspace class (smaller than g1-standard) performs well enough for simple workloads. If it is a good experience, then we'd want to amend webapp and workspace to support a new "small" workspace class, prior to enabling UBP and workspace classes for individuals.
Internal context More internal context
cc: @mbrevoort @atduarte
**Describe the behaviour you'd like**
Idea: alter the `ws-manager` and `ws-daemon` configs, and test a regular workspace with a class using a CPU limit of 1 and a burst CPU of 2 (ws-daemon), memory requests of 2Gi and a limit of 4Gi (Kubernetes), with 15Gi of storage and 5Gi of ephemeral storage. Also, please be sure to enable disk IO limiting on the related ws-daemon, so the workspace is limited similarly to how we would do it in production.

Then, once you're able to start workspaces using the above config, test that the `small` workspace class works well enough as a regular workspace to develop our website. For example: What about our Gitpod repo? How does it behave from a development standpoint? The assumption is that it will not be a great experience.
**Describe alternatives you've considered**
Some extensions are resource intense. @akosyakov , can you think of any extensions that are common, but hungry, that we should include as part of this test?
**Additional context**
Will this be useful enough for JetBrains? @akosyakov wdyt? I assume no, because this workspace will not meet minimum requirements.
**Definition of done**
If feasible for users, a smaller workspace-class recommendation is shared and agreed with Product and Finance teams, and related issues are added to groundwork to support the related deployment.