gitpod-io / gitpod

The developer platform for on-demand cloud development environments to create software faster and more securely.
https://www.gitpod.io
GNU Affero General Public License v3.0

Cache prebuilds locally (to a region) #6145

Open jankeromnes opened 2 years ago

jankeromnes commented 2 years ago

Caching prebuilds locally in all regions could make Gitpod both faster and cheaper when starting new workspaces from prebuilds.

Current situation

Problem

Transferring data between GCP regions is slow and expensive.

A good example to illustrate this problem is:

Feature request

When a Prebuild is built and stored in region A, and then requested once in region B, it would be great if a "reference copy" (or "cached copy") of that Prebuild also gets stored in region B for future use.

This way, all subsequent requests in region B can use the locally-cached copy (instead of causing constant cross-regional transfers of the same data).

Proposed solution

  1. A workspace is started on a specific repository & commit
  2. Gitpod checks whether a prebuild is available for this repository & commit
  3. (new) If a prebuild exists but is stored in a different region and is not yet cached locally, we still use that prebuild to start the workspace, and we also create a local cached copy in parallel
  4. (new) The next time a workspace is started on this specific repository & commit & region, the locally-cached prebuild is used instead (faster & cheaper)
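
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Gitpod's actual implementation: `PrebuildStore`, `start_workspace`, the region names, and the in-memory dictionaries are all hypothetical stand-ins for real per-region storage, and the local copy is made inline rather than in parallel to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A prebuild is identified by (repository, commit), as in steps 1 and 2.
PrebuildKey = Tuple[str, str]


@dataclass
class PrebuildStore:
    """Hypothetical in-memory stand-in for per-region prebuild storage."""

    # region name -> {(repository, commit) -> prebuild snapshot bytes}
    regions: Dict[str, Dict[PrebuildKey, bytes]] = field(default_factory=dict)
    # Counts how often a prebuild had to be fetched from another region.
    cross_region_transfers: int = 0

    def put(self, region: str, key: PrebuildKey, data: bytes) -> None:
        self.regions.setdefault(region, {})[key] = data

    def find(self, key: PrebuildKey, local_region: str) -> Optional[Tuple[str, bytes]]:
        """Return (region, data), preferring the local region; None if absent."""
        local = self.regions.get(local_region, {})
        if key in local:
            return local_region, local[key]
        for region, items in self.regions.items():
            if key in items:
                return region, items[key]
        return None


def start_workspace(store: PrebuildStore, repo: str, commit: str, region: str) -> str:
    """Return which source was used: 'local', 'remote', or 'no-prebuild'."""
    key = (repo, commit)
    found = store.find(key, region)
    if found is None:
        return "no-prebuild"
    source_region, data = found
    if source_region == region:
        # Step 4: a locally cached prebuild exists, no cross-region transfer.
        return "local"
    # Step 3: use the remote prebuild now, and also keep a local copy.
    # The proposal does this in parallel; it is done inline here for brevity.
    store.cross_region_transfers += 1
    store.put(region, key, data)
    return "remote"
```

With a prebuild stored only in a hypothetical `region-a`, the first workspace start in `region-b` pays for one cross-region transfer, and every later start in `region-b` for the same repository and commit is served from the local copy.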

Implication on garbage collection:

See also

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jankeromnes commented 2 years ago

Not stale, still very relevant.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jankeromnes commented 2 years ago

Very much not stale and still extremely relevant.

atduarte commented 2 years ago

If I understand correctly, we are now using IPFS to cache the container images per workspace cluster (not exactly per region) which mostly covers what is proposed in this issue. @aledbf could you please confirm my understanding is correct?

aledbf commented 2 years ago

> If I understand correctly, we are now using IPFS to cache the container images per workspace cluster (not exactly per region) which mostly covers what is proposed in this issue.

That's correct.

That said, prebuilds are stored in GCS, not in a container image.

jankeromnes commented 2 years ago

FYI, I've edited the top comment to link to an internal Slack discussion about trade-offs between caching ourselves in all GCP regions and using Cloudflare R2 (no egress costs).

aledbf commented 2 years ago

> Cloudflare R2 (no egress costs).

That is not GA (yet), and it seems to support only Cloudflare Workers.