cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: cache large files downloaded and used in roachtest tests #56856

Open jlinder opened 3 years ago

jlinder commented 3 years ago

Many of our roachtests download large files to use in the tests (some are 100MB+ in size). Add a way to cache these files locally / within cockroach infrastructure to reduce costs and load on the source systems.

One possible way to implement this:

  1. create a google storage bucket
  2. place the files we need there
  3. download the files from the bucket
  4. add some tracking mechanism to check for new versions of the files periodically (assuming they get updated) and automatically download the new file and place it in the bucket
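
A minimal sketch of steps 3 and 4 above, written in Python purely for illustration (roachtest itself is Go). The bucket is modeled as a plain dict so the logic can be read and run without cloud credentials. Every name here (`cache_key`, `refresh`, `download`, the version-string idea) is hypothetical and not part of roachtest; a real version would call the storage client and might compare something like an ETag or checksum reported by the upstream server.

```python
import hashlib
from typing import Callable, Dict


def cache_key(source_url: str) -> str:
    """Derive a stable object name for a source URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]
    basename = source_url.rstrip("/").rsplit("/", 1)[-1] or "file"
    return f"{digest}/{basename}"


def refresh(
    bucket: Dict[str, bytes],
    versions: Dict[str, str],
    source_url: str,
    fetch_version: Callable[[str], str],
    fetch_body: Callable[[str], bytes],
) -> bool:
    """Step 4: copy the file into the bucket if the upstream version changed.

    Returns True when the cached copy was created or replaced.
    """
    key = cache_key(source_url)
    upstream = fetch_version(source_url)
    if key in bucket and versions.get(key) == upstream:
        return False  # cached copy is current; do not touch the source system
    bucket[key] = fetch_body(source_url)
    versions[key] = upstream
    return True


def download(bucket: Dict[str, bytes], source_url: str) -> bytes:
    """Step 3: tests read only from the bucket, never from the original source."""
    return bucket[cache_key(source_url)]
```

The periodic job would call `refresh` for each tracked URL, while the tests would only ever call `download`, so a flaky or slow upstream affects the refresh job rather than the test run.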

If there are files loaded by the runs in AWS, add this to the above:

  1. create an S3 bucket
  2. place the files used by the tests that run in AWS in that bucket
  3. add a mechanism to roachtest that downloads the target files from the GCS bucket or the S3 bucket, depending on whether it's running in GCP or AWS
  4. use the tracking mechanism defined above to check for new versions of the files, and place the new versions in the S3 bucket as well
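
The per-cloud selection in step 3 could look roughly like the following Python sketch (again only illustrative; roachtest is Go). The bucket names and the `"gcp"` / `"aws"` identifiers are placeholders and assumptions, not existing roachtest configuration.

```python
# Hypothetical bucket locations; the real names would be chosen when the
# buckets are created.
CACHE_BUCKETS = {
    "gcp": "gs://example-roachtest-cache",
    "aws": "s3://example-roachtest-cache",
}


def cached_url(cloud: str, object_key: str) -> str:
    """Return the cache location for object_key in the bucket local to cloud."""
    try:
        base = CACHE_BUCKETS[cloud]
    except KeyError:
        raise ValueError(f"no cache bucket configured for cloud {cloud!r}") from None
    return f"{base}/{object_key.lstrip('/')}"
```

Failing loudly on an unknown cloud, rather than silently falling back to the original source, keeps it obvious when a test is running somewhere without a cache.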

Epic DEVINF-109

Jira issue: CRDB-2900

jlinder commented 3 years ago

An example of these files: https://github.com/cockroachdb/cockroach/pull/56815

andreimatei commented 3 years ago

In the case of the Jepsen tests, what I think we want is to create an image with all the respective debian packages installed, not just to have a more reliable location to download them from.

tbg commented 3 years ago

Yeah, what we want here is an apt repo proxy. But I'll point out that the clouds already run them for us (if you're on recent distros). It's unlikely that we'll do a better job running them ourselves. So there may not be much to do here other than make sure we're staying on recent distros and verifying what I just said.

andreimatei commented 3 years ago

What I think we need is an image that already has all the packages installed. We shouldn't be downloading anything from any proxy.

stevendanna commented 3 years ago

We've recently discussed this over on the Test Eng team as well.

Speaking just of 3rd party resources that are currently downloaded from the internet, I think that we may want both an HTTP cache and custom images.

It would definitely be good to have artifacts that are used in a high proportion of roachtest runs baked directly into the image.

But, there are a number of benefits of also having our own repository of 3rd party resources:

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

stevendanna commented 1 year ago

still relevant