cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

cdc/bank roachtest pulls 260MB off a 3rd party vendor upon every CI run, and fails if upstream is unavailable #51543

Open knz opened 4 years ago

knz commented 4 years ago

Describe the problem

The cdc/bank roachtest runs the following command every time it runs:

 curl -s https://packages.confluent.io/archive/4.0/confluent-oss-4.0.0-2.11.tar.gz | tar -xz -C /tmp/confluent

I went and checked and that is a 262MB archive to download (compressed).

The archive is not cached, unlike the builder image, so that's a mandatory ingress cost on every CI run.

Moreover, today the upstream HTTP server is saying "no" and is causing all the CI runs to fail.

Expected behavior

The archive should be embedded in the builder image, and/or the fetch should use a cached copy if it was already downloaded earlier on the TC agent.

(At the very least we should be fetching from a proxy cache inside the CRL infra so that the CI downloads are internal to GCP).

cc @jlinder @tbg for triage.

Epic DEVINF-109

Jira issue: CRDB-4033

knz commented 4 years ago

I have marked the 3 roachtests that use this facility as skipped.

knz commented 4 years ago

@mwang1026 @dt the KV team meeting concluded that since Bulk I/O owns the CDC product area, the Bulk I/O team is responsible for enhancing the testing infrastructure for CDC tests. So we're pushing this to your plate.

Note that the test is currently skipped. That means we disabled test coverage for CDC. That means that addressing this becomes critical path to the next release.

dt commented 4 years ago

Thanks @knz.

@mwang1026 we should potentially re-enable this for now -- while it'd be nice to have it cached, 262MB once a night is a pretty minimal cost (compared to, say, the VMs), and while I hate flakes due to non-reproducible builds depending on external infra, not testing at all is worse.

jlinder commented 4 years ago

It turns out that cdc/bank is one of the roachtests run on every PR build too.

https://github.com/cockroachdb/cockroach/blob/master/build/teamcity-local-roachtest.sh#L37

knz commented 4 years ago

Yes, in fact on every CI there are three (not one) tests that do this. So the archive gets downloaded and extracted 3 times.

It's not just our network ingress $$ that this impacts; the upstream server probably blocked us because we were incurring outrageous egress $$ on their side.

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/cdc

kenliu-crl commented 1 year ago

reassigning this to CDC team as this has to do with the implementation of the roachtest.
