Open anjor opened 1 year ago
The end goal here is to have a full end-to-end data onboarding story fleshed out.
Some initial results.
Size (GB) | Attempt 1 (s) | Attempt 2 (s) | Attempt 3 (s) | Average (s) |
---|---|---|---|---|
1.8 | 47 | 41 | 41 | 43 |
3.6 | 91 | 86 | 98 | 91.7 |
5.4 | 142 | 149 | 130 | 140.3 |
7.2 | 173 | 162 | 176 | 170.3 |
9 | 215 | 221 | 223 | 219.7 |
18 | 439 | 415 | 429 | 427.7 |
27 | 655 | 617 | 664 | 645.3 |
The above test was carried out on a c3.small.x86 server in the Silicon Valley region of Equinix Metal. Uploads were tested against shuttle-4 because of its proximity to the server (shuttle-1 had content adding disabled).
Results for shuttle-7
Size (GB) | Attempt 1 (s) | Attempt 2 (s) | Attempt 3 (s) | Average (s) |
---|---|---|---|---|
1.8 | 104 | 115 | 113 | 110.7 |
3.6 | 207 | 202 | 241 | 216.7 |
5.4 | 314 | 335 | 320 | 323 |
7.2 | 443 | 417 | 494 | 451.3 |
9 | 550 | 522 | 480 | 517.3 |
18 | 1054 | 1056 | 1103 | 1071 |
27 | 1569 | 1441 | 1955 | 1655 |
The above test was carried out on a c3.small.x86 server in the Dallas region of Equinix Metal. Uploads were tested against shuttle-7 because of its proximity to the server.
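To make the two result sets easier to compare, the timings can be converted into average throughput. The sketch below does that arithmetic on the numbers from the two tables above; it assumes 1 GB = 1000 MB, and the names `SHUTTLE_4`, `SHUTTLE_7`, `throughput_mb_per_s`, and `summarize` are made up for illustration.

```python
from statistics import mean

# Attempt times in seconds, copied from the two tables above.
# Keys are dataset sizes in GB.
SHUTTLE_4 = {
    1.8: (47, 41, 41),
    3.6: (91, 86, 98),
    5.4: (142, 149, 130),
    7.2: (173, 162, 176),
    9: (215, 221, 223),
    18: (439, 415, 429),
    27: (655, 617, 664),
}
SHUTTLE_7 = {
    1.8: (104, 115, 113),
    3.6: (207, 202, 241),
    5.4: (314, 335, 320),
    7.2: (443, 417, 494),
    9: (550, 522, 480),
    18: (1054, 1056, 1103),
    27: (1569, 1441, 1955),
}


def throughput_mb_per_s(size_gb, attempts):
    """Average throughput in MB/s (assumption: 1 GB = 1000 MB)."""
    return size_gb * 1000 / mean(attempts)


def summarize(results):
    """Map each dataset size to its average throughput, rounded to 0.1 MB/s."""
    return {
        size: round(throughput_mb_per_s(size, times), 1)
        for size, times in results.items()
    }


if __name__ == "__main__":
    for name, results in (("shuttle-4", SHUTTLE_4), ("shuttle-7", SHUTTLE_7)):
        for size, mbps in summarize(results).items():
            print(f"{name}: {size} GB -> {mbps} MB/s")
```

On these numbers, shuttle-4 sustains roughly 38 to 42 MB/s and shuttle-7 roughly 16 to 17 MB/s, and neither rate changes much with dataset size in the 1.8 GB to 27 GB range tested.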
Proposal: Estuary performance testing
This is a WIP
Proposal/Overview
We should have metrics on estuary's data onboarding performance. We should be able to answer questions such as
The current plan is to set up datasets of increasing size, from 1 GB up to 1 TB, and measure data onboarding performance for each.
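For planning purposes, a rough idea of how long the larger runs might take can be had by extrapolating linearly from the initial shuttle-4 results above. This is only a back-of-the-envelope sketch: it assumes upload time keeps scaling linearly with size (which has not been tested beyond 27 GB), takes 1 TB as 1000 GB, and uses a hypothetical size ladder, since the plan only fixes the 1 GB and 1 TB endpoints.

```python
# Largest shuttle-4 measurement from the initial results: 27 GB, three attempts.
SIZE_GB = 27
ATTEMPTS_S = (655, 617, 664)

# Hypothetical size ladder in GB; only the 1 GB and 1 TB endpoints are planned.
PLANNED_SIZES_GB = (1, 10, 100, 1000)


def seconds_per_gb(size_gb, attempts):
    """Average upload cost in seconds per GB for one dataset size."""
    return sum(attempts) / len(attempts) / size_gb


def estimate_hours(target_gb, rate_s_per_gb):
    """Linear extrapolation: assumes time keeps scaling with size."""
    return target_gb * rate_s_per_gb / 3600


if __name__ == "__main__":
    rate = seconds_per_gb(SIZE_GB, ATTEMPTS_S)
    print(f"observed rate: {rate:.1f} s/GB")
    for size in PLANNED_SIZES_GB:
        print(f"{size} GB: ~{estimate_hours(size, rate):.2f} h")
```

At the observed rate of about 24 s/GB, a single 1 TB upload would take roughly 6.6 hours if the scaling holds, so three attempts per size at the top of the range would be a multi-day test.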
Technical Design
The performance testing will be carried out on an Equinix Metal server. We will download public datasets ranging in size from 1 GB up to 1 TB and try uploading them to Estuary.
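A minimal sketch of what the measurement loop could look like is below. It is not tied to any particular Estuary client: `upload` is a hypothetical callable that sends one file to a shuttle, and `time_uploads` and its parameters are names invented here. Three attempts per file mirrors the initial results.

```python
import time
from pathlib import Path
from statistics import mean
from typing import Callable, Iterable


def time_uploads(
    paths: Iterable[Path],
    upload: Callable[[Path], None],
    attempts: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Upload each file `attempts` times and record wall-clock seconds.

    `upload` is a placeholder (assumption): any function that blocks until
    the given file has been sent to an Estuary shuttle.
    """
    results = {}
    for path in paths:
        timings = []
        for _ in range(attempts):
            start = clock()
            upload(path)
            timings.append(clock() - start)
        results[path] = {"attempts": timings, "average": mean(timings)}
    return results
```

Passing the clock in as a parameter keeps the loop testable without real uploads; in an actual run the default `time.perf_counter` would be used and `upload` would wrap whichever upload tool is being measured.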
Known problems
Files larger than 32 GB might have issues. If the endpoint is unable to handle an upload, we will try different preparation tools such as barge and singularity.