etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.38k stars 9.72k forks source link

[Robustness tests] Reduce runtime #15086

Open serathius opened 1 year ago

serathius commented 1 year ago

What would you like to be added?

There was a recent runtime increase for linearizability tests. image

Execution time almost doubled, for example for presubmit it went from ~18m to 30m and for nightly it went from 2h to 3h.

My main expectation is because of growing complexity of etcd model. Based on my experience running the test, recent improvements have reduced variance of run time, however it increased overall time.

For now I totally ignored performance for simplicity (like use json to serialize model), however we should start looking into low hanging fruit and fix it.

My recommendation would be to start profiling to identify places for improvement.

Why is this needed?

Increased run time started causing failures in run. 30 minutes for presubmit is also too long.

serathius commented 1 year ago

cc @tjungblu You were interested in etcd model performance.

fuweid commented 1 year ago

I would like to help. @serathius would you mind to assign it to me? Thanks

serathius commented 1 year ago

Sure, thanks for offering. I did some simple investigation by collecting pprof from github action in https://github.com/serathius/etcd/actions/runs/3901079392 (available in artifacts file)

Maybe it would be useful

serathius commented 1 year ago

image

It showed that json took ~27% of cpu time. However one discrepancy is that it recorded only 40s from 15m github action. Either I didn't collect profile correctly or there are other things that should be optimized.

tjungblu commented 1 year ago

@fuweid feel free to pick this up, I've spent some time last year to move porcupine to generics in: https://github.com/tjungblu/porcupine

That way you can skip the json serde and without the casting it's a bit quicker in general. If the checking itself is taking longer due to an exponential search space increase, it's not going to help much however ;)

serathius commented 1 year ago

If the checking itself is taking longer due to an exponential search space increase, it's not going to help much however ;)

Overall you are right, however with https://github.com/etcd-io/etcd/pull/15078 chances that space grows exponentially were greatly reduced.

serathius commented 1 year ago

Interesting I found one seemingly unrelated culprit. With network proxy enabled 3 member clusters takes 8 seconds to shutdown.

serathius commented 1 year ago

This should already help https://github.com/etcd-io/etcd/pull/15091

fuweid commented 1 year ago

Still investigating the cause.

And the github action page shows total duration including pending time, which is not correct. Wrote a tool to dump the job run duration. Parse the result here.

Click to check! ``` go run main.go --owner etcd-io --repo etcd run list --workflow-id 36262010 --created 2022-12-01..2023-01-30 --jobs "test / test" --jobs "test" --event push --status success ``` ``` ID NAME EVENT HEAD(MESSAGE) STATUS JOB(test / test) JOB(test) CREATED 4044118009 Linearizability push Merge pull request #15191 from serathius/linearizability-failpoints-timeout success 13m42s N/A Mon Jan 30 13:05:34 UTC 2023 4041995481 Linearizability push tools: fix the CSV format error and add build target 'tools' in Makefile success 14m26s N/A Mon Jan 30 08:35:48 UTC 2023 4041504776 Linearizability push fix(client):getToken automatically when the token has expired success 13m32s N/A Mon Jan 30 07:20:05 UTC 2023 4040377539 Linearizability push CHANGELOG: add item about improvement for concurrent read success 14m58s N/A Mon Jan 30 03:43:59 UTC 2023 4033814019 Linearizability push test: reorder the import items in linearizability/model/model.go success 13m12s N/A Sat Jan 28 22:12:07 UTC 2023 4020868209 Linearizability push etcdserver: correct the old name of notifyc in comments success 14m9s N/A Fri Jan 27 02:02:28 UTC 2023 4000937645 Linearizability push Merge pull request #15180 from serathius/linearizability-large-put success 14m14s N/A Tue Jan 24 22:23:03 UTC 2023 4000537796 Linearizability push Merge pull request #15178 from serathius/linearizability-defrag success 13m30s N/A Tue Jan 24 21:23:58 UTC 2023 3999848587 Linearizability push Change verb and use switch stmt success 12m52s N/A Tue Jan 24 19:47:23 UTC 2023 3999604131 Linearizability push Merge pull request #15179 from serathius/linearizability-deduplicate-scenarios success 13m9s N/A Tue Jan 24 19:11:59 UTC 2023 3999310141 Linearizability push CHANGELOG: add item about improvement for limit range success 13m12s N/A Tue Jan 24 18:33:12 UTC 2023 3999128047 Linearizability push Merge pull request #15172 from dims/add-functional-tests-on-arm64 success 13m40s N/A Tue Jan 24 18:09:43 UTC 2023 3997926134 Linearizability push Merge pull request #15176 from serathius/linearizability-cluster-stop success 12m59s N/A Tue Jan 24 15:48:10 UTC 2023 3996330157 Linearizability push Merge pull request #15159 from serathius/linearizability-separate-request-type success 13m5s N/A Tue Jan 24 12:48:02 UTC 2023 3994529219 Linearizability push dependency: bump github.com/google/addlicense from 1.1.0 to 1.1.1 success 12m40s N/A Tue Jan 24 08:59:51 UTC 2023 3991720260 Linearizability push build(deps): bump golangci/golangci-lint-action from 3.3.1 to 3.4.0 success 13m15s N/A Tue Jan 24 00:10:47 UTC 2023 3988840683 Linearizability push build(deps): bump golangci/golangci-lint-action from 3.3.1 to 3.4.0 success 14m16s N/A Mon Jan 23 17:16:20 UTC 2023 3988771639 Linearizability push build(deps): bump github.com/coreos/go-semver from 0.3.0 to 0.3.1 success 13m16s N/A Mon Jan 23 17:08:24 UTC 2023 3988744924 Linearizability push build(deps): bump github.com/google/addlicense in /tools/mod success 13m37s N/A Mon Jan 23 17:05:20 UTC 2023 3988564469 Linearizability push Merge pull request #15158 from sashamelentyev/cleanup-2 success 13m26s N/A Mon Jan 23 16:44:24 UTC 2023 3985429387 Linearizability push changelog: Set release date for v3.5.7 success 13m35s N/A Mon Jan 23 10:22:01 UTC 2023 3966850271 Linearizability push build(deps): bump github/codeql-action from 2.1.38 to 2.1.39 success 13m12s N/A Fri Jan 20 10:36:00 UTC 2023 3966357790 Linearizability push Merge pull request #15149 from serathius/linearizability-artifact-name success 13m46s N/A Fri Jan 20 09:27:22 UTC 2023 3966076803 Linearizability push build(deps): bump github/codeql-action from 2.1.38 to 2.1.39 success 13m15s N/A Fri Jan 20 08:47:42 UTC 2023 3966073521 Linearizability push Merge pull request #15142 from serathius/linearizability-nightly-fix2 success 13m31s N/A Fri Jan 20 08:47:11 UTC 2023 3956826236 Linearizability push Merge pull request #15146 from Niravprajapati1/typo-fix success 10m36s N/A Thu Jan 19 08:55:49 UTC 2023 3952945944 Linearizability push Merge pull request #15130 from guozhao-coder/replace-timeAfter success 10m26s N/A Wed Jan 18 21:09:25 UTC 2023 3948927616 Linearizability push Merge pull request #14682 from serathius/issue13766 success 13m50s N/A Wed Jan 18 12:37:52 UTC 2023 3945147091 Linearizability push Merge pull request #15135 from fuweid/update-3.4-changelog success 25m10s N/A Wed Jan 18 02:30:17 UTC 2023 3937942188 Linearizability push Merge pull request #15132 from serathius/linearizability-nightly-fix success 25m24s N/A Tue Jan 17 09:40:59 UTC 2023 3937801834 Linearizability push Merge pull request #15098 from ahrtr/changelog_promote_learner_20230113 success 26m3s N/A Tue Jan 17 09:23:30 UTC 2023 3937305327 Linearizability push Merge pull request #15128 from serathius/linearizability-move-model success 25m38s N/A Tue Jan 17 08:17:35 UTC 2023 3934624945 Linearizability push Merge pull request #15129 from ahrtr/bump_dependency_20230117 success 25m46s N/A Mon Jan 16 23:16:51 UTC 2023 3934600548 Linearizability push Merge pull request #15095 from ahrtr/promote_non_exist_id_20230113 success 25m27s N/A Mon Jan 16 23:12:33 UTC 2023 3934506453 Linearizability push build(deps): bump github/codeql-action from 2.1.37 to 2.1.38 success 25m21s N/A Mon Jan 16 22:55:16 UTC 2023 3932564589 Linearizability push build(deps): bump github/codeql-action from 2.1.37 to 2.1.38 success 25m23s N/A Mon Jan 16 17:11:44 UTC 2023 3932528523 Linearizability push build(deps): bump google.golang.org/grpc from 1.51.0 to 1.52.0 success 25m34s N/A Mon Jan 16 17:07:13 UTC 2023 3932526024 Linearizability push build(deps): bump github.com/dustin/go-humanize from 1.0.0 to 1.0.1 success 24m50s N/A Mon Jan 16 17:06:52 UTC 2023 3932509655 Linearizability push build(deps): bump gotest.tools/gotestsum in /tools/mod success 24m56s N/A Mon Jan 16 17:05:00 UTC 2023 3932508989 Linearizability push build(deps): bump github.com/mikefarah/yq/v4 in /tools/mod success 24m42s N/A Mon Jan 16 17:04:53 UTC 2023 3929603335 Linearizability push Merge pull request #15106 from serathius/linearizability-nightly-release success 25m43s N/A Mon Jan 16 10:47:43 UTC 2023 3925799760 Linearizability push Merge pull request #15071 from wafuwafu13/refactor-gettoken success N/A 26m14s Sun Jan 15 23:26:37 UTC 2023 3920743133 Linearizability push Merge pull request #15105 from serathius/linearizability-nightly-timeout success N/A 26m10s Sat Jan 14 23:21:46 UTC 2023 3917665954 Linearizability push Merge pull request #15101 from serathius/linearizability-history-patch-window success N/A 25m54s Sat Jan 14 08:20:30 UTC 2023 3909555699 Linearizability push Merge pull request #15093 from geetasg/lease_support_review_feedback success N/A 25m28s Fri Jan 13 08:16:10 UTC 2023 3908224610 Linearizability push Merge pull request #15094 from fuweid/changelog-update success N/A 25m34s Fri Jan 13 03:49:17 UTC 2023 3906662264 Linearizability push Merge pull request #15091 from serathius/linearizability-stop-proxy success N/A 26m16s Thu Jan 12 22:47:12 UTC 2023 3902399088 Linearizability push Merge pull request #15084 from serathius/linearizability-lease-timeout success N/A 28m28s Thu Jan 12 12:50:03 UTC 2023 3900752690 Linearizability push Merge pull request #15078 from serathius/linearizability-patch-operations success N/A 29m22s Thu Jan 12 09:09:06 UTC 2023 3893794827 Linearizability push Merge pull request #15080 from geetasg/lease_support_refactor success N/A 29m59s Wed Jan 11 14:34:56 UTC 2023 3891730784 Linearizability push Merge pull request #15068 from ahrtr/last_wal_rec_corrupt_main_20220107 success N/A 27m49s Wed Jan 11 10:07:04 UTC 2023 3891062783 Linearizability push Merge pull request #15070 from ahrtr/changelog_wal_record_partially_synced_20230 success N/A 28m0s Wed Jan 11 08:38:07 UTC 2023 3885506384 Linearizability push Merge pull request #15044 from serathius/linearizability-watch success N/A 29m37s Tue Jan 10 16:59:10 UTC 2023 3881435488 Linearizability push Merge pull request #15046 from serathius/linearizability-failpoints-available success N/A 22m1s Tue Jan 10 08:15:37 UTC 2023 3879547454 Linearizability push Merge pull request #15073 from etcd-io/dependabot/go_modules/server/golang.org/x success N/A 22m39s Tue Jan 10 02:14:50 UTC 2023 3879238534 Linearizability push tidy up dependencies success N/A 23m7s Tue Jan 10 01:13:09 UTC 2023 3879210053 Linearizability push Merge pull request #15074 from etcd-io/dependabot/github_actions/actions/upload- success N/A 24m13s Tue Jan 10 01:08:05 UTC 2023 3879193383 Linearizability push Merge pull request #15075 from etcd-io/dependabot/github_actions/actions/checkou success N/A 21m54s Tue Jan 10 01:05:09 UTC 2023 3876075434 Linearizability push build(deps): bump actions/checkout from 3.2.0 to 3.3.0 success N/A 23m8s Mon Jan 9 17:12:55 UTC 2023 3876074203 Linearizability push build(deps): bump actions/upload-artifact from 3.1.1 to 3.1.2 success N/A 23m42s Mon Jan 9 17:12:48 UTC 2023 3874582102 Linearizability push Merge pull request #15059 from serathius/linearizability-operations success N/A 22m9s Mon Jan 9 14:12:27 UTC 2023 3874420966 Linearizability push Merge pull request #15048 from serathius/linearizability-network success N/A 22m25s Mon Jan 9 13:52:01 UTC 2023 3861570770 Linearizability push Merge pull request #15067 from halegreen/fix-comment-typo success N/A 18m25s Sat Jan 7 10:02:46 UTC 2023 3843945339 Linearizability push Merge pull request #15033 from chaochn47/snapshotCatchupEntries success N/A 17m51s Thu Jan 5 04:47:43 UTC 2023 3805686335 Linearizability push Merge pull request #15055 from ptabor/20221228-fix-dump-logs-tests success N/A 17m37s Fri Dec 30 08:17:41 UTC 2022 3804333929 Linearizability push Merge pull request #15056 from ptabor/20221218-goimport-tests success N/A 17m22s Fri Dec 30 02:18:16 UTC 2022 3803281539 Linearizability push Merge pull request #14924 from serathius/linearizability-multiple-keys success N/A 17m1s Thu Dec 29 21:48:03 UTC 2022 3799935126 Linearizability push Merge pull request #15052 from ptabor/20221228-goimports-fix success N/A 16m54s Thu Dec 29 10:31:38 UTC 2022 3799380964 Linearizability push Merge pull request #15053 from wafuwafu13/show-as-learner success N/A 16m44s Thu Dec 29 08:44:30 UTC 2022 3798903170 Linearizability push Merge pull request #15054 from fuweid/cleanup-old-build-tags success N/A 16m14s Thu Dec 29 07:00:33 UTC 2022 3796026288 Linearizability push Merge pull request #15045 from serathius/linearizability-model success N/A 16m56s Wed Dec 28 19:01:11 UTC 2022 3794229591 Linearizability push Merge pull request #15051 from JOJO0527/patch-2 success N/A 16m38s Wed Dec 28 13:14:26 UTC 2022 3792656075 Linearizability push Merge pull request #15050 from ahrtr/fix_rate_limit_20221228 success N/A 16m40s Wed Dec 28 08:15:13 UTC 2022 3783950650 Linearizability push Merge pull request #15047 from etcd-io/dependabot/github_actions/ossf/scorecard- success N/A 30m10s Mon Dec 26 23:12:28 UTC 2022 3782563023 Linearizability push build(deps): bump ossf/scorecard-action from 2.1.0 to 2.1.2 success N/A 17m17s Mon Dec 26 17:18:55 UTC 2022 3778557346 Linearizability push Merge pull request #15021 from ramil600/em_manager success N/A 17m38s Mon Dec 26 02:19:31 UTC 2022 3766287251 Linearizability push Merge pull request #15040 from ahrtr/update_nsswitch success N/A 14m2s Fri Dec 23 14:21:08 UTC 2022 3765546303 Linearizability push Merge pull request #15039 from ahrtr/changelog_remove_busybox_20221223 success N/A 13m9s Fri Dec 23 12:13:07 UTC 2022 3764861752 Linearizability push Merge pull request #15036 from ahrtr/remove_busybox_20221223 success N/A 14m41s Fri Dec 23 10:09:44 UTC 2022 3747025198 Linearizability push changelog: update the release date of etcd v3.4.23 success N/A 27m22s Wed Dec 21 06:37:06 UTC 2022 3744748489 Linearizability push Merge pull request #15031 from ahrtr/SnapshotWithVersion_nil_panic_20221220 success N/A 14m8s Tue Dec 20 22:45:35 UTC 2022 3744736866 Linearizability push Merge pull request #15024 from wafuwafu13/client-isunavailableerr-test success N/A 14m57s Tue Dec 20 22:43:01 UTC 2022 3737620086 Linearizability push Merge pull request #15023 from unai-ttxu/task/fix_etcdctl_member_list_hex_consis success N/A 14m5s Tue Dec 20 04:52:53 UTC 2022 3736918148 Linearizability push Merge pull request #15012 from wafuwafu13/credentials-test success N/A 13m34s Tue Dec 20 02:25:22 UTC 2022 3736826432 Linearizability push Merge pull request #15026 from etcd-io/dependabot/github_actions/ossf/scorecard- success N/A 12m57s Tue Dec 20 02:08:10 UTC 2022 3736644532 Linearizability push build(deps): bump ossf/scorecard-action from 2.0.6 to 2.1.0 success N/A 13m40s Tue Dec 20 01:29:12 UTC 2022 3736633398 Linearizability push Merge pull request #15027 from etcd-io/dependabot/github_actions/github/codeql-a success N/A 14m3s Tue Dec 20 01:27:00 UTC 2022 3736627663 Linearizability push Merge pull request #15025 from etcd-io/dependabot/go_modules/tools/mod/github.co success N/A 14m24s Tue Dec 20 01:25:44 UTC 2022 3733587372 Linearizability push build(deps): bump github/codeql-action from 2.1.36 to 2.1.37 success N/A 13m40s Mon Dec 19 17:09:23 UTC 2022 3733587190 Linearizability push build(deps): bump ossf/scorecard-action from 2.0.6 to 2.1.0 success N/A 13m32s Mon Dec 19 17:09:22 UTC 2022 3733540402 Linearizability push build(deps): bump github.com/mikefarah/yq/v4 in /tools/mod success N/A 13m27s Mon Dec 19 17:04:08 UTC 2022 3731324339 Linearizability push Merge pull request #15020 from ahrtr/changelog_security_20221219 success N/A 13m46s Mon Dec 19 12:18:04 UTC 2022 3730138842 Linearizability push Merge pull request #14993 from ramil600/add-log success N/A 13m11s Mon Dec 19 09:30:54 UTC 2022 3729704702 Linearizability push Merge pull request #14875 from BarrensZeppelin/patch-lessor success N/A 14m1s Mon Dec 19 08:28:54 UTC 2022 3722089345 Linearizability push Merge pull request #14918 from fuweid/revert-pr-12129 success N/A 14m10s Sat Dec 17 22:38:05 UTC 2022 3718841974 Linearizability push Merge pull request #14998 from chaochn47/auth_test_split success N/A 14m44s Sat Dec 17 06:23:50 UTC 2022 3710926167 Linearizability push Merge pull request #14999 from fuweid/deflake-14926 success N/A 13m52s Fri Dec 16 06:42:37 UTC 2022 3710629284 Linearizability push Merge pull request #15005 from ArkaSaha30/trivy-nightly-scan success N/A 13m57s Fri Dec 16 05:41:50 UTC 2022 3710107846 Linearizability push Merge pull request #15004 from j13tw/fix-doc success N/A 13m50s Fri Dec 16 03:48:15 UTC 2022 3709637997 Linearizability push Merge pull request #15000 from wafuwafu13/server-access-control-test success N/A 14m15s Fri Dec 16 02:10:07 UTC 2022 ```
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

chaochn47 commented 1 year ago

@serathius Is the issue still valid after #15078 #15242?

@fuweid feel free to pick this up, I've spent some time last year to move porcupine to generics in: https://github.com/tjungblu/porcupine

That way you can skip the json serde and without the casting it's a bit quicker in general.

Moving porcupine to generics seems another area to reduce the runtime.

serathius commented 1 year ago

Issue is still valid, however I think we need to change the approach. Profiling done by me looks incorrect. Don't think rewrite to generics will help. We can make the code faster, but I don't think it will reduce the test runtime. I think we lack proper instrumentation of tests to measure runtime and the contributing factors.

I'm thinking about collecting reports about how much each tests took. Similar to what we do for flakiness measurement. The test should output junit report, that we can later aggregate over hundreds of runs, and calculate some statistics instead of eyeballing the useless Github action page.

chaochn47 commented 1 year ago

Thanks @serathius! I think it's a good idea to first narrow down if a single test case takes the majority of the runtime or it's a universe pattern in robustness test. After that, we might use profiling to identify which code path is unexpectedly slow. Created an issue with background to track this ^

tjungblu commented 1 year ago

Don't think rewrite to generics will help. We can make the code faster, but I don't think it will reduce the test runtime.

It won't help that much, IIRC the difference was about 20% at best. There was a trade-off between copying the state at every step (json serde) and doing a plain memcopy of a struct (plus the casting overheads).

Especially if the culprit is slow GH action CPUs. +1 on the instrumentation though. I'm still a big fan of the pebble benchmark page: https://cockroachdb.github.io/pebble/?max=local

Maybe we can come up with something similar, obviously for test suite runtimes first. Even though I expect significant noise/variance from the action runners.

stale[bot] commented 12 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.