Closed joseandro closed 2 years ago
This is the overall scenario:
We have a long running process that is taking longer than expected. Sometimes it takes 5hs, other 7hs and recently I found an execution with more than 9 hours. However eventually it works faster with less than 40 minutes.
image.png
image.png
So it looks like 2 things are happening.
1: BK is trying to send the test executive a signal, IDK why or what the signal is. 2: the job is ignoring the signal
[90m# Received cancellation signal, interrupting[0m Terminated it looks like the integration test framework does process the "cancellation signal" but then BK keeps sending it signals
In the logs I see the code is sending an exit signal in a loop. BK should have caught this and terminated the job. The exit code is 20563. Are you familiar with this?
[0;36m[2022-7-14 17:14:53.750926]DuneexeTest_executive: initializing log engine [0m
[0;35m[2022-7-14 17:14:58.649633]Integration_test_cloud_engine__Stack_driver_log_engine: Event subscription created [0m
[0;36m[2022-7-14 17:14:58.649669]DuneexeTest_executive: beginning to process network events [0m
[0;36m[2022-7-14 17:14:58.649794]DuneexeTest_executive: initializing dsl [0m
[0;36m[2022-7-14 17:14:58.649900]DuneexeTest_executive: initializing network abstraction [0m
[0;35m[2022-7-14 17:14:58.649940]Integration_test_cloud_engine__Kubernetes_network: Waiting for pods to be assigned nodes and become ready [0m
!!! === COMMAND ===
!!! gcloud "pubsub" "subscriptions" "pull" "projects/o1labs-192920/subscriptions/it-auto-b9498af8-zkapps" "--auto-ack" "--limit" "5" "--format" "table(DATA)"
!!! === STDOUT ===
!!!
!!! === STDERR ===
!!! ERROR: (gcloud.pubsub.subscriptions.pull) NOT_FOUND: Resource not found (resource=projects/o1labs-192920/subscriptions/it-auto-b9498af8-zkapps).
!!!
[0;31m[2022-7-14 17:15:11.374655]DuneexeTest_executive: (monitor.ml.Error"command exited with status code 1"("Raised at BaseError.raise in file \"src/error.ml\" (inlined), line 8, characters 14-30""Called from BaseOr_error.ok_exn in file \"src/or_error.ml\", line 84, characters 17-32""Called from Async_kernel__Deferred1.M.map.(fun) in file \"src/deferred1.ml\", line 17, characters 40-45""Called from Async_kernelJob_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5""Called from Async_kernelJob_queue.run_jobs in file \"src/job_queue.ml\", line 169, characters 6-47""Caught by monitor at file \"src/app/test_executive/test_executive.ml\", line 306, characters 27-27")) [0m
[0;35m[2022-7-14 17:15:11.374729]DuneexeTest_executive: cleaning up testnet (reason: "exception thrown") [0m
[0;35m[2022-7-14 17:15:29.501451]Integration_test_cloud_engine__Kubernetes_network: Pods assigned to nodes [0m
[0;35m[2022-7-14 17:15:29.501496]DuneexeTest_executive: Starting the daemons within the pods [0m
[0;35m[2022-7-14 17:15:29.501505]DuneexeTest_executive: starting seed-dbfd45497-k8g9b ... [0m
[0;35m[2022-7-14 17:15:31.108661]DuneexeTest_executive: seed-dbfd45497-k8g9b started [0m
[0;35m[2022-7-14 17:15:31.108728]Integration_test_lib__Dsl: Waiting for [seed-dbfd45497-k8g9b] to initialize (soft_timeout: "10m", hard_timeout: "15m") [0m
[90m# Received cancellation signal, interrupting [0m Terminated
++ kill -- 20563
+++ kill -- 20563
++++ kill -- 20563
+++++ kill -- 20563
++++++ kill -- 20563
--- evidence ---
Tasks:
Fast lint steps; binable compatibility changes,feature/compatible-hashing-3,0:01:16.219000 :pipeline:,feature/compatible-hashing-3,0:00:57.172000 Prepare monorepo triage,feature/compatible-hashing-3,0:00:46.026000 Monorepo triage,feature/compatible-hashing-3,0:00:42.072000 Zkapps test transaction tool unit tests,feature/compatible-hashing-3,0:18:05.723000 Archive node unit tests,feature/compatible-hashing-3,0:08:07.797000 Connect to testnet,feature/compatible-hashing-3,0:20:30.296000 SnarkyJS unit tests,feature/compatible-hashing-3,0:20:26.940000 Install Yarn dependencies,feature/compatible-hashing-3,0:00:25.423000 Build client SDK and run unit tests,feature/compatible-hashing-3,0:12:51.376000 Prepublish client SDK packages,feature/compatible-hashing-3,0:00:27.549000 Check GraphQL Schema,feature/compatible-hashing-3,0:19:40.372000 Fuzzy zkapp unit tests,feature/compatible-hashing-3,0:25:31.043000 Compare test signatures,feature/compatible-hashing-3,0:09:04.875000 Shared state integration test (migrated),feature/compatible-hashing-3,0:16:17.151000 dev unit-tests,feature/compatible-hashing-3,0:41:00.824000 Build test-executive,feature/compatible-hashing-3,0:06:05.741000 Build JS integration tests,feature/compatible-hashing-3,0:10:44.781000 peers-reliability integration test,feature/compatible-hashing-3,0:35:31.549000 chain-reliability integration test,feature/compatible-hashing-3,0:35:13.596000 payment integration test,feature/compatible-hashing-3,0:28:01.936000 delegation integration test,feature/compatible-hashing-3,0:26:19.190000 gossip-consis integration test,feature/compatible-hashing-3,0:28:00.502000 opt-block-prod integration test,feature/compatible-hashing-3,0:14:19.289000 medium-bootstrap integration test,feature/compatible-hashing-3,0:41:32.798000 zkapps integration test,feature/compatible-hashing-3,0:34:26.665000 zkapps-timing integration test,feature/compatible-hashing-3,0:39:50.259000 snarkyjs integration test,feature/compatible-hashing-3,0:29:54.502000 Replayer test,feature/compatible-hashing-3,0:22:22.299000 Publish client SDK to npm,feature/compatible-hashing-3,0:00:27.256000 Build Libp2p helper for Bionic,feature/compatible-hashing-3,0:00:50.406000 Build Mina for Bionic,feature/compatible-hashing-3,0:11:45.697000 Docker: daemon-devnet-bionic-docker-image,feature/compatible-hashing-3,0:15:45.225000 Docker: daemon-berkeley-bionic-docker-image,feature/compatible-hashing-3,0:15:51.028000 Docker: daemon-mainnet-bionic-docker-image,feature/compatible-hashing-3,0:16:12.257000 Docker: archive-bionic-docker-image,feature/compatible-hashing-3,0:19:47.718000 Docker: rosetta-bionic-docker-image,feature/compatible-hashing-3,0:33:52.974000 Docker: zkapp-test-transaction-bionic-docker-image,feature/compatible-hashing-3,0:14:14.169000 Build Libp2p helper for Stretch,feature/compatible-hashing-3,0:02:45.532000 Build Mina for Stretch,feature/compatible-hashing-3,0:14:00.291000 Docker: daemon-devnet-stretch-docker-image,feature/compatible-hashing-3,0:18:13.843000 Docker: daemon-berkeley-stretch-docker-image,feature/compatible-hashing-3,0:17:49.196000 Docker: daemon-mainnet-stretch-docker-image,feature/compatible-hashing-3,0:20:34.604000 Docker: archive-stretch-docker-image,feature/compatible-hashing-3,0:04:11.142000 Docker: rosetta-stretch-docker-image,feature/compatible-hashing-3,0:41:40.020000 Docker: zkapp-test-transaction-stretch-docker-image,feature/compatible-hashing-3,0:18:34.576000 Build Libp2p helper for Focal,feature/compatible-hashing-3,0:02:52.265000 Build Mina for Focal,feature/compatible-hashing-3,0:12:07.833000 Docker: daemon-devnet-focal-docker-image,feature/compatible-hashing-3,0:16:10.836000 Docker: daemon-berkeley-focal-docker-image,feature/compatible-hashing-3,0:16:21.350000 Docker: daemon-mainnet-focal-docker-image,feature/compatible-hashing-3,0:16:13.254000 Docker: archive-focal-docker-image,feature/compatible-hashing-3,0:15:24.911000 Docker: rosetta-focal-docker-image,feature/compatible-hashing-3,0:45:33.608000 Docker: zkapp-test-transaction-focal-docker-image,feature/compatible-hashing-3,0:14:37.830000 Build Libp2p helper for Bullseye,feature/compatible-hashing-3,0:01:02.584000 Build Mina for Bullseye,feature/compatible-hashing-3,0:20:13.967000 Docker: daemon-devnet-bullseye-docker-image,feature/compatible-hashing-3,0:24:41.720000 Docker: daemon-berkeley-bullseye-docker-image,feature/compatible-hashing-3,0:24:44.280000 Docker: daemon-mainnet-bullseye-docker-image,feature/compatible-hashing-3,0:24:42.308000 Docker: archive-bullseye-docker-image,feature/compatible-hashing-3,0:23:04.673000 Docker: rosetta-bullseye-docker-image,feature/compatible-hashing-3,0:26:15.772000 Docker: zkapp-test-transaction-bullseye-docker-image,feature/compatible-hashing-3,0:22:48.751000 Build Libp2p helper for Buster,feature/compatible-hashing-3,0:00:50.238000 Build Mina for Buster,feature/compatible-hashing-3,0:10:03.569000 Docker: daemon-devnet-buster-docker-image,feature/compatible-hashing-3,0:14:19.330000 Docker: daemon-berkeley-buster-docker-image,feature/compatible-hashing-3,0:13:55.349000 Docker: daemon-mainnet-buster-docker-image,feature/compatible-hashing-3,0:14:07.735000 Docker: archive-buster-docker-image,feature/compatible-hashing-3,0:13:15.160000 Docker: rosetta-buster-docker-image,feature/compatible-hashing-3,0:28:54.115000 Docker: zkapp-test-transaction-buster-docker-image,feature/compatible-hashing-3,0:12:05.257000 Check merges cleanly into compatible,feature/compatible-hashing-3,0:00:13.321000 Check merges cleanly into develop,feature/compatible-hashing-3,0:00:22.119000 "OCaml Lints; Check-format, Require-ppx-version",feature/compatible-hashing-3,0:07:56.758000 "Fast lint steps; CODEOWNERs, RFCs, Check Snarky Submodule, Preprocessor Deps",feature/compatible-hashing-3,0:00:06.590000 Fast lint steps; versions compatibility changes,feature/compatible-hashing-3,0:04:01.900000 Fast lint steps; binable compatibility changes,feature/compatible-hashing-3,0:01:16.219000
Shouldn't this be in review?
Closed for now as this was due in most part by node preemptive in google cloud.
Investigate the root cause of the few builds that take too long (>250m) to finish.