estuary / flow

Flowctl apply causes pods to crash when tasks are removed #284

Closed psFried closed 2 years ago

psFried commented 2 years ago

To reproduce:

  1. Apply a catalog that includes a materialization
  2. Rename the materialization and re-apply
  3. Pod crashes

This may affect other task types besides just materializations, but that's how I observed it. A couple of observations:

I'm thinking it's probably best to address this either as part of #275 or else in a separate PR after that work lands.

Logs from crashed pod:

```
{"buildDate":"2021-10-14-15:29:35-UTC","config":{"Consumer":{"Zone":"us-central1-c","ID":"flow-reactor-6dbc59cb78-ktz8b","Host":"10.116.0.11","Port":8080,"Limit":32},"Broker":{"Address":"http://flow-gazette.flow.svc:8080","Cache":{"Size":256,"TTL":60000000000},"FileRoot":""},"Etcd":{"Address":"http://flow-etcd.flow.svc:2379","LeaseTTL":20000000000,"Prefix":"/gazette/consumers/flow/reactor"},"Log":{"Level":"info","Format":"json"},"Diagnostics":{},"Flow":{"CatalogRoot":"/flow/catalog","BrokerRoot":"/gazette/cluster/flow"},"DisableClockTicks":false,"Poll":false,"ConnectorNetwork":""},"level":"info","msg":"consumer configuration","time":"2021-10-28T14:58:52Z","version":"v0.1.0-472-g5fb28e9"}
{"endpoint":"http://10.116.0.11:8080","group":"/gazette/consumers/flow/reactor","id":"flow-reactor-6dbc59cb78-ktz8b","level":"info","msg":"starting consumer","time":"2021-10-28T14:58:52Z","zone":"us-central1-c"}
{"level":"info","memberId":14358579373743376895,"msg":"etcd MemberId/RaftTerm changed","raftTerm":2,"time":"2021-10-28T14:58:52Z","update.MemberId":8069308619253966104,"update.RaftTerm":2}
{"id":"derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","level":"info","msg":"starting local shard","route":{"members":[{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-ktz8b"},{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-rj89n"}],"primary":1},"time":"2021-10-28T14:58:52Z"}
{"id":"derivation/redacted/bid-exchange/joined-v2/00000000-00000000","level":"info","msg":"starting local shard","route":{"members":[{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-ktz8b"},{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-nhdjg"}],"primary":1},"time":"2021-10-28T14:58:52Z"}
{"id":"materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","msg":"starting local shard","route":{"members":[{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-ktz8b"},{"zone":"us-central1-c","suffix":"flow-reactor-6dbc59cb78-nhdjg"}],"primary":1},"time":"2021-10-28T14:58:52Z"}
{"dir":"/tmp/materialize_redacted_bid-exchange_flattened-v2-to-snowflake_00000000-00000000-436372839","id":"materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","log":"recovery/materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","msg":"began recovering shard store from log","time":"2021-10-28T14:58:52Z"}
{"dir":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193","id":"derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","level":"info","log":"recovery/derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","msg":"began recovering shard store from log","time":"2021-10-28T14:58:52Z"}
{"dir":"/tmp/derivation_redacted_bid-exchange_joined-v2_00000000-00000000-529543066","id":"derivation/redacted/bid-exchange/joined-v2/00000000-00000000","level":"info","log":"recovery/derivation/redacted/bid-exchange/joined-v2/00000000-00000000","msg":"began recovering shard store from log","time":"2021-10-28T14:58:52Z"}
{"id":"materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","log":"recovery/materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","msg":"now tailing live log","time":"2021-10-28T14:58:52Z"}
{"id":"derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","level":"info","log":"recovery/derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","msg":"now tailing live log","time":"2021-10-28T14:58:52Z"}
{"id":"materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","log":"recovery/materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","msg":"promoted to primary","time":"2021-10-28T14:58:52Z"}
{"id":"derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","level":"info","log":"recovery/derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","msg":"promoted to primary","time":"2021-10-28T14:58:52Z"}
{"dir":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193","files":10,"lastLog":"recovery/derivation/redacted/bid-exchange/flattened-v3/00000000-00000000","level":"info","msg":"completed playback","nextChecksum":2133596214,"nextSeqNo":299,"time":"2021-10-28T14:58:52Z"}
{"fnode":119,"level":"info","msg":"linked file","size":1158,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000017.sst","time":"2021-10-28T14:58:52Z"}
{"fnode":168,"level":"info","msg":"linked file","size":1158,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000023.sst","time":"2021-10-28T14:58:52Z"}
{"fnode":206,"level":"info","msg":"linked file","size":1158,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000029.sst","time":"2021-10-28T14:58:52Z"}
{"fnode":271,"level":"info","msg":"linked file","size":9884,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/OPTIONS-000041","time":"2021-10-28T14:58:52Z"}
{"dir":"/tmp/materialize_redacted_bid-exchange_flattened-v2-to-snowflake_00000000-00000000-436372839","files":1,"lastLog":"recovery/materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","msg":"completed playback","nextChecksum":3190623979,"nextSeqNo":8802,"time":"2021-10-28T14:58:52Z"}
{"fnode":253,"level":"info","msg":"linked file","size":1158,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000037.sst","time":"2021-10-28T14:58:52Z"}
{"fnode":8788,"level":"info","msg":"linked file","size":477,"target":"/tmp/materialize_redacted_bid-exchange_flattened-v2-to-snowflake_00000000-00000000-436372839/state.json","time":"2021-10-28T14:58:52Z"}
{"fnode":267,"level":"info","msg":"linked file","size":6006,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000039.log","time":"2021-10-28T14:58:52Z"}
{"fnode":262,"level":"info","msg":"linked file","size":16,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/CURRENT","time":"2021-10-28T14:58:52Z"}
{"fnode":61,"level":"info","msg":"linked file","size":1156,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/000011.sst","time":"2021-10-28T14:58:52Z"}
{"fnode":224,"level":"info","msg":"linked file","size":9884,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/OPTIONS-000033","time":"2021-10-28T14:58:52Z"}
{"fnode":255,"level":"info","msg":"linked file","size":477,"target":"/tmp/derivation_redacted_bid-exchange_flattened-v3_00000000-00000000-396550193/MANIFEST-000038","time":"2021-10-28T14:58:52Z"}
{"err":"completeRecovery: store.RestoreCheckpoint: catalog task \"redacted/bid-exchange/flattened-v2-to-snowflake\" not found","level":"error","msg":"servePrimary failed","shard":"/gazette/consumers/flow/reactor/items/materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","time":"2021-10-28T14:58:52Z"}
{"fields.level":"info","level":"info","logCollection":"ops/redacted/logs","msg":"starting new log publisher","time":"2021-10-28T14:58:53Z"}
{"lastRevision":0,"level":"info","msg":"initialized catalog task term","revision":3482,"time":"2021-10-28T14:58:53Z"}
npm WARN saveError ENOENT: no such file or directory, open '/tmp/javascript-worker432452728/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/tmp/javascript-worker432452728/package.json'
npm WARN javascript-worker432452728 No description
npm WARN javascript-worker432452728 No repository field.
npm WARN javascript-worker432452728 No README data
npm WARN javascript-worker432452728 No license field.
+ catalog-js-transformer@0.0.0
added 1 package and audited 1 package in 0.246s
found 0 vulnerabilities
{"args":["node_modules/.bin/catalog-js-transformer"],"level":"info","msg":"started worker daemon","pid":80,"socketPath":"/tmp/javascript-worker432452728/socket","time":"2021-10-28T14:58:53Z"}
{"level":"info","memberId":9752624356811177708,"msg":"etcd MemberId/RaftTerm changed","raftTerm":2,"time":"2021-10-28T14:58:53Z","update.MemberId":8069308619253966104,"update.RaftTerm":2}
{"id":"materialize/redacted/bid-exchange/flattened-v2-to-snowflake/00000000-00000000","level":"info","msg":"stopping local shard","time":"2021-10-28T14:58:53Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x170334a]

goroutine 405 [running]:
github.com/estuary/flow/go/runtime.(*Materialize).Destroy(0xc000790500)
	/home/runner/work/flow/flow/go/runtime/materialize.go:354 +0x2a
go.gazette.dev/core/consumer.waitAndTearDown(0xc0008029a0, 0xc00061ede0)
	/home/runner/go/pkg/mod/go.gazette.dev/core@v0.89.1-0.20210923211114-a1ff40b2ced0/consumer/shard.go:292 +0x63
created by go.gazette.dev/core/consumer.(*Resolver).cancelShards
	/home/runner/go/pkg/mod/go.gazette.dev/core@v0.89.1-0.20210923211114-a1ff40b2ced0/consumer/resolver.go:375 +0xa5
```

jgraettinger commented 2 years ago

Thanks, I'll roll this into my current work. Quick fix.