docker-library / cassandra

Docker Official Image packaging for Cassandra
Apache License 2.0

Update Cassandra 4.0 to beta4 #221

Closed emerkle826 closed 3 years ago

emerkle826 commented 3 years ago

Cassandra 4.0-beta4 was released on December 31, 2020. Not sure what the process is to get the image updated on DockerHub.

emerkle826 commented 3 years ago

Actually, I'm guessing it hasn't been updated yet because the build here isn't stable: https://doi-janky.infosiftr.net/job/update.sh/job/cassandra/

tianon commented 3 years ago

Ouch - looks like the problem is that the initialization which previously was able to complete successfully within 20 seconds is now taking significantly longer (at least twice as much time, and from the logs it appears to know it's only supposed to gossip with itself and then still waits at least 30 seconds before it gives up on other gossip nodes). Running just that one test with an adjusted retry.sh to give the startup more time took over a full minute on my local (quite fast CPU + NVMe disk) host.
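
To illustrate the "give the startup more time" idea, here is a minimal polling sketch in Python. It is not the actual retry.sh from the test suite; `wait_until`, its parameters, and the injectable `clock`/`sleep` hooks are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    Hypothetical helper: `check` stands in for whatever readiness probe
    the test uses (for example, attempting a client connection).
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a helper like this, a server that needs about a minute to start fails a 20 second budget but passes once the timeout is raised, which matches the behavior described above.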

Relevant logs (with the 30 second delay):

```
INFO [main] 2021-01-15 20:39:13,496 StorageService.java:645 - Unable to gossip with any peers but continuing anyway since node is in its own seed list
INFO [main] 2021-01-15 20:39:13,507 StorageService.java:950 - Starting up server gossip
INFO [main] 2021-01-15 20:39:13,509 ColumnFamilyStore.java:870 - Enqueuing flush of local: 0.451KiB (0%) on-heap, 0.000KiB (0%) off-heap
INFO [PerDiskMemtableFlushWriter_0:1] 2021-01-15 20:39:13,515 Memtable.java:452 - Writing Memtable-local@384554343(0.079KiB serialized bytes, 2 ops, 0%/0% of on/off-heap limit), flushed range = (null, null]
INFO [PerDiskMemtableFlushWriter_0:1] 2021-01-15 20:39:13,515 Memtable.java:481 - Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/na-3-big-Data.db (0.048KiB) for commitlog position CommitLogPosition(segmentId=1610743151944, position=31446)
INFO [MemtableFlushWriter:1] 2021-01-15 20:39:13,527 LogTransaction.java:240 - Unfinished transaction log, deleting /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/na_txn_flush_ba65d950-5771-11eb-8cae-47b2ed6b2f6d.log
INFO [main] 2021-01-15 20:39:13,551 StorageService.java:1025 - This node will not auto bootstrap because it is configured to be a seed node.
INFO [main] 2021-01-15 20:39:43,556 Gossiper.java:2079 - Waiting for gossip to settle...
INFO [main] 2021-01-15 20:39:51,557 Gossiper.java:2110 - No gossip backlog; proceeding
INFO [main] 2021-01-15 20:39:51,558 Gossiper.java:2079 - Waiting for gossip to settle...
INFO [main] 2021-01-15 20:39:59,559 Gossiper.java:2110 - No gossip backlog; proceeding
INFO [main] 2021-01-15 20:39:59,560 NetworkTopologyStrategy.java:84 - Configured datacenter replicas are datacenter1:rf(3)
```

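
The pause can also be located mechanically. The sketch below is a hypothetical helper (`largest_gap` is not part of any Cassandra tooling) that assumes the timestamp format seen in the excerpt above and reports the longest silence between consecutive log lines.

```python
import re
from datetime import datetime

# Timestamp format assumed from the log excerpt: "2021-01-15 20:39:13,551"
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def largest_gap(lines):
    """Return the longest pause, in seconds, between consecutive timestamped lines."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S,%f"))
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
    return max(gaps, default=0.0)


# Three lines copied from the excerpt above.
log = [
    "INFO [main] 2021-01-15 20:39:13,551 StorageService.java:1025 - This node will not auto bootstrap because it is configured to be a seed node.",
    "INFO [main] 2021-01-15 20:39:43,556 Gossiper.java:2079 - Waiting for gossip to settle...",
    "INFO [main] 2021-01-15 20:39:51,557 Gossiper.java:2110 - No gossip backlog; proceeding",
]
print(largest_gap(log))
```

On these lines the longest pause is roughly 30 seconds, between the "will not auto bootstrap" message and the first "Waiting for gossip to settle..." message.
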
aholmberg commented 3 years ago

The longer startup time is known and intentional in 4.0-beta4. tl;dr: some new default settings cause the server to allocate tokens, and it waits for a fixed interval before doing so. https://issues.apache.org/jira/browse/CASSANDRA-13701

This can be bypassed in a couple of ways:

The delay can be shortened by defining the system property cassandra.ring_delay_ms. https://github.com/apache/cassandra/blob/5e8f7f591dfec5a61d8eb2e9e977ec29f3a2bbe4/src/java/org/apache/cassandra/service/StorageService.java#L152

However, each of the mitigation techniques above has implications. Maybe the best solution will be to relax the startup timeout here, and rely on the new default settings.
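
For a throwaway test container, passing the system property might look like the Python sketch below. Several things here are assumptions rather than facts from this thread: that the image forwards the `JVM_EXTRA_OPTS` environment variable to the JVM, the `cassandra:4.0` image tag (a placeholder), and the 5000 ms value (arbitrary). `docker_run_args` is a hypothetical helper.

```python
import shlex
from typing import List


def docker_run_args(image: str, ring_delay_ms: int) -> List[str]:
    """Build a `docker run` command that sets cassandra.ring_delay_ms.

    Assumption: the container's startup scripts append JVM_EXTRA_OPTS
    to the JVM options. Verify this for the image you actually use.
    """
    if ring_delay_ms < 0:
        raise ValueError("ring_delay_ms must be non-negative")
    jvm_opts = f"-Dcassandra.ring_delay_ms={ring_delay_ms}"
    return ["docker", "run", "--detach", "--env", f"JVM_EXTRA_OPTS={jvm_opts}", image]


# Placeholder image tag and an arbitrary delay, for illustration only.
print(shlex.join(docker_run_args("cassandra:4.0", 5000)))
```

Since shortening the delay has implications, as noted above, this kind of override is best kept to tests rather than baked into defaults.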

tianon commented 3 years ago

Thank you for the additional context, that's massively helpful! :heart:

In this case, it's a test for just some very minimal basics of a functioning single-server instance, so I've opted to adjust cassandra.ring_delay_ms just for the test in https://github.com/docker-library/official-images/pull/9491 (given we don't want to adjust any of the defaults in the image :sweat_smile:).

tianon commented 3 years ago

That test fix was merged and the updated images are published 👍