dshcherb opened this issue 5 years ago
Also, to add to the context: the keyspaces created in Cassandra (both the config and analytics clusters) use SimpleStrategy instead of NetworkTopologyStrategy, and some have replication_factor set to 1 or 2 rather than 3.
The system_auth keyspace in particular has replication_factor set to 1, while the recommended value for HA deployments is 3.
https://docs.datastax.com/en/security/6.0/security/secSystemKeyspace.html
"The default replication factor for the system_auth and dse_security keyspaces is 1.
Each of these must be updated in production environments to avoid data loss. DataStax recommends changing the replication factor before enabling authentication. "
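For reference, a minimal sketch of that recommended change using the Python cassandra-driver; the contact point and the datacenter name dc1 are placeholders, not values from this deployment:

```python
# Sketch only: raise system_auth replication and switch it to NetworkTopologyStrategy.
# "10.0.0.1" and the datacenter name "dc1" are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(['10.0.0.1'])
session = cluster.connect()
session.execute(
    "ALTER KEYSPACE system_auth WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'dc1': 3}")
cluster.shutdown()
```

After the ALTER, `nodetool repair system_auth` has to be run on every node so that the existing auth data is actually streamed to the new replicas.

The current keyspace definitions: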
cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
----------------------+----------------+-------------------------------------------------------------------------------------
ContrailAnalyticsCql | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
config_webui | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
(7 rows)
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
----------------------+----------------+-------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
svc_monitor_keyspace | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
to_bgp_keyspace | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
config_db_uuid | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
dm_keyspace | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
useragent | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
In the 5.1 branch the replication factor appears to be set in _cassandra_ensure_keyspace for one of the Cassandra instances, based on the total number of nodes in the cluster (a rough sketch of that logic follows the keyspace list below):
KEYSPACES = ['config_db_uuid',
             'useragent',
             'to_bgp_keyspace',
             'svc_monitor_keyspace',
             'dm_keyspace']
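I have not traced the exact implementation, but a rough sketch of what such a helper presumably does, given the observed replication_factor of 3 (the function shape, the SimpleStrategy choice and the cap of 3 are my assumptions, not the actual Juniper code):

```python
# Rough approximation only -- not the actual _cassandra_ensure_keyspace code.
# Assumes a cassandra-driver session and that RF scales with node count, capped at 3.
def ensure_keyspaces(session, keyspaces, num_nodes):
    replication_factor = min(num_nodes, 3)
    for keyspace in keyspaces:
        # Keyspace names cannot be bound parameters in CQL DDL, hence the formatting.
        session.execute(
            "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = "
            "{'class': 'SimpleStrategy', 'replication_factor': '%d'}"
            % (keyspace, replication_factor))
```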
For contrail-analytics the replication factor is hard-coded to 2 for a single keyspace (ContrailAnalyticsCql):
https://github.com/Juniper/contrail-analytics/blob/082493863e99b0656f1f7a589b37f94b45bcf33d/contrail-collector/viz.sandesh#L12
const string COLLECTOR_KEYSPACE_CQL = "ContrailAnalyticsCql"
if (cassandra_options.cluster_id_.empty()) {
    tablespace_ = g_viz_constants.COLLECTOR_KEYSPACE_CQL;
} else {
    tablespace_ = g_viz_constants.COLLECTOR_KEYSPACE_CQL + '_' + cassandra_options.cluster_id_;
}
https://github.com/Juniper/contrail-analytics/blob/3b23f1cde29893b7a147962def275284ebc36d54/contrail-collector/db_handler.cc#L513-L524
if (!dbif_->Db_AddSetTablespace(tablespace_, "2")) {  // "2" is the hard-coded replication factor
Cassandra instances running in Docker containers use the default configuration taken from upstream packaging (/etc/cassandra/cassandra-rackdc.properties), which means they do not override the rack= setting used for replica placement with NetworkTopologyStrategy (see the upstream config: https://github.com/apache/cassandra/blob/cassandra-3.11.3/conf/cassandra-rackdc.properties#L17-L20).
Configuring this parameter is essential for handling rack failure scenarios.
Juju exposes the availability zone configured for a node in MAAS via the JUJU_AVAILABILITY_ZONE environment variable, which is available during hook execution and can be retrieved and used to render a proper config file for Cassandra (a sketch follows): https://docs.jujucharms.com/juju-environment-variables#heading--juju_availability_zone
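For illustration, a minimal sketch of such a hook step, assuming a charm written in Python, a single datacenter named dc1 and the default file location (all of which are my assumptions); cassandra-rackdc.properties is only consulted by rack-aware snitches such as GossipingPropertyFileSnitch:

```python
# Sketch of a hook step that maps the MAAS availability zone exposed by Juju
# to a Cassandra rack; the file path, dc name and fallback value are assumptions.
import os

def render_rackdc_properties(path='/etc/cassandra/cassandra-rackdc.properties'):
    az = os.environ.get('JUJU_AVAILABILITY_ZONE') or 'rack1'  # upstream default rack name
    with open(path, 'w') as f:
        f.write('dc=dc1\n')        # single datacenter assumed here
        f.write('rack=%s\n' % az)  # MAAS zone -> Cassandra rack
```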
Cassandra config in all containers:
Cassandra status sample: