Netflix / Priam

Co-Process for backup/recovery, Token Management, and Centralized Configuration management for Cassandra.
Apache License 2.0

Priam on new ASG instance tells Cassandra to gossip with dead ASG EC2 instance #686

Open · amr46 opened this issue 6 years ago

amr46 commented 6 years ago

Setup:

Fix:

Questions:

amr46 commented 6 years ago

I think the protocol Priam uses is incorrect. If is_replace = true and Priam is attempting to replace a downed node, that node might be unavailable altogether. Priam has explicitly marked the downed node as dead, so nothing in the startup path should expect to communicate with it.
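A minimal sketch of the behaviour argued for here, assuming the replacing instance knows the IP of the node it replaces and has a list of candidate seed IPs: the dead node is filtered out of the seed list handed to Cassandra, so startup never depends on contacting it. The function name `build_seed_list`, its inputs, and the example IPs are hypothetical and are not Priam's API.

```python
from typing import Iterable, List, Optional


def build_seed_list(candidates: Iterable[str], replaced_ip: Optional[str]) -> List[str]:
    """Return the seed IPs to hand to Cassandra, minus the node being replaced.

    Hypothetical helper: `candidates` is whatever seed list the co-process
    computed; `replaced_ip` is the dead node (None when not replacing).
    Order is preserved and duplicates are dropped.
    """
    seeds: List[str] = []
    for ip in candidates:
        # Never ask Cassandra to gossip with the node already marked dead.
        if ip == replaced_ip:
            continue
        if ip not in seeds:
            seeds.append(ip)
    if not seeds:
        # Without at least one live seed the new node has nothing to gossip with.
        raise ValueError("no live seeds left after excluding the replaced node")
    return seeds


print(",".join(build_seed_list(["10.0.1.5", "10.0.2.7", "10.0.3.9"], "10.0.1.5")))
```

Raising when no live seed remains makes the "only seed is the dead node" case fail loudly instead of leaving Cassandra to retry gossip against an instance that no longer exists.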

Cassandra, when started in replace mode, attempts to talk to the downed node and fails whenever that node doesn't exist in gossip. As a result, the replacement can never complete without manual intervention.
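To make the failure mode concrete: in replace mode the new node is started with the JVM system property `cassandra.replace_address` (or `cassandra.replace_address_first_boot`, where supported) pointing at the dead node, and it has to learn that node's state from live seeds over gossip before it can take over its tokens. The sketch below models only those two preconditions. `replace_jvm_opts`, `check_replace_possible`, and the `known_endpoints` set (a stand-in for what live seeds report via gossip) are hypothetical names, not Priam or Cassandra code.

```python
from typing import Iterable, List, Set


def replace_jvm_opts(replaced_ip: str) -> List[str]:
    # JVM system property Cassandra reads at startup to enter replace mode.
    return [f"-Dcassandra.replace_address={replaced_ip}"]


def check_replace_possible(replaced_ip: str, seeds: Iterable[str],
                           known_endpoints: Set[str]) -> None:
    """Raise if replacing `replaced_ip` cannot succeed without manual help.

    `known_endpoints` is a hypothetical stand-in for the endpoints the live
    seeds report via gossip; this is a model, not Cassandra's own check.
    """
    live_seeds = [s for s in seeds if s != replaced_ip]
    if not live_seeds:
        # The only seed is the dead node itself: nothing answers gossip.
        raise RuntimeError("no live seed to gossip with")
    if replaced_ip not in known_endpoints:
        # Mirrors the report above: the replaced node is absent from gossip.
        raise RuntimeError(f"{replaced_ip} does not exist in gossip; cannot replace")


print(replace_jvm_opts("10.0.1.5"))
check_replace_possible("10.0.1.5", ["10.0.2.7"], {"10.0.1.5", "10.0.2.7"})
```

Either condition failing corresponds to a replace that stalls until an operator intervenes, which is the situation described in this thread.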

@arunagrawal84, thanks for helping out in the past. Could you comment on this?

arunagrawal84 commented 6 years ago

@amr46, can you please confirm whether the other two nodes (in the other AZs) are marked as seed nodes as well?

amr46 commented 6 years ago

@arunagrawal84, I will have to replicate the environment first. I'll get back to you as soon as possible in the week of 9/3.