maroux opened 2 years ago
> If a GAE instance is restarted, it should come back with the same `GAE_INSTANCE`
@shsamkit were you able to test this bit?
Missed the comment.
For App Engine, if we use `manual_scaling` with a fixed number of instances, `GAE_INSTANCE` stays the same across deletion and restart.
But with automatic scaling, or when rolling out a new version (even with `manual_scaling`), the `GAE_INSTANCE` values change.
I think `manual_scaling` is good enough for firehose's use case, so the restart case can be handled with this change. For a version update (e.g. adding a new site/topic), firehose should choose a new leader anyway.
Ah, I see. So basically we can use `GAE_INSTANCE` as long as `manual_scaling` is used for the group?
yep!
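Given that `GAE_INSTANCE` is stable under `manual_scaling`, a node's identity and the leader record used in the election proposal (`deployment_id`, `node_id`, `timestamp`) could be derived from the environment roughly as follows. This is a sketch, not firehose code: `make_leader_record` is a hypothetical helper, and the environment mapping and timestamp are injectable so it can run outside App Engine.

```python
import json
import os
import time
from typing import Mapping, Optional


def make_leader_record(env: Mapping[str, str] = os.environ,
                       now: Optional[float] = None) -> dict:
    """Build the leader record for this node (hypothetical helper).

    Assumes GAE_INSTANCE and GAE_DEPLOYMENT_ID are set, as on App Engine;
    raises KeyError if either is missing rather than guessing an identity.
    """
    return {
        "deployment_id": env["GAE_DEPLOYMENT_ID"],
        # Stable across restarts only under manual_scaling, per the thread.
        "node_id": env["GAE_INSTANCE"],
        "timestamp": time.time() if now is None else now,
    }


if __name__ == "__main__":
    # Fake environment so the example runs locally.
    fake_env = {"GAE_INSTANCE": "instance-a", "GAE_DEPLOYMENT_ID": "dep-1"}
    print(json.dumps(make_leader_record(fake_env, now=0.0), sort_keys=True))
```

Failing with `KeyError` when the variables are absent is a deliberate choice in this sketch: a node without a stable id should not take part in the election.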
All nodes will come up in state `candidate`. In this mode, the app will try to elect itself as leader and do nothing else. A node elects itself leader by creating a file `${METADATA_BUCKET}/leader.json` with contents `{"deployment_id": ..., "node_id": ..., "timestamp": ...}`. The node id is derived from the env var `GAE_INSTANCE`, which contains the unique GAE instance id. If a GAE instance is restarted, it should come back with the same `GAE_INSTANCE`, thereby restoring the leader's function. If the node was able to create the file, it moves to state `leader`.

Only one node will be able to create the file, since we'll use atomic creation based on "create only if file does not exist". If a node fails to create the file, it reads the existing file instead. If the `deployment_id` does not match its own deployment id (env var `GAE_DEPLOYMENT_ID`), it backs off and retries after a timeout, until a global timeout is reached. If the global timeout is reached, the node declares a fatal error and crashes. If the file exists and the deployment id matches, the node moves to state `follower`.
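A minimal, single-process sketch of the `candidate` to `leader`/`follower` flow described above. `InMemoryStore`, `elect`, `FatalElectionError`, `retry_interval` and `global_timeout` are hypothetical names, and the in-memory store only stands in for the metadata bucket. On Cloud Storage, the "create only if file does not exist" step would presumably map to an upload with the `if_generation_match=0` precondition. The check that lets a node whose `node_id` matches the existing record reclaim leadership is our reading of "restoring the leader's function"; the proposal itself only spells out the `follower` case.

```python
import time
from typing import Callable, Dict, Optional


class FatalElectionError(RuntimeError):
    """Raised when no usable leader record appears before the global timeout."""


class InMemoryStore:
    """Hypothetical stand-in for ${METADATA_BUCKET}."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def create_if_absent(self, key: str, value: dict) -> bool:
        # Single-threaded demo, so check-then-set is trivially atomic here;
        # a real bucket needs a server-side precondition instead.
        if key in self._objects:
            return False
        self._objects[key] = dict(value)
        return True

    def read(self, key: str) -> Optional[dict]:
        return self._objects.get(key)


def elect(store, record: dict, retry_interval: float = 5.0,
          global_timeout: float = 300.0,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> str:
    """Stay in 'candidate' until this node becomes 'leader' or 'follower'."""
    deadline = clock() + global_timeout
    while True:
        # Try to elect ourselves by atomically creating leader.json.
        if store.create_if_absent("leader.json", record):
            return "leader"
        existing = store.read("leader.json")
        if existing is not None and existing["deployment_id"] == record["deployment_id"]:
            # Assumption: same node_id means this is the restarted leader.
            if existing["node_id"] == record["node_id"]:
                return "leader"
            return "follower"
        # Record belongs to another deployment (or vanished): back off, retry.
        if clock() >= deadline:
            raise FatalElectionError("no leader for this deployment before global timeout")
        sleep(retry_interval)
```

Injecting `clock` and `sleep` keeps the global-timeout path testable without real waiting; in production the defaults use wall-clock sleeping and a monotonic clock.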