Abstract
Pilosa nodes are going to need operational states beyond simply up or down. These states are necessary as we prepare Pilosa to dynamically grow, shrink, and replace nodes.
Background
As Pilosa goes into production, we will need the ability to make changes to the cluster while it is running. To facilitate this, we will need to define a set of node states, each with its own behavior. Together, these states will define the life cycle of a node: birth, growth, reproduction, and death. For example, a new node joining an existing cluster will need a different set of behaviors during that phase and will need to communicate its state to the rest of the cluster.
Proposal
Each node in a Pilosa cluster needs to maintain a state and report its current status.
A Pilosa node can be in one of the following states:
Normal: Connected to the rest of the cluster and serving all requests.
Leaving: In the process of leaving the cluster. No longer responding to SetBit requests, but still responding to Sync endpoints in order to transfer data.
Joining: In the process of joining the cluster. Not responding to Bitmap queries, but can respond to SetBit requests.
Moving: In the process of redistributing Slice Fragments across the cluster. This node will remain in the cluster under the new hash assignments.
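Since Pilosa is written in Go, the states above might be represented as a small string-valued type. Below is a minimal sketch in which `NodeState`, the `State*` constant names, their upper-case wire values, and `ParseNodeState` are all hypothetical rather than existing Pilosa code:

```go
package main

import "fmt"

// NodeState is a hypothetical Go type for the proposed node states.
// String values keep the state readable in JSON responses and gossip messages.
type NodeState string

const (
	StateNormal  NodeState = "NORMAL"  // connected and serving all requests
	StateLeaving NodeState = "LEAVING" // leaving; serves only data-transfer endpoints
	StateJoining NodeState = "JOINING" // joining; accepts SetBit, not Bitmap queries
	StateMoving  NodeState = "MOVING"  // redistributing fragments, staying in the cluster
)

// ParseNodeState validates a state name received from another node or a client.
func ParseNodeState(name string) (NodeState, error) {
	switch s := NodeState(name); s {
	case StateNormal, StateLeaving, StateJoining, StateMoving:
		return s, nil
	default:
		return "", fmt.Errorf("unknown node state %q", name)
	}
}

func main() {
	for _, name := range []string{"JOINING", "DOWN"} {
		s, err := ParseNodeState(name)
		fmt.Printf("%q %v\n", s, err)
	}
}
```

String values cost an explicit validation step whenever a state arrives from another node, but they make status responses self-describing.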
Define the list of operations allowed in each state:
Normal:
all endpoints
Leaving:
/export
/slices/max
/fragment/nodes
/fragment/data
/fragment/blocks
/fragment/block/data
/nodes
/version
/status
Joining:
/schema
/import
/export
/slices/max
/db
/db/time_quantum
/db/attr/diff
/frame
/frame/time_quantum
/frame/attr/diff
/fragment/nodes
/fragment/data
/fragment/blocks
/fragment/block/data
/frame/restore
/nodes
/version
/status
Moving:
I believe this state can respond to all requests, but we may need to limit access.
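The lists above amount to a per-state permission matrix. One way to encode it is sketched below, assuming requests are matched on the exact URL path (query string excluded) and treating Moving as unrestricted for now, as the note above tentatively suggests. All names are hypothetical.

```go
package main

import "fmt"

type NodeState string

const (
	StateNormal  NodeState = "NORMAL"
	StateLeaving NodeState = "LEAVING"
	StateJoining NodeState = "JOINING"
	StateMoving  NodeState = "MOVING"
)

// allowAll marks states that serve every endpoint. Moving is tentative:
// the proposal says it may need restricted access later.
var allowAll = map[NodeState]bool{StateNormal: true, StateMoving: true}

// permissions copies the endpoint lists for the restricted states.
var permissions = map[NodeState]map[string]bool{
	StateLeaving: set("/export", "/slices/max", "/fragment/nodes", "/fragment/data",
		"/fragment/blocks", "/fragment/block/data", "/nodes", "/version", "/status"),
	StateJoining: set("/schema", "/import", "/export", "/slices/max", "/db",
		"/db/time_quantum", "/db/attr/diff", "/frame", "/frame/time_quantum",
		"/frame/attr/diff", "/fragment/nodes", "/fragment/data", "/fragment/blocks",
		"/fragment/block/data", "/frame/restore", "/nodes", "/version", "/status"),
}

// set builds a lookup set from a list of paths.
func set(paths ...string) map[string]bool {
	m := make(map[string]bool, len(paths))
	for _, p := range paths {
		m[p] = true
	}
	return m
}

// Allowed reports whether a node in state s may serve path.
// Unknown states serve nothing (a nil inner map yields false).
func Allowed(s NodeState, path string) bool {
	if allowAll[s] {
		return true
	}
	return permissions[s][path]
}

func main() {
	fmt.Println(Allowed(StateJoining, "/import")) // true
	fmt.Println(Allowed(StateLeaving, "/import")) // false
	fmt.Println(Allowed(StateMoving, "/schema"))  // true
}
```

Keeping the matrix as data rather than scattered `if` checks means a change such as restricting Moving touches one table.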
Compatibility
Node state maintenance is in preparation for new features to support a dynamic cluster.
That said, internode communication is core to Pilosa operating normally.
We will need to ensure that the cluster can respond to queries and data ingestion while some nodes are in the Leaving or Joining state. This will require a replication factor of at least 2.
If that requirement is not met, we may want to treat any state other than Normal as "under maintenance" and return a 503 status code on affected endpoints.
Node Health is another resource we should cover in a separate proposal.
Implementation
We will need to create a new /status API endpoint.
This endpoint will be able to return either the current node's status or the status of all nodes in the cluster.
Through periodic Gossip communication across the cluster, each node should maintain the status of the other nodes it attempts to communicate with.
We will need to maintain an API permission matrix for each state.
Appropriate HTTP status codes will be returned for unavailable resources.
Proposal: Add State and status to Nodes
Author: Michael Baird
Last updated: 2017/02/16