eklavya / activeGrid

activeGrid in Scala
Apache License 2.0

Activegrid application design-To support scalability #107

Open sivakumargollu opened 7 years ago

sivakumargollu commented 7 years ago

In the ActiveGrid application, the site-service, scaling-service and workflow services access shared memory. The present development model addresses neither this shared-memory issue nor the fencing problem (node-failure management). This issue is to examine these design problems and propose a model that overcomes the existing flaws.

sivakumargollu commented 7 years ago

ActiveGrid Application design to attain scalability.

In the ActiveGrid application, the site-service, scaling-service and workflow services access shared memory. The present development model addresses neither this shared-memory issue nor the fencing problem (node-failure management).

Feasible approaches to solve the shared-memory issue.

  1. Neo4j transactions

Each service maintains its own in-memory data for faster access. For instance, the workflow service holds the currently executing workflows in a map. Each entry in the map is a tuple of a workflow ID and its present execution context. The status of a workflow changes as its execution progresses. As part of rewriting the application in Scala, this status is being moved to the Neo4j database, so any change in status will be recorded as a flag in the database instead of an in-memory value.

activegrid-workflow-single-node-application 2

The figure above shows the existing ActiveGrid development model with respect to the workflow service. On a single node, this model appears to have no consistency issues. But if the ActiveGrid application is deployed on a cluster across multiple nodes, non-transactional access to the data in Neo4j can lead to redundant execution of the same task. For example, consistency in the execution of a workflow is not guaranteed: the status of a workflow running on one server is not necessarily known to the other servers, so a request to execute the same workflow on another server will start it again.

This problem can be solved with Neo4j transactions, under the following assumption.

  1. Neo4j is hosted as an independent server, i.e. the application should not be configured to create and use a Neo4j instance wherever it runs; instead, all application servers should point to a single Neo4j server.

image
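The effect of transactional access can be sketched without a database. The example below is only an illustration under stated assumptions: `StatusStore`, `tryClaim` and the status values are hypothetical names, and an in-memory `TrieMap` stands in for the single shared Neo4j server. The point is the atomic "claim if absent" step, which is the guarantee a Neo4j transaction would have to provide so that only one server starts a given workflow.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical status values; the real application may use different ones.
sealed trait WorkflowStatus
case object Running extends WorkflowStatus
case object Completed extends WorkflowStatus

// In-memory stand-in for the shared Neo4j server (an assumption for this sketch).
// putIfAbsent is atomic, which mirrors what a database transaction must guarantee.
final class StatusStore {
  private val statuses = TrieMap.empty[String, WorkflowStatus]

  /** Returns true only for the single caller that wins the claim. */
  def tryClaim(workflowId: String): Boolean =
    statuses.putIfAbsent(workflowId, Running).isEmpty

  def markCompleted(workflowId: String): Unit =
    statuses.update(workflowId, Completed)

  def status(workflowId: String): Option[WorkflowStatus] =
    statuses.get(workflowId)
}

object WorkflowClaimDemo {
  // Two application servers share one store and receive the same request.
  def run(): (Boolean, Boolean) = {
    val store = new StatusStore
    val nodeA = store.tryClaim("workflow-42") // first server claims the workflow
    val nodeB = store.tryClaim("workflow-42") // second server is refused
    (nodeA, nodeB)
  }
}
```

With a separate check followed by a separate write, both servers could see "not running" and both start the workflow; combining the two steps into one atomic operation removes that window.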

OR

  2. Microservices / splitting the application into multiple units.

In this approach, each service is treated as an independent service. Every independent service (at least those that need special care) must be hosted on its own machine. A proxy or load-balancing server intercepts all incoming requests and forwards them to the respective services.

image

Each service is independent of the remaining services. There should be a central database for accessing the common set of data.
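As a rough sketch of the proxy's job in this layout, the routing step can be modelled as a lookup from a path prefix to a backend service. The service names match the ones in this issue, but the path prefixes, host names and ports below are invented for illustration and are not part of the project.

```scala
// Hypothetical backend description; names and addresses are placeholders.
final case class Backend(name: String, baseUrl: String)

object RequestRouter {
  // Assumed routing table: one path prefix per independently hosted service.
  private val routes: Map[String, Backend] = Map(
    "/site"     -> Backend("site-service", "http://site.internal:9001"),
    "/scaling"  -> Backend("scaling-service", "http://scaling.internal:9002"),
    "/workflow" -> Backend("workflow-service", "http://workflow.internal:9003")
  )

  /** Picks the backend whose prefix matches the incoming request path, if any. */
  def route(path: String): Option[Backend] =
    routes.collectFirst {
      case (prefix, backend) if path == prefix || path.startsWith(prefix + "/") => backend
    }
}
```

For example, `RequestRouter.route("/workflow/42/start")` selects the workflow service, while an unknown path yields `None` and the proxy can reject the request.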

Drawbacks.

  1. Neither Neo4j transactions nor microservices address node-failure management.

Akka-cluster approach.

//Edit

sivakumargollu commented 7 years ago

This design addresses the following issues.

  1. Node failure management.
  2. Operation status
  3. Shared data issue.

1. Node failure management. There are multiple use cases where node failure has to be addressed.

A. To keep critical requests such as auto-scaling, workflow execution and site creation processing, the request status will be maintained in the Neo4j database. Each request to these services is assigned a fixed time interval. If the server fails before the interval elapses, the next request to the same service checks the timestamp of the last request and updates the status, waiting for the remainder of the interval if required.

B. If the Neo4j database itself fails, the cluster should respond to all incoming requests with a failure status without proceeding further.

C. Any in-progress request must be rolled back or modified according to how far its execution has progressed.

Node failure must be notified to the remaining participants of the cluster. If we maintain the Neo4j cluster with a master-slave architecture, then when the main Neo4j server shuts down unexpectedly, one of the slaves will become master and operation execution will proceed.
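The timestamp check described in item A can be sketched as a small decision function. This is an assumption-laden illustration: `RequestRecord`, `Decision` and `LeaseCheck` are hypothetical names, the record stands for whatever a node would write to Neo4j when it starts a request, and the sketch assumes the servers' clocks are roughly synchronized.

```scala
import java.time.{Duration, Instant}

// Hypothetical record a node would store in Neo4j when it starts a request.
final case class RequestRecord(requestId: String, owner: String, startedAt: Instant, interval: Duration)

sealed trait Decision
case object Proceed extends Decision                             // no earlier request on record
final case class WaitFor(remaining: Duration) extends Decision   // earlier owner may still be alive
case object TakeOver extends Decision                            // interval elapsed, assume the owner failed

object LeaseCheck {
  /** Decides what the next request to the same service should do. */
  def decide(previous: Option[RequestRecord], now: Instant): Decision =
    previous match {
      case None => Proceed
      case Some(rec) =>
        val expiresAt = rec.startedAt.plus(rec.interval)
        if (now.isBefore(expiresAt)) WaitFor(Duration.between(now, expiresAt))
        else TakeOver
    }
}
```

A caller that receives `WaitFor` would wait for the remaining time and check again; on `TakeOver` it would update the status in the database and process the request itself.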

2. Operation status issue.

If a half-completed request leaves an inconsistent state at AWS (for example while executing commands, deploying the application, or executing terminal scripts), it must be processed again by one of the servers in the cluster, restarting the request from the beginning.

3. Sharing data between multiple nodes. We will use CRDTs (conflict-free replicated data types), as provided by Akka Cluster's Distributed Data module, to avoid the data-sharing issue.

image
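To show why CRDTs avoid the data-sharing issue, here is a hand-written last-writer-wins map of workflow statuses. It is not the Akka Distributed Data API (which ships its own types such as `LWWMap`, `ORSet` and `GCounter`); the names `Lww` and `StatusMap` and the string statuses are assumptions made for this sketch. The key property is that `merge` is commutative, associative and idempotent, so nodes can exchange state in any order and still converge.

```scala
// One replicated value: a status plus the (timestamp, nodeId) that wrote it.
final case class Lww(status: String, timestamp: Long, nodeId: String) {
  private def key: (Long, String, String) = (timestamp, nodeId, status)

  /** Keeps the later write; ties fall back to nodeId, then status, so every replica picks the same winner. */
  def merge(other: Lww): Lww =
    if (Ordering[(Long, String, String)].gteq(key, other.key)) this else other
}

// Hypothetical replicated map from workflow ID to its last-writer-wins status.
final case class StatusMap(entries: Map[String, Lww] = Map.empty) {
  /** Records a local write, resolved against any value already present. */
  def put(workflowId: String, status: String, timestamp: Long, nodeId: String): StatusMap = {
    val incoming = Lww(status, timestamp, nodeId)
    val merged = entries.get(workflowId).fold(incoming)(_.merge(incoming))
    StatusMap(entries.updated(workflowId, merged))
  }

  /** Combines this replica's state with another replica's state, key by key. */
  def merge(other: StatusMap): StatusMap =
    StatusMap(other.entries.foldLeft(entries) { case (acc, (id, theirs)) =>
      acc.updated(id, acc.get(id).fold(theirs)(_.merge(theirs)))
    })
}
```

Two nodes that update the same workflow independently end up with the same map whichever of them merges first. Last-writer-wins depends on reasonably synchronized clocks, so it suits status flags better than data where a lost concurrent update would matter.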

sivakumargollu commented 7 years ago

Node failure management.

image

sivakumargollu commented 7 years ago

With distributed data and load balancing in the cluster.

image