amino-os / Amino.Run

Amino Distributed OS - Runtime Manager
Apache License 2.0

[PR-2] Microservice-level metrics [data IN/OUT, RPC processing time] measurement and collection at DM's server policy, maintained per client #795

Closed VenuReddy2103 closed 5 years ago

VenuReddy2103 commented 5 years ago

This PR measures microservice metrics [data IN/OUT, RPC processing time] and collects them at the DM's server policy, maintained per client. The metrics are periodically reported to the respective group policy. Storing the received metrics and making decisions on them at the group policy/OMS is not part of this PR. This PR is the same as the old PR #746; it has just been re-raised from a new fork, and the old PR is being closed.

quinton-hoole commented 5 years ago

Thanks @VenuReddy2103. Let me know when this is ready for review.

quinton-hoole commented 5 years ago

See https://github.com/amino-os/Amino.Run/pull/794#issuecomment-500074934

quinton-hoole commented 5 years ago

@VenuReddy2103 Still no reply to https://github.com/amino-os/Amino.Run/pull/794#issuecomment-500074934 ?

quinton-hoole commented 5 years ago

Still some things to improve in this PR, especially:

  1. The idea was for all ServerPolicies on a node to report metrics to the local kernelServer, which batches them for all local microservices and sends the whole batch to the OMS, which then demultiplexes it to the individual groupPolicies. With a few hundred or a thousand microservices on a node, having each of them individually create connections to the groupPolicies on the OMS seems unscalable.
  2. There appears to be no smoothing function applied to metrics (e.g. moving average).

Some of these are possibly addressed in the linked follow-up PRs. I'm going to get all of those merged, and then address the improvements in further follow-up PRs as needed.
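
To make the two points above concrete, here is a minimal sketch of a per-node collector that smooths each microservice's RPC processing time with an exponential moving average and exposes one batch for all local microservices. Everything in it is hypothetical: `NodeMetricsBatcher`, `record`, `snapshot`, and the `alpha` parameter are illustrative names, not Amino.Run APIs, and the exponential moving average is only one possible smoothing function.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Amino.Run code: one collector per node that
 * smooths per-microservice RPC times and returns them as a single batch.
 */
class NodeMetricsBatcher {
    // Smoothing factor in (0, 1]; larger values weight recent samples more.
    private final double alpha;
    // Smoothed RPC processing time in milliseconds, keyed by microservice id.
    private final Map<String, Double> smoothedRpcMillis = new HashMap<>();

    NodeMetricsBatcher(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Called by a local server policy after an RPC; updates the moving average. */
    synchronized void record(String microserviceId, double rpcMillis) {
        // First sample is stored as-is; later samples are blended with the old value.
        smoothedRpcMillis.merge(
                microserviceId,
                rpcMillis,
                (previous, sample) -> alpha * sample + (1.0 - alpha) * previous);
    }

    /** Copies the current values so they can be sent to the OMS as one batch. */
    synchronized Map<String, Double> snapshot() {
        return new HashMap<>(smoothedRpcMillis);
    }
}
```

In this sketch the node would send the result of `snapshot()` to the OMS in a single call per reporting period, and the OMS would route each entry to its group policy by microservice id, so the number of connections scales with nodes rather than with microservices.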

quinton-hoole commented 5 years ago

I re-checked that all unit, integration and example tests pass.
There was one intermittent exception in the integration tests (see below), and another in one of the examples. Most of the time all tests passed, and I doubt the intermittent failures are related to this PR, so we can debug and fix them in follow-up PRs.

Example intermittent failures:

INFO: Retrying method public java.lang.Object amino.run.policy.DefaultUpcallImpl$ServerPolicy.onRPC(java.lang.String,java.util.ArrayList<java.lang.Object>) throws java.lang.Exception after 10240ms due to amino.run.policy.util.consensus.raft.LeaderException: Current Leader is 00000000-0000-0000-0000-000000000000
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.TimeoutException: Retry timeout of 20000ms exceeded in AtLeastOnceRPCPolicy
        at amino.run.appexamples.hankstodo.stubs.TodoList_Stub.addToDo(TodoList_Stub.java:168)
        at amino.run.appexamples.hankstodo.HanksTodoMain.main(HanksTodoMain.java:74)
Caused by: java.util.concurrent.TimeoutException: Retry timeout of 20000ms exceeded in AtLeastOnceRPCPolicy
        at amino.run.policy.atleastoncerpc.AtLeastOnceRPCPolicy$ClientPolicy.onRPC(AtLeastOnceRPCPolicy.java:49)
        at amino.run.policy.stubs.DHTPolicy$ServerPolicy_Stub.$__makeKernelRPC(DHTPolicy$ServerPolicy_Stub.java:40)
        at amino.run.policy.stubs.DHTPolicy$ServerPolicy_Stub.onRPC(DHTPolicy$ServerPolicy_Stub.java:141)
        at amino.run.policy.dht.DHTPolicy$ClientPolicy.onRPC(DHTPolicy.java:61)
        at amino.run.appexamples.hankstodo.stubs.TodoList_Stub.addToDo(TodoList_Stub.java:157)
        ... 1 more
Caused by: amino.run.policy.util.consensus.raft.LeaderException: Current Leader is 00000000-0000-0000-0000-000000000000
        at amino.run.policy.util.consensus.raft.Server.applyToStateMachine(Server.java:377)
        at amino.run.policy.replication.ConsensusRSMPolicy$ServerPolicy.onRPC(ConsensusRSMPolicy.java:199)
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at amino.run.common.ObjectHandler.invoke(ObjectHandler.java:106)
        at amino.run.kernel.server.KernelObject.invoke(KernelObject.java:58)
        at amino.run.kernel.server.KernelServerImpl.makeKernelRPC(KernelServerImpl.java:132)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:835)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

> Task :examples:hanksTodo:runapp FAILED
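
The trace above shows a client-side retry loop giving up once its 20000ms budget is exhausted, with the last server error (no Raft leader elected yet) attached as the cause. As a rough illustration of that pattern, and not AtLeastOnceRPCPolicy's actual code, the sketch below retries a call until a deadline. The class name, the parameters, and the doubling backoff are assumptions; the log only shows a single retry delay (10240ms) and the overall timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of retry-until-deadline; not Amino.Run code. */
class RetryUntilDeadline {
    static <T> T call(Callable<T> rpc, long initialDelayMs, long timeoutMs) throws Exception {
        if (initialDelayMs <= 0) {
            throw new IllegalArgumentException("initialDelayMs must be positive");
        }
        long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
        long delayMs = initialDelayMs;
        while (true) {
            Exception lastFailure;
            try {
                return rpc.call();
            } catch (Exception e) {
                lastFailure = e; // e.g. a "no leader" error from the server
            }
            long remainingMs = (deadlineNanos - System.nanoTime()) / 1_000_000L;
            if (delayMs > remainingMs) {
                // The next wait would overshoot the budget, so give up now
                // and keep the last failure as the cause.
                TimeoutException timeout =
                        new TimeoutException("Retry timeout of " + timeoutMs + "ms exceeded");
                timeout.initCause(lastFailure);
                throw timeout;
            }
            Thread.sleep(delayMs);
            delayMs *= 2; // assumed backoff: double the wait after each failure
        }
    }
}
```

With a loop of this shape, a leader election that takes longer than the retry budget surfaces as a `TimeoutException` on the client even though the server would have recovered shortly afterwards, which would be consistent with an intermittent failure.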
amino.run.multidm.MultiDMTestCases > runTest[52: Test with dms=[class amino.run.policy.scalability.LoadBalancedMasterSlaveSyncPolicy, class amino.run.policy.serializability.SerializableRPCPolicy]] FAILED
    java.lang.RuntimeException at MultiDMTestCases.java:291
        Caused by: amino.run.runtime.exception.AppExecutionException
            Caused by: java.util.concurrent.ExecutionException
                Caused by: java.util.concurrent.RejectedExecutionException

INFO: failed to process request MethodInvocationRequest{clientId='062a7224-d1a9-4256-b5ec-fc00b89e6143', requestId=3, methodName='public java.io.Serializable amino.run.demo.KVStore.get(java.lang.String)', params=[k1_1], methodType=MUTABLE}: java.util.concurrent.ExecutionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@f45b9fa rejected from java.util.concurrent.ScheduledThreadPoolExecutor@356dcfd1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3]
java.util.concurrent.ExecutionException: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@f45b9fa rejected from java.util.concurrent.ScheduledThreadPoolExecutor@356dcfd1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3]
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at amino.run.policy.scalability.masterslave.Processor.process(Processor.java:72)
        at amino.run.policy.scalability.LoadBalancedMasterSlaveSyncPolicy$ServerPolicy.onRPC(LoadBalancedMasterSlaveSyncPolicy.java:126)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at amino.run.common.ObjectHandler.invoke(ObjectHandler.java:106)
        at amino.run.kernel.server.KernelObject.invoke(KernelObject.java:58)
        at amino.run.kernel.server.KernelServerImpl.makeKernelRPC(KernelServerImpl.java:132)
        at amino.run.kernel.client.KernelClient.tryMakeKernelRPC(KernelClient.java:323)
        at amino.run.kernel.client.KernelClient.makeKernelRPC(KernelClient.java:392)
        at amino.run.policy.stubs.LoadBalancedMasterSlaveSyncPolicy$ServerPolicy_Stub.$__makeKernelRPC(LoadBalancedMasterSlaveSyncPolicy$ServerPolicy_Stub.java:46)
        at amino.run.policy.stubs.LoadBalancedMasterSlaveSyncPolicy$ServerPolicy_Stub.onRPC(LoadBalancedMasterSlaveSyncPolicy$ServerPolicy_Stub.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at amino.run.common.ObjectHandler.invoke(ObjectHandler.java:106)
        at amino.run.policy.serializability.SerializableRPCPolicy$ServerPolicy.onRPC(SerializableRPCPolicy.java:31)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at amino.run.common.ObjectHandler.invoke(ObjectHandler.java:106)
        at amino.run.kernel.server.KernelObject.invoke(KernelObject.java:58)
        at amino.run.kernel.server.KernelServerImpl.makeKernelRPC(KernelServerImpl.java:132)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:835)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@f45b9fa rejected from java.util.concurrent.ScheduledThreadPoolExecutor@356dcfd1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:549)
        at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:648)
        at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:681)
        at amino.run.policy.scalability.masterslave.RequestReplicator.replicateInAsync(RequestReplicator.java:70)
        at amino.run.policy.scalability.masterslave.RequestReplicator.replicateInSync(RequestReplicator.java:39)
        at amino.run.policy.scalability.masterslave.Processor$RequestProcessor.call(Processor.java:154)
        at amino.run.policy.scalability.masterslave.Processor$RequestProcessor.call(Processor.java:105)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
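
The `RejectedExecutionException` above reports the executor state as `Terminated`, which means a task was submitted after the executor had been shut down. The standalone sketch below reproduces that JDK behaviour outside Amino.Run; the class and method names are made up for illustration, and it does not claim to show where the shutdown happens in `RequestReplicator`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

/** Hypothetical reproduction of submit-after-shutdown; not Amino.Run code. */
class RejectedAfterShutdown {
    /** Returns true if submitting to an already shut-down executor is rejected. */
    static boolean submitAfterShutdownIsRejected() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.shutdown(); // from here on the executor accepts no new tasks
        try {
            executor.submit(() -> "never runs");
            return false;
        } catch (RejectedExecutionException expected) {
            // The default AbortPolicy throws instead of silently dropping the task.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdownIsRejected());
    }
}
```

If the integration test tears down a policy's executor while a request is still in flight, a race of this kind could produce the intermittent failure; that is a guess from the trace and would need to be confirmed against the test's shutdown order.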