Tendrl / node-agent

A Python agent local to every managed storage node in the SDS cluster
GNU Lesser General Public License v2.1

Performance improvement RFE #882

Open GowthamShanmugam opened 4 years ago

GowthamShanmugam commented 4 years ago

There are still a few places where we can improve performance with some slight modifications.

Case 1: Turn the disk sync and network sync into collectd plugins, and run them once a day or at some other long interval. In the current flow, disk values are not used anywhere, yet we spend a lot of resources on the sync. Even if we plan to use them in the future, network and disk details rarely change, so syncing once a day is fine. Collectd already supports invoking a plugin at a configurable time interval.

The server node does not use this information either, so it is safe to move that logic into collectd.
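A minimal sketch of Case 1, assuming collectd's Python plugin is used: `collectd.register_read` accepts a per-callback interval in seconds, so the sync can run once a day regardless of collectd's global `Interval`. `sync_disks`, `sync_network`, and the one-day value are hypothetical placeholders, not node-agent code. The `collectd` module only exists inside collectd's embedded interpreter, so the import is guarded to let the logic run standalone.

```python
SYNC_INTERVAL_SECONDS = 24 * 60 * 60  # assumed: run the sync once a day

try:
    import collectd  # only importable inside collectd's embedded Python
except ImportError:
    collectd = None


def sync_disks():
    """Hypothetical placeholder for node-agent's disk sync logic."""
    return "disks synced"


def sync_network():
    """Hypothetical placeholder for node-agent's network sync logic."""
    return "network synced"


def read_callback():
    """Run each sync; collectd invokes this once per SYNC_INTERVAL_SECONDS."""
    results = []
    for sync in (sync_disks, sync_network):
        try:
            results.append(sync())
        except Exception as exc:  # one failing sync should not skip the other
            if collectd is not None:
                collectd.error("%s failed: %s" % (sync.__name__, exc))
            results.append(None)
    return results


if collectd is not None:
    # Per-callback interval instead of the global collectd Interval.
    collectd.register_read(read_callback, SYNC_INTERVAL_SECONDS)
```

The module would then be loaded through the `ModulePath` and `Import` options of a `<Plugin python>` block in collectd.conf.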

Case 2: Collectd currently pushes the whole set of metrics and their values in a single API call. They could instead be pushed in batches.
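A sketch of batched pushing: the metrics are split into fixed-size chunks and one API call is made per chunk. `push` stands in for whatever function performs the existing API call, the `(name, value)` metric shape is an assumption, and the default batch size of 500 is arbitrary and would need tuning.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# A metric as (name, value); the real payload shape is an assumption.
Metric = Tuple[str, float]


def chunked(metrics: Iterable[Metric], size: int) -> Iterator[List[Metric]]:
    """Yield consecutive lists of at most `size` metrics."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(metrics)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def push_in_batches(metrics: Iterable[Metric],
                    push: Callable[[List[Metric]], None],
                    size: int = 500) -> int:
    """Send metrics through `push` (hypothetical API call) one batch at a time.

    Returns the number of API calls made.
    """
    calls = 0
    for batch in chunked(metrics, size):
        push(batch)
        calls += 1
    return calls
```

Bounded batches keep each request small, and a failed push can be retried without resending every metric.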

Case 3: There is no need to run the gluster sync every 3 or 4 seconds. Run the gluster sync when the cluster is imported, and from then on make it event-driven: trigger a gluster sync only when an event is received from gluster. The event listener should be a separate microservice. It receives gluster events, stores them in etcd as a queue, and notifies gluster-integration, e.g. "something happened inside gluster; please check the event in etcd". In case gluster-integration misses a notification, it should also check the DB at a regular interval. Each event should be processed and then deleted immediately. The event listener should also be able to filter events, because not every event needs to trigger a sync. Some sample events to react to:

  1. volume down
  2. brick down
  3. new volume created, etc.

Utilization details are already synced by the collectd plugin anyway, so there is no need for frequent syncs from gluster-integration.
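The event-driven flow of Case 3 could look roughly like the following. This is a sketch under stated assumptions: `EventQueue` is an in-memory stand-in for the etcd-backed queue, the event names are hypothetical labels for the sample events listed, and `notify`/`sync` are plain callbacks rather than real network or gluster calls. `drain()` is what gluster-integration would also call from its periodic poll, so a missed notification is still handled.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names for the sample events; real gluster event names may differ.
SYNC_TRIGGERING_EVENTS = {"volume_down", "brick_down", "volume_created"}


class EventQueue:
    """In-memory stand-in for an etcd-backed event queue (assumption)."""

    def __init__(self) -> None:
        self._items: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def put(self, event: dict) -> int:
        event_id = next(self._ids)
        self._items[event_id] = event
        return event_id

    def pending(self) -> List[Tuple[int, dict]]:
        return sorted(self._items.items())  # oldest event first

    def delete(self, event_id: int) -> None:
        self._items.pop(event_id, None)


class EventListener:
    """Filters gluster events, queues the relevant ones, and notifies."""

    def __init__(self, queue: EventQueue, notify: Callable[[int], None]) -> None:
        self.queue = queue
        self.notify = notify

    def on_gluster_event(self, event: dict) -> Optional[int]:
        if event.get("name") not in SYNC_TRIGGERING_EVENTS:
            return None  # filtered out: this event does not need a sync
        event_id = self.queue.put(event)
        self.notify(event_id)  # "something happened, check the queue"
        return event_id


class GlusterIntegration:
    """Consumer side: syncs on notification and on a periodic poll."""

    def __init__(self, queue: EventQueue, sync: Callable[[dict], None]) -> None:
        self.queue = queue
        self.sync = sync

    def on_notify(self, event_id: int) -> None:
        self.drain()

    def drain(self) -> int:
        """Process every pending event; also called from the periodic poll."""
        processed = 0
        for event_id, event in self.queue.pending():
            self.sync(event)
            self.queue.delete(event_id)  # delete immediately after processing
            processed += 1
        return processed
```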

Case 4: The same event listener can also be used for alerts, but in this case it should put the event in an alert queue. Whenever an alert comes from Grafana, or gluster-integration wants to raise an alert, send it to the event listener with the type "alert". The event listener then puts the notification in the alert queue and notifies the alert service, e.g. "an alert arrived, and this is the alert ID". The alert service then reads that alert from the queue, so there is no need to poll the DB frequently to check whether any alert has arrived.
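Routing by message type could be sketched as below. This is hypothetical and not an existing Tendrl API: each message type (for example `"alert"`) gets its own in-memory queue and subscriber, and the subscriber receives only the queue name and item ID, then reads the item itself, as the alert service would.

```python
from typing import Callable, Dict


class RoutingEventListener:
    """Keeps one queue per message type and notifies that type's subscriber."""

    def __init__(self) -> None:
        self.queues: Dict[str, Dict[int, dict]] = {}
        self.subscribers: Dict[str, Callable[[str, int], None]] = {}
        self._next_id = 0

    def subscribe(self, msg_type: str, callback: Callable[[str, int], None]) -> None:
        self.queues.setdefault(msg_type, {})
        self.subscribers[msg_type] = callback

    def publish(self, msg_type: str, payload: dict) -> int:
        if msg_type not in self.subscribers:
            raise KeyError("no subscriber for message type %r" % msg_type)
        self._next_id += 1
        self.queues[msg_type][self._next_id] = payload
        # Send only the queue name and item ID; the subscriber reads the item.
        self.subscribers[msg_type](msg_type, self._next_id)
        return self._next_id

    def take(self, msg_type: str, item_id: int) -> dict:
        """Read and remove an item from its queue."""
        return self.queues[msg_type].pop(item_id)
```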

The event listener could also replace the node-agent socket and run as a separate microservice.

Case 5: The event listener can also be used for broadcasting. For example, whenever a job such as import or unmanage arrives, the server puts the job in the event listener. The event listener puts the job in the job queue and broadcasts a notification with the job ID across the nodes. The nodes can then read the job from the DB.

The nodes then no longer need to poll the job queue in the DB. The event listener can also target the node that should run a job, sending the notification only to that node.
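Broadcasting versus targeting a job can be sketched as follows. The node IDs, the in-memory `jobs` dict (standing in for the job queue in the DB), and the per-node callbacks are assumptions for illustration; a real deployment would persist the job and deliver notifications over the network.

```python
from typing import Callable, Dict, Iterable, Optional, Set


class JobNotifier:
    """Stores jobs and notifies nodes with only the job ID."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}  # stand-in for the job queue in the DB
        self.nodes: Dict[str, Callable[[str], None]] = {}

    def register_node(self, node_id: str, on_job: Callable[[str], None]) -> None:
        self.nodes[node_id] = on_job

    def submit(self, job_id: str, job: dict,
               target_nodes: Optional[Iterable[str]] = None) -> Set[str]:
        """Store the job, then broadcast (no targets) or notify only the targets."""
        self.jobs[job_id] = job
        if target_nodes is None:
            recipients = set(self.nodes)
        else:
            recipients = set(target_nodes) & set(self.nodes)  # skip unknown nodes
        for node_id in sorted(recipients):
            self.nodes[node_id](job_id)  # node then reads self.jobs[job_id]
        return recipients
```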

Since the event listener stores notifications in the DB, it can also retry notifying a node.
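Because each notification is persisted, retrying amounts to keeping it stored until delivery succeeds. A minimal sketch, assuming a `send` callback that returns `True` on successful delivery and a hypothetical `max_attempts` cap; the `pending` dict stands in for the DB.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (node_id, job_id)


class RetryingNotifier:
    """Keeps undelivered notifications stored so they can be retried."""

    def __init__(self, send: Callable[[str, str], bool], max_attempts: int = 3) -> None:
        self.send = send
        self.max_attempts = max_attempts
        self.pending: Dict[Key, int] = {}  # key -> delivery attempts made

    def _attempt(self, key: Key) -> bool:
        self.pending[key] += 1
        try:
            ok = bool(self.send(*key))
        except Exception:
            ok = False  # treat errors as a failed delivery
        if ok:
            del self.pending[key]  # delivered: drop the stored notification
        return ok

    def notify(self, node_id: str, job_id: str) -> bool:
        key = (node_id, job_id)
        self.pending[key] = 0
        return self._attempt(key)

    def retry_pending(self) -> int:
        """Retry every stored notification under the cap; return how many remain."""
        for key, attempts in list(self.pending.items()):
            if attempts < self.max_attempts:
                self._attempt(key)
        return len(self.pending)
```

Notifications that hit the cap stay stored, so they can still be inspected or re-driven later.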