Cloudkibo / KiboPush


Design a performance management system #6319

Open jekram opened 5 years ago

jekram commented 5 years ago

This is a Design task to implement performance management and dashboard.

Currently, when we look at Digital Ocean, it shows usage graphs for CPU, memory, bandwidth, and I/O. These are shown for the last 6 hrs, 24 hrs, 7 days, and 30 days.

So when we have high CPU utilization, we cannot correlate to the number of transactions we are processing.

We need to capture the following message count for:

- Broadcast
- Broadcast API
- Poll
- Survey
- AutoPosting
- AutoPosting FB Post
- Invite using phone numbers

We need to show in the graph the number of messages in:

- last 6 hrs
- last 24 hrs
- last 7 days
- last 30 days

The design should be extensible. However, in Phase 1 we only need to implement it for:

- Broadcast
- Broadcast API
- AutoPosting

With intervals of:

- last 6 hrs
- last 24 hrs

In Phase 1 we do not need to keep the data for more than 24 hrs.
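A minimal sketch of how the message counts above could be kept, assuming hourly buckets per message type and a 24-hour retention for Phase 1. The class name `MessageCounter`, the bucket size, and the message-type strings are assumptions for illustration, not part of the design document.

```python
from collections import defaultdict

HOUR = 3600
RETENTION_HOURS = 24  # Phase 1 keeps no more than 24 hours of data


class MessageCounter:
    """Counts messages per type in hourly buckets (hypothetical helper)."""

    def __init__(self):
        # {message_type: {hour_bucket: count}}
        self.buckets = defaultdict(lambda: defaultdict(int))

    def record(self, message_type, timestamp, count=1):
        """Add `count` messages to the hourly bucket containing `timestamp`."""
        self.buckets[message_type][int(timestamp // HOUR)] += count

    def prune(self, now):
        """Drop buckets that fall outside the retention period."""
        oldest = int(now // HOUR) - RETENTION_HOURS + 1
        for per_hour in self.buckets.values():
            for hour in [h for h in per_hour if h < oldest]:
                del per_hour[hour]

    def total(self, message_type, now, last_hours):
        """Sum the counts of the most recent `last_hours` hourly buckets."""
        current = int(now // HOUR)
        per_hour = self.buckets[message_type]
        return sum(
            per_hour.get(h, 0)
            for h in range(current - last_hours + 1, current + 1)
        )
```

With this shape, the "last 6 hrs" and "last 24 hrs" graphs are the same query with a different `last_hours` value, and longer intervals would only need a larger retention.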

The design should include both the logic and UI (please keep it close to Digital Ocean).

The task intends to determine and correlate CPU or Memory usage with transactions.
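One way to express the correlation mentioned here, as a rough sketch: take the hourly CPU readings and the hourly message counts for the same hours and compute a Pearson correlation coefficient. The function name `pearson` and the idea of using hourly samples are assumptions; any real readings would come from Digital Ocean and from the counters.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long series of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 for, say, CPU % against broadcast counts would suggest that broadcasts drive the CPU load in those hours; a value near 0 would suggest looking elsewhere.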

The intent is not for historical trending like https://github.com/Cloudkibo/KiboPush/issues/6259

sojharo commented 5 years ago

I worked on this yesterday and have started creating a design document which is under construction. I also studied current KiboDash work that we have done to store summarised information of our operational dashboard. We would use that system to store summaries of performance.

https://docs.google.com/document/d/1xBLOGr7nRGhvEnKv4-2YM4WlMtZiKyW4ktNuZQE-akw/edit

sojharo commented 5 years ago

I have completed the design for this and listed the issues as well.

jekram commented 5 years ago

Here are the key areas of my feedback:

  1. We will collect a lot of transaction data for performance.

  2. Performance Data need to be collected on the Production System but should not be stored on the same server. It must be stored on a separate server. (call it reporting storage server).

  3. Performance reports should not access Production Servers where we are processing customers transactions.

  4. For reporting, we have two separate requirements. Reporting aggregate data and Reporting Performance Transaction Data. Both of these have two very different requirements. So we should not use the same technology.

  5. Performance data should be stored on a system which has high throughput. We know the data structure. The performance data has a transaction value initially for 1 day and later maybe for 7 days. We should look at a SQL database and caching. We should think about a high-throughput system.

  6. If we need to get a separate server that would be ok.

  7. More than likely we should keep aggregate and performance data separate.

  8. Instead of writing our own UI we should look for outside libraries to do the work.

sojharo commented 5 years ago

Yesterday, I thought more about this issue and investigated further. I have also looked into a couple of open source dashboard tools so that we do not have to write our own UI.

> Performance Data need to be collected on the Production System but should not be stored on the same server. It must be stored on a separate server. (call it reporting storage server).

This is a complete use case for an ETL solution, where we extract data from the production system, transform it, and load it into our reporting server. The open source dashboard solutions that I have looked into provide such solutions out of the box.

KiboDash, which we have developed, already does part of this: we extract and transform data from production on a nightly basis and store it on KiboDash. However, for reporting purposes the data is shown in a UI that loads on the production system and fetches the data from KiboDash.

> Performance reports should not access Production Servers where we are processing customers transactions.

As per our design document, we would put performance data on KiboDash. I have also looked at open source dashboard and BI solutions that could replace our own KiboDash server.

I have investigated the open source dashboard solutions for doing UI and have discussed them in the document.

https://docs.google.com/document/d/1xBLOGr7nRGhvEnKv4-2YM4WlMtZiKyW4ktNuZQE-akw/edit

I would suggest that we retire our current KiboDash code and server and move to one of these BI and dashboard tools along with a caching system. Also, KiboDash doesn't have its own UI; it just holds the summarised data, and our production UI fetches data from it to show.

I think initially we should go for aggregate data and performance data in same place and then eventually make it separate as required.

For caching, I need to investigate some of the solutions, as it would help a lot if we do not call the data-storing code each time a new message is sent out. The counts should be cached and then sent to the reporting server on a nightly basis.
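A minimal sketch of the buffering idea, assuming counts are accumulated in memory and handed to a sender in one batch. `BufferedCounter` and the `send` callable are hypothetical names; the real sender would be whatever pushes data to the reporting server.

```python
from collections import Counter


class BufferedCounter:
    """Accumulates per-type counts and sends them in one batch (hypothetical)."""

    def __init__(self, send):
        self.send = send          # callable that receives a {type: count} dict
        self.pending = Counter()

    def increment(self, message_type, count=1):
        """Cheap in-memory update; no storage call per message."""
        self.pending[message_type] += count

    def flush(self):
        """Send everything collected so far; keep the counts if sending fails."""
        if not self.pending:
            return
        self.send(dict(self.pending))  # an exception here leaves pending intact
        self.pending.clear()
```

The message-sending path only calls `increment`, and a scheduled job (nightly, or more often) calls `flush`.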

If we want to keep going with our current KiboDash then I think design document is ready to implement and we can just open tasks and start implementing the performance reporting work.

sojharo commented 5 years ago

I did more work on this yesterday and tried running both redis and memcached for caching. I found redis easier and simpler to set up and use than memcached. Also, the nodejs client libraries for redis are very easy to set up, use, and understand.

https://redis.io/

https://try.redis.io/

https://linuxhint.com/nodejs_redis/

https://github.com/NodeRedis/node_redis

I think we should go for redis for caching. For UI graphs as discussed in the design document we would go with one of the dashboard BI tools. I am investigating them today.

sojharo commented 5 years ago

Last Saturday, I did a lot of investigation into the dashboard UI tools discussed in the document and tried to run them myself. Most of them were very complex and had a steep learning curve. However, I found one tool which is open source, simple, and also supports fetching data from an HTTP server at intervals.

https://github.com/Freeboard/freeboard

With this, we would just write the server logic to save summarised performance data in any database and then this UI will fetch data from it. Basically, this is a complete UI solution and no server is provided.

We would use this in combination with the caching work discussed above. Our cache would store performance data, and a script on the performance dashboard server would periodically fetch the data from there and put it in its database. The Freeboard UI would show the graphs and charts from that performance database.
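The periodic fetch-and-store step could look roughly like this, assuming the cache snapshot is a mapping of `"<type>:<hour>"` keys to counts and the dashboard database is SQL (SQLite is used here only so the sketch is self-contained). The key format, table name, and function names are all assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the performance database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message_counts ("
        "message_type TEXT NOT NULL, "
        "hour INTEGER NOT NULL, "
        "total INTEGER NOT NULL, "
        "PRIMARY KEY (message_type, hour))"
    )
    return conn


def load_snapshot(conn, snapshot):
    """Store a cache snapshot such as {"broadcast:100": 5} (assumed key format)."""
    rows = []
    for key, total in snapshot.items():
        message_type, hour = key.rsplit(":", 1)
        rows.append((message_type, int(hour), int(total)))
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO message_counts VALUES (?, ?, ?)", rows
        )


def delete_older_than(conn, oldest_hour):
    """Remove rows for hours before `oldest_hour`."""
    with conn:
        conn.execute("DELETE FROM message_counts WHERE hour < ?", (oldest_hour,))


def series(conn, message_type):
    """Return [(hour, total), ...] for one message type, oldest first."""
    cur = conn.execute(
        "SELECT hour, total FROM message_counts "
        "WHERE message_type = ? ORDER BY hour",
        (message_type,),
    )
    return cur.fetchall()
```

The output of `series` is the kind of data a small HTTP endpoint could return as JSON for the dashboard UI to poll.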

I think we have found the right tooling with combination of Redis and freeboard.

As a next step, I would simply install redis on our kiboengage server and start putting performance data in it. After this, I would go and install freeboard on new server. We won't open milestone for this as this single task would be sufficient to carry on all the work for this feature.

This is under construction.

jekram commented 5 years ago

@sojharo Please post status from Friday

sojharo commented 5 years ago

On Friday, I installed the redis server on both the kiboengage staging and production droplets and also wrote the code to record performance values in it under the keys. I also tried some test values in the redis cache. I expire them after 24 hours, so our performance management system should fetch the day's performance data before the 24 hours are up.

The next step is to buy and set up the droplet for the performance management system, where I will install freeboard, the open source dashboard UI which we decided on after looking into other options. Besides this UI, we would also set up SQL or mongodb to store this summary data for, let's say, 2 days initially.
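To illustrate the expiry behaviour described here, the following is an in-memory stand-in, not the actual redis code: it imitates a counter that is incremented per key and disappears 24 hours after it was first written, similar in spirit to combining the Redis `INCRBY` and `EXPIRE` commands. The class name, key names, and the choice not to refresh the expiry on later writes are assumptions.

```python
import time

DAY_SECONDS = 24 * 60 * 60


class ExpiringCounters:
    """In-memory imitation of counters with a time-to-live (hypothetical)."""

    def __init__(self, ttl=DAY_SECONDS, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self.values = {}  # key -> (count, expires_at)

    def incr(self, key, amount=1):
        """Increase a counter; the TTL starts at the first write only."""
        now = self.clock()
        count, expires_at = self.values.get(key, (0, None))
        if expires_at is None or expires_at <= now:
            count, expires_at = 0, now + self.ttl  # new or expired key
        self.values[key] = (count + amount, expires_at)
        return count + amount

    def snapshot(self):
        """Return the counters that have not expired yet."""
        now = self.clock()
        return {k: c for k, (c, exp) in self.values.items() if exp > now}
```

The point of the sketch is the timing constraint: whatever reads the counters has to call something like `snapshot` before the TTL runs out, otherwise that day's values are gone.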

Links for reference:

https://linuxhint.com/nodejs_redis/ https://github.com/NodeRedis/node_redis https://redis.io/commands/keys

sojharo commented 5 years ago

I am doing some more tests today as well on staging.

jekram commented 5 years ago

Let's do SQL instead of Mongo.

Can you please draw an architecture diagram of how this will look?

Also, update the design document.

sojharo commented 5 years ago

Yes, I will create diagram today as well while setting up the droplet.

sojharo commented 5 years ago

This is under construction. I was having some issues with the SQL installation, but it is complete now. Today, I will do the dashboard and server setup on this and also create a diagram and update the document.

sojharo commented 5 years ago

I worked further on this on Tuesday and had to spend a lot of time on it, as the open source dashboard UI was not starting after setting up. However, I was able to complete it, and now we need to do the server-side work to fetch data from our cache every 24 hours.

jekram commented 5 years ago

@sojharo What is the update here?

sojharo commented 5 years ago

I could not give time to this issue on Friday due to issues #6420 and #6396

sojharo commented 5 years ago

Salam, I spent most of my time on this issue yesterday, as it was crashing due to the wrong version of nodejs. I was able to solve it and serve the dashboard:

Screenshot 2019-08-27 at 5 45 09 AM

However, the dashboard was shown empty and was not fetching the data. I tried several combinations of solutions but none of them worked. I will try again on this today. Under construction.

sojharo commented 5 years ago

I completed the work on this yesterday. I am doing some testing on 24 hours of data. Faizan, please test this as well.

Faizan20 commented 5 years ago

@sojharo kindly help me to test it

Faizan20 commented 5 years ago

I have tested it and it is working fine, but the UI should be simpler. I have discussed with @sojharo that there should be graphs and the like, so that it is easy to understand.

sojharo commented 5 years ago

I started work on this and looked into other open source user interface solutions for graphs. I am experimenting with dashbuilder now. I have set it up on localhost and am trying to fetch information in it. If it works, we will start using it.

http://www.dashbuilder.org

jekram commented 4 years ago

what is the update here?

sojharo commented 4 years ago

I could not work on this task due to lots of work on other tasks these days. I am continuing this today as I don't have other urgent tasks.

sojharo commented 4 years ago

I was able to run the dashbuilder server successfully. It also has line graphs, which will show data the way we see it in the Digital Ocean reports. However, it requires some work to transform the data into the structure that the software wants. I will open a few tasks for this.

sojharo commented 4 years ago

The dashbuilder was set to auto seed the database so it showed the sample data when I started today. Here is how it looks on my localhost:

Screenshot 2019-10-08 at 10 23 14 AM Screenshot 2019-10-08 at 10 23 53 AM

As we might not be able to do a demo today, I am putting these screenshots here. I will fix the seed data issue today.

sojharo commented 4 years ago

I was able to remove the seeding and also did some transformation of our own data in dashbuilder. Here is what the reports look like now:

For broadcasts

Screenshot 2019-10-09 at 2 01 49 AM

For Surveys

Screenshot 2019-10-09 at 2 01 59 AM

For Polls

Screenshot 2019-10-09 at 2 02 06 AM

For testing purposes, I am fetching and transforming the staging data. Once successful, we will import the production database into this as well. On staging, we have very little data.

As discussed, I will be opening following tasks for this feature:

  1. Setup on performance production server instead of localhost
  2. Setup performance dashboard for autoposting twitter and facebook
  3. Refresh data after every 24 hours, delete old data

sojharo commented 4 years ago

This is how production database is being shown in performance dashboard.

Screenshot 1

Screenshot 2019-10-09 at 2 08 26 PM

Screenshot 2

Screenshot 2019-10-09 at 2 08 37 PM

For other tasks in this, I have opened separate issues #6806 #6807 and #6808

I will be updating further work in these issues and we can close this issue after review.