Open jekram opened 5 years ago
I worked on this yesterday and have started creating a design document, which is under construction. I also studied the current KiboDash work we have done to store summarised information for our operational dashboard. We would use that system to store performance summaries.
https://docs.google.com/document/d/1xBLOGr7nRGhvEnKv4-2YM4WlMtZiKyW4ktNuZQE-akw/edit
I have completed the design for this and listed the issues as well.
Here are the key areas of my feedback:
We will collect a lot of transaction data for performance.
Performance data needs to be collected on the production system but should not be stored on the same server. It must be stored on a separate server (call it the reporting storage server).
Performance reports should not access the production servers where we are processing customer transactions.
For reporting, we have two separate requirements: reporting aggregate data and reporting performance transaction data. These have very different requirements, so we should not use the same technology.
Performance data should be stored on a system with high throughput. We know the data structure. The performance data holds transaction values initially for 1 day and later maybe for 7 days. We should look at an SQL database plus caching and think in terms of a high-throughput system.
If we need to get a separate server, that would be OK.
More than likely we should keep aggregate and performance data separate.
Instead of writing our own UI, we should look for outside libraries to do the work.
Yesterday, I thought more about this issue and investigated it further. I have also looked into a couple of open source dashboard tools so that we do not have to write our own UI.
Performance data needs to be collected on the production system but should not be stored on the same server. It must be stored on a separate server (call it the reporting storage server).
This is the classic use case for an ETL solution, where we extract data from the production system, transform it, and load it into our reporting server. The open source dashboard solutions I have looked into provide such pipelines out of the box.
The KiboDash we have developed already does part of this: it loads and transforms data from production on a nightly basis and stores it on KiboDash. However, for reporting purposes we show the data in a UI that loads on the production system and fetches the data from KiboDash.
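As a rough sketch of the nightly extract-transform-load step described above (all function names and data shapes here are illustrative, not actual project code):

```javascript
// Illustrative shape of a nightly ETL run: extract raw transaction rows
// from production, aggregate them, and hand the summary to the reporting
// server. The extract step is stubbed; only the transform is real logic.
function transform(rows) {
  // Collapse raw transaction rows into one summary count per feature.
  const summary = {};
  for (const row of rows) {
    summary[row.feature] = (summary[row.feature] || 0) + 1;
  }
  return Object.entries(summary).map(([feature, count]) => ({ feature, count }));
}

// In the real system these rows would come from the production database.
const extracted = [
  { feature: 'broadcast' },
  { feature: 'broadcast' },
  { feature: 'survey' },
];
console.log(JSON.stringify(transform(extracted)));
// [{"feature":"broadcast","count":2},{"feature":"survey","count":1}]
```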
Performance reports should not access the production servers where we are processing customer transactions.
As per our design document, we would put performance data on KiboDash. I have also looked at open source dashboard and BI solutions so that we can replace our own KiboDash server with one of them.
I have investigated open source dashboard solutions for the UI and have discussed them in the document.
https://docs.google.com/document/d/1xBLOGr7nRGhvEnKv4-2YM4WlMtZiKyW4ktNuZQE-akw/edit
I would suggest that we retire our current KiboDash code and server and just go towards one of these BI and dashboard tools along with a caching system. Also, KiboDash doesn't have its own UI; it just holds the summarised data, and our production UI fetches data from it for display.
I think initially we should keep aggregate data and performance data in the same place and eventually separate them as required.
For caching, I need to investigate some solutions, as it would help a lot if we do not invoke the data-storing code each time a new message is sent out. The values should be cached and then sent to the reporting server on a nightly basis.
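A minimal sketch of this "count now, flush nightly" idea (a Map stands in for the real cache such as Redis, and the key names are my placeholders, not final code):

```javascript
// The hot message-sending path only bumps a counter keyed by feature and
// day; a nightly job drains all counters to the reporting server.
const cache = new Map(); // stand-in for Redis in this sketch

// Called from the hot path: O(1), no database write per message.
function recordMessage(feature, date = new Date()) {
  const day = date.toISOString().slice(0, 10);  // e.g. "2020-05-01"
  const key = `perf:${feature}:${day}`;         // perf:broadcast:2020-05-01
  cache.set(key, (cache.get(key) || 0) + 1);
}

// Called by the nightly job: collect and reset all counters.
function flushCounters() {
  const batch = [...cache.entries()].map(([key, count]) => ({ key, count }));
  cache.clear(); // counters reset after the flush
  return batch;  // this batch would be sent to the reporting server
}

recordMessage('broadcast', new Date('2020-05-01T10:00:00Z'));
recordMessage('broadcast', new Date('2020-05-01T11:00:00Z'));
recordMessage('poll', new Date('2020-05-01T12:00:00Z'));
console.log(JSON.stringify(flushCounters()));
// [{"key":"perf:broadcast:2020-05-01","count":2},{"key":"perf:poll:2020-05-01","count":1}]
```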
If we want to keep going with our current KiboDash, then I think the design document is ready to implement and we can just open tasks and start the performance reporting work.
I did more work on this yesterday and tried running both Redis and Memcached for caching. I found Redis easier and simpler to set up and use than Memcached. Also, the Node.js client libraries for Redis are very easy to set up, use, and understand.
https://linuxhint.com/nodejs_redis/
https://github.com/NodeRedis/node_redis
I think we should go with Redis for caching. For the UI graphs, as discussed in the design document, we would go with one of the dashboard/BI tools. I am investigating them today.
I did a lot of investigation into the dashboard UI tools discussed in the document last Saturday and tried to run them myself. Most of them were very complex and had a steep learning curve. However, I found one tool that is open source and simple, and that also supports fetching data from an HTTP server at intervals.
https://github.com/Freeboard/freeboard
With this, we would just write the server logic to save summarised performance data in any database, and this UI will fetch data from it. Basically, it is a complete UI solution; no server is provided.
We would use this in combination with the caching work discussed above. Our cache would store the performance data, a script on the performance dashboard server would periodically fetch it from there and put it in its database, and the Freeboard UI would show graphs and charts from that performance database.
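Since Freeboard polls an HTTP endpoint for JSON at a fixed interval, the dashboard server script would need to expose the summary rows as a JSON payload. As an illustration only (all field names here are my assumptions, not the actual schema):

```javascript
// Sketch of building the JSON payload a Freeboard polling datasource
// could consume; rows are summary records read from the performance DB.
function buildDashboardPayload(rows, generatedAt) {
  // rows: e.g. [{ feature: 'broadcast', day: '2020-05-01', count: 120 }]
  const payload = { generatedAt, counts: {} };
  for (const row of rows) {
    payload.counts[row.feature] = (payload.counts[row.feature] || 0) + row.count;
  }
  return payload;
}

const rows = [
  { feature: 'broadcast', day: '2020-05-01', count: 120 },
  { feature: 'survey', day: '2020-05-01', count: 40 },
];
console.log(JSON.stringify(buildDashboardPayload(rows, '2020-05-02T00:00:00Z')));
```

An HTTP route on the dashboard server would simply serve this object; Freeboard's datasource settings control how often it is re-fetched.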
I think we have found the right tooling in the combination of Redis and Freeboard.
As a next step, I would simply install Redis on our KiboEngage server and start putting performance data in it. After that, I would install Freeboard on a new server. We won't open a milestone for this, as this single task is sufficient to carry all the work for this feature.
This is under construction.
@sojharo Please post status from Friday
On Friday, I installed the Redis server on both the KiboEngage staging and production droplets and wrote the code to record performance values in it under the relevant keys. I also put some test values in the Redis cache. I expire the keys after 24 hours, so our performance management system must fetch each day's performance data within 24 hours.
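A sketch of the daily-key-with-24-hour-TTL scheme (the key format is my placeholder, not necessarily what is in the code):

```javascript
// One counter key per feature per day; each key expires 24 hours after
// it is set, so the dashboard job must read a day's counters in time.
const TTL_SECONDS = 24 * 60 * 60; // 24-hour expiry

function dailyKey(feature, date) {
  // e.g. dailyKey('broadcast', new Date('2020-05-01T10:00:00Z'))
  //   -> 'perf:broadcast:2020-05-01'
  return `perf:${feature}:${date.toISOString().slice(0, 10)}`;
}

// With the node_redis callback API, the hot path would be roughly:
//   client.incr(key, callback);
//   client.expire(key, TTL_SECONDS, callback);
console.log(dailyKey('broadcast', new Date('2020-05-01T10:00:00Z')));
```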
The next step is to buy and set up the droplet for the performance management system, where I will install Freeboard, the open source dashboard UI we decided on after looking into other options. Besides this UI, we would also set up SQL or MongoDB to store this summary data for, let's say, 2 days initially.
Links for reference:
https://linuxhint.com/nodejs_redis/
https://github.com/NodeRedis/node_redis
https://redis.io/commands/keys
I am doing some more tests today as well on staging.
Let's do SQL instead of Mongo.
Can you please draw an architecture diagram of how this will look?
Also, update the design document
Yes, I will create the diagram today as well while setting up the droplet.
This is under construction. I was having some issues with the SQL installation; I have completed it now. Today, I will do the dashboard and server setup, create the diagram, and update the document.
I worked further on this on Tuesday and had to spend a lot of time on it, as the open source dashboard UI was not starting after being set up. However, I was able to complete it, and now we need to do the server-side work to fetch data from our cache on a 24-hour basis.
@sojharo What is the update here?
I could not give time to this issue on Friday due to issues #6420 and #6396
Salam, I spent most of my time on this issue yesterday, as it was crashing due to a wrong version of Node.js. I was able to solve it and serve the dashboard:
However, the dashboard was shown empty and was not fetching the data. I tried several combinations of solutions, but it didn't work. I will try again today. Under construction.
I completed the work on this yesterday. I am doing some testing on the 24-hour data. Faizan will also test this.
@sojharo kindly help me to test it
I have tested it and it is working fine, but the UI should be simpler. I have discussed with @sojharo that there should be graphs and the like so that it will be easier to understand.
I started work on this and looked into other open source user interface solutions for graphs. I am experimenting with Dashbuilder now. I have set it up on localhost and am trying to fetch information into it. If it works, we will start using it.
What is the update here?
I could not work on this task due to lots of work on other tasks these days. I am continuing it today, as I don't have other urgent tasks.
I was successfully able to run the Dashbuilder server. It also has the line graphs that will show data like we saw in the Digital Ocean reports. However, it requires some work to transform the data into the structure the software wants; I will open a few tasks for this.
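The exact dataset format Dashbuilder expects is not recorded here, so purely as an illustration of the transformation step (the target columnar shape below is assumed, not Dashbuilder's real schema):

```javascript
// Illustrative transform from our summary rows into a columnar dataset
// shape of the kind dashboard tools typically ingest (shape is assumed).
function toColumnarDataset(rows) {
  return {
    columns: ['day', 'feature', 'count'],
    values: rows.map((r) => [r.day, r.feature, r.count]),
  };
}

const summaryRows = [
  { day: '2020-05-01', feature: 'broadcast', count: 120 },
  { day: '2020-05-01', feature: 'poll', count: 8 },
];
console.log(JSON.stringify(toColumnarDataset(summaryRows)));
```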
Dashbuilder was set to auto-seed the database, so it showed the sample data when I started it today. Here is how it looks on my localhost:
As we might not be able to do a demo today, I am putting these screenshots here. I will fix the seed data issue today.
I was able to successfully remove the seeding and also did some transformation of our own data in Dashbuilder. Here is what the reports look like now:
For broadcasts
For Surveys
For Polls
For testing purposes, I am fetching and transforming staging data. Once successful, we will import the production database into this as well. On staging, we have very little data.
As discussed, I will be opening following tasks for this feature:
This is how the production database is shown in the performance dashboard.
Screenshot 1
Screenshot 2
For the other tasks in this, I have opened separate issues #6806, #6807, and #6808.
I will update further work in those issues, and we can close this issue after review.
This is a Design task to implement performance management and dashboard.
Currently, when we look into Digital Ocean, it shows different usage graphs for CPU, Memory, Bandwidth, I/O, etc. These are shown for the last 6 hrs, 24 hrs, 7 days, and 30 days.
So when we have high CPU utilization, we cannot correlate to the number of transactions we are processing.
We need to capture the following message count for:
Broadcast
Broadcast API
Poll
Survey
AutoPosting
AutoPosting FB Post
Invite using phone numbers
We need to show in the graph the number of messages in:
last 6 hrs
last 24 hrs
last 7 days
last 30 days
The design should be extensible. However, in this phase we only need to implement:
Broadcast
Broadcast API
AutoPosting
With intervals of:
last 6 hrs
last 24 hrs
In Phase 1 we do not need to keep the data for more than 24 hrs.
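The Phase 1 windowed counts can be sketched as a simple filter over timestamped events (the event shape here is my assumption for illustration):

```javascript
// Count messages of a given feature seen within the last N hours.
const HOUR_MS = 60 * 60 * 1000;

function countInWindow(events, feature, hours, now) {
  const cutoff = now - hours * HOUR_MS;
  return events.filter((e) => e.feature === feature && e.at >= cutoff).length;
}

const now = Date.parse('2020-05-01T12:00:00Z');
const events = [
  { feature: 'broadcast', at: Date.parse('2020-05-01T11:00:00Z') },   // 1h ago
  { feature: 'broadcast', at: Date.parse('2020-05-01T02:00:00Z') },   // 10h ago
  { feature: 'autoposting', at: Date.parse('2020-05-01T11:30:00Z') }, // 30m ago
];
console.log(countInWindow(events, 'broadcast', 6, now));  // 1
console.log(countInWindow(events, 'broadcast', 24, now)); // 2
```

Since Phase 1 keeps no more than 24 hours of data, the 24-hour expiry on the cached keys lines up with the largest window we need here.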
The design should include both the logic and UI (please keep it close to Digital Ocean).
The task intends to determine and correlate CPU or Memory usage with transactions.
The intent is not for historical trending like https://github.com/Cloudkibo/KiboPush/issues/6259