Joystream / youtube-synch

YouTube Synchronization

Implement performance monitoring for YT-synch service #170

Closed zeeshanakram3 closed 1 year ago

zeeshanakram3 commented 1 year ago

As stated

like how many videos are in the pipeline, or the time it took to synch the first video for the last 10 creators. Basically, how we monitor the health of that system, based on which we could decide what performance or other enhancements we need to make in YPP.

bedeho commented 1 year ago

See remark https://github.com/Joystream/youtube-synch/issues/169#issuecomment-1492968165

zeeshanakram3 commented 1 year ago

I did a POC integration of YT-synch with the Elasticsearch logging stack with the following setup:

After the POC, all the logging data was indexed in Elasticsearch and was available in the Kibana dashboard to run queries or create visualizations. For the Youtube-synch service, I divided the logging data that would be sent to ES into two groups: Monitoring data (for performance tracking) and Alerting data (in case of service errors/exceptions).

Monitoring

The Youtube-synch application is broken down into 3 services, i.e. ContentDownloadService, ContentCreationService, and ContentUploadService, which run independently and process the content that concerns them. So, based on the application design, each service logs the content it is processing.

Each log entry contains the videoId (ID of the YouTube video), the channelId (the Joystream/YT channel ID the video belongs to), the action that the service was executing, and the action timestamp.
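As a rough illustration of such an entry, here is a minimal TypeScript sketch. The type and function names, the `service` field, and the example action value are assumptions made for illustration; the thread only specifies that an entry carries videoId, channelId, the action, and its timestamp.

```typescript
// Hypothetical shape of one monitoring log entry (not the service's actual type).
type SyncService =
  | 'ContentDownloadService'
  | 'ContentCreationService'
  | 'ContentUploadService'

interface SyncLogEntry {
  videoId: string // ID of the YouTube video
  channelId: string // Joystream/YT channel ID the video belongs to
  service: SyncService // assumed extra field, so events can be grouped per service
  action: string // what the service was doing, e.g. 'download_started' (made-up value)
  timestamp: string // ISO 8601 time of the action
}

// Builds an entry; `at` defaults to the current time.
function makeLogEntry(
  service: SyncService,
  videoId: string,
  channelId: string,
  action: string,
  at: Date = new Date()
): SyncLogEntry {
  return { videoId, channelId, service, action, timestamp: at.toISOString() }
}
```

Keeping the entry flat (no nested objects) should make each field directly usable in Elasticsearch filters and aggregations.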

The following events will be logged & sent to Elasticsearch:

The raw events, grouped by service, could identify the performance bottlenecks of each service and hence help in selecting what enhancements to perform. Elasticsearch provides a domain-specific query language (Query DSL) for running queries on the indexed log data, so I think these raw event data points are generic enough to be used for complex queries or dashboards.
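For example, a query along these lines could count events per action for one service over a recent time window. This is only a sketch of a Query DSL search body: the function name is hypothetical, and it assumes the entries carry `service`, `action`, and `videoId` fields mapped as `keyword` and a `timestamp` field mapped as `date`.

```typescript
// Builds a search request body (to be sent to the index's _search endpoint).
// Field names are assumptions about how the log entries are indexed.
function buildActionCountQuery(service: string, since: string = 'now-24h') {
  return {
    size: 0, // return aggregation results only, no individual hits
    query: {
      bool: {
        filter: [
          { term: { service } }, // only events from the given service
          { range: { timestamp: { gte: since } } }, // date math, e.g. 'now-24h'
        ],
      },
    },
    aggs: {
      by_action: {
        terms: { field: 'action' }, // one bucket per logged action
        aggs: {
          // approximate number of distinct videos seen per action
          distinct_videos: { cardinality: { field: 'videoId' } },
        },
      },
    },
  }
}
```

Comparing the per-action counts of a service (for instance, how many videos started an action versus how many finished it) would give a rough view of how many videos are still in the pipeline.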

Let me know if these logging events are sufficient or if we want to add more fields to the events

Alerting

The following error events will be logged & sent to Elasticsearch to create the alerts based on the logs:

These events will also include the ID of the youtube video (videoId) that was being processed when the errors occurred along with the timestamp of the log entry.

bedeho commented 1 year ago

Fantastic work! I think this is more than enough to start to use, and then we let real world problems guide any further enhancements if needed. Some questions:

  1. How do you propose we distribute all of this tooling to yt-synch operators in a way that makes it very easy to get this setup running?
  2. What remains to go from POC to us using this in production?
  3. I believe there are plugins or tools that very easily allow for triggering external message pushing based on certain conditions being satisfied in the Elasticsearch database, did you look at that? ChatGPT tells me Watcher, Elasticsearch Alerting, Grafana and ElastAlert may be possible alternatives. Many of these seem to know how to deliver messages all the way to the final destination, like Slack, Telegram, email, etc.

zeeshanakram3 commented 1 year ago

  1. How do you propose we distribute all of this tooling to yt-synch operators in a way that makes it very easy to get this setup running?

First, if we assume that the Elasticsearch infra is already set up, then yt-synch operators only need to provide the endpoint and the credentials for the ES instance that the yt-synch service will send data to for indexing. There are two ways yt-synch operators can set up the Elasticsearch infra: they can either use the fully managed Elastic Cloud (easy to configure for non-technical operators), or they can opt for on-premises, self-managed Elasticsearch and Kibana instances. For the latter, we can prepare a docker/docker-compose setup so that deploying and configuring the self-hosted Elastic stack takes minimal work.
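On the operator side, the configuration could then be as small as an endpoint plus credentials. A minimal sketch of how the service might read it is shown below; the environment variable names, the type, and the function name are all hypothetical and are not the service's actual configuration.

```typescript
// Hypothetical operator-supplied settings for the Elasticsearch instance.
interface ElasticsearchConfig {
  endpoint: string // e.g. an Elastic Cloud URL or a self-hosted instance URL
  username: string
  password: string
}

// Reads the settings from an env-like record (in Node.js, pass process.env).
// Returns undefined when no endpoint is set, i.e. ES logging is not configured.
function loadElasticsearchConfig(
  env: Record<string, string | undefined>
): ElasticsearchConfig | undefined {
  const endpoint = env['YT_SYNCH_ES_ENDPOINT'] // assumed variable name
  if (!endpoint) return undefined

  if (!/^https?:\/\//.test(endpoint)) {
    throw new Error('Elasticsearch endpoint must start with http:// or https://')
  }

  const username = env['YT_SYNCH_ES_USERNAME'] // assumed variable name
  const password = env['YT_SYNCH_ES_PASSWORD'] // assumed variable name
  if (!username || !password) {
    throw new Error('Elasticsearch endpoint is set but credentials are missing')
  }
  return { endpoint, username, password }
}
```

Failing fast on a half-configured setup (endpoint without credentials) avoids a service that appears to run normally while silently dropping its monitoring data.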

  2. What remains to go from POC to us using this in production?
  3. I believe there are plugins or tools that very easily allow for triggering external message pushing based on certain conditions being satisfied in the elastic search database

Yes, there are rule-based alerting options available that you can set up to push messages to configured destinations when specific conditions are met. You can set up these alerts from the Kibana dashboard.
