JayCesar commented 4 months ago

The course: I'll gain Datadog skills while monitoring a real e-commerce application that has already been built and deployed in a virtual machine.

Overview of course topics

Universal Service Monitoring (USM) provides a comprehensive overview of service health metrics across your technology stack without requiring you to instrument your code.

Logs capture event streams from various components of your infrastructure. Datadog Log Management enables you to cost-effectively collect, process, archive, explore, and monitor all your logs.

Metrics can track a wide range of measurements—such as latencies, error rates, or even user signups—within your environment over time. Monitors actively check on these metrics and alert you when critical changes occur, for example, when a threshold indicating a problem is crossed. You can use either metrics or monitors as the basis for service level objectives (SLOs), which define targets for performance and provide a framework for establishing clear standards for service quality.
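
To make the monitor idea concrete for myself, here is a minimal sketch of a threshold check in plain Python. It is not Datadog's API: the function name `evaluate_monitor`, the averaging rule, and the sample latencies are all assumptions for illustration.

```python
from statistics import mean


def evaluate_monitor(values, threshold):
    """Return a status string for a simple "average above threshold" monitor.

    Hypothetical helper: it only mimics the idea of a monitor checking a
    metric against a threshold, not how Datadog evaluates monitors.
    """
    if not values:
        return "NO DATA"
    return "ALERT" if mean(values) > threshold else "OK"


# Made-up latency samples in milliseconds for one evaluation window.
latencies_ms = [120, 135, 480, 510, 495]
print(evaluate_monitor(latencies_ms, threshold=300))
```

The average of the sample window is 348 ms, so a 300 ms threshold is crossed and the sketch reports an alert.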

Integrations are plugins or add-ons that enable Datadog to monitor individual third-party software, services, or tools. With the help of integrations, Datadog can unify different metrics and logs generated by many technologies deployed across your infrastructure.

Dashboards allow you to view curated visualizations of key observability data on a single page. You can create custom dashboards from scratch, but there are also many out-of-the-box (OOTB) or pre-built dashboards available.

JayCesar commented 4 months ago

Universal Service Monitoring (USM)

Universal Service Monitoring (USM) offers a view of service health metrics across your entire technology stack, without the need to instrument your code. Instead, it relies on the presence of a configured Datadog Agent and Unified Service Tagging to collect data about your existing services, which can then be viewed through the Service Catalog.

Unified Service Tagging

Universal Service Monitoring is capable of identifying services through commonly used container tags (such as app, short_image, and container_name), and automatically generates corresponding entries in the Service Catalog.
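
A rough sketch of how a service name could be derived from those container tags. The precedence order and the `infer_service_name` helper are my own assumptions for illustration, not Datadog's documented behavior.

```python
# Assumed lookup order; the real discovery logic may differ.
CANDIDATE_TAGS = ("app", "short_image", "container_name")


def infer_service_name(tags):
    """Return the first non-empty value among the candidate tags, or None."""
    for key in CANDIDATE_TAGS:
        value = tags.get(key)
        if value:
            return value
    return None


# Hypothetical container tags.
container_tags = {"short_image": "nginx", "container_name": "web-1"}
print(infer_service_name(container_tags))
```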

Once these services are discovered, Datadog enables you to access request, error, and duration metrics for both inbound and outbound traffic. These service health metrics are helpful in setting up alerts, tracking deployments, and establishing service level objectives (SLOs), providing you with a comprehensive view of all the services running on your infrastructure.

Service Catalog

JayCesar commented 4 months ago

Notes

I can export my log exploration as a saved view. To dig deeper into individual logs, I can examine the Log Side Panel. The upper part of the panel displays general context information, while the lower part displays the actual content of the log.

In the following hands-on activity, you will:

[ ] search, filter, and query logs
[ ] read and understand log details
[ ] create a custom facet from a log attribute
[ ] create a saved view based on a search query
[ ] visualize field aggregations
[ ] export a timeseries graph to a dashboard

JayCesar commented 4 months ago

With Live Tail, I can see logs from all of the application's systems in real time.

Information is gathered from tags. Tags may be automatically attached (host, container_name, etc.) or added through custom tags (team, env, etc.) by the Datadog Agent or Log Forwarder.
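
A small sketch of how `key:value` tags can drive filtering. The log records, tag values, and helper names below are made up for illustration; this is plain Python, not the Datadog query engine.

```python
def parse_tags(tags):
    """Turn ["env:prod", "team:web"] into {"env": "prod", "team": "web"}."""
    parsed = {}
    for tag in tags:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = tag.partition(":")
        parsed[key] = value if sep else ""
    return parsed


def filter_logs(logs, **wanted):
    """Keep the logs whose tags match every requested key/value pair."""
    matches = []
    for log in logs:
        tags = parse_tags(log["tags"])
        if all(tags.get(key) == value for key, value in wanted.items()):
            matches.append(log)
    return matches


# Hypothetical log records carrying automatic and custom tags.
logs = [
    {"message": "checkout ok", "tags": ["host:vm-1", "env:prod", "team:web"]},
    {"message": "cart error", "tags": ["host:vm-2", "env:staging", "team:web"]},
]
print(filter_logs(logs, env="prod"))
```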

JayCesar commented 4 months ago

Metrics

Metrics are the smallest unit in the Datadog universe but they grant enormous insight into your infrastructure when they are visualized, measured, and monitored. Metrics are numerical measurements about any aspect of your system over a period of time, such as latency, error rates, or user registrations. In Datadog, metric data is received and retained as data points that include a value and timestamp.

Service Level Objectives track metrics over long periods of time to help you define quality standards.

Important: Datadog has its own Agent, which collects metrics, logs, and other telemetry for me.

I can create Service Level Objectives (SLOs) from monitors and use them to track whether I'm meeting a service level agreement (SLA).
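
To see what an SLO target means in numbers, here is a sketch of the arithmetic for an event-based objective. The function `slo_status` and the sample counts are assumptions, not a Datadog API.

```python
def slo_status(good_events, total_events, target_pct):
    """Compare achieved success percentage against an SLO target.

    Also reports the remaining error budget: how many more bad events
    could occur before the target is missed.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    achieved_pct = 100.0 * good_events / total_events
    allowed_bad = total_events * (1 - target_pct / 100.0)
    actual_bad = total_events - good_events
    return {
        "achieved_pct": achieved_pct,
        "met": achieved_pct >= target_pct,
        "error_budget_remaining": allowed_bad - actual_bad,
    }


# Made-up numbers: 100,000 requests, 50 failures, 99.9% target.
status = slo_status(good_events=99_950, total_events=100_000, target_pct=99.9)
print(status)
```

With a 99.9% target, 100,000 requests allow about 100 failures, so 50 failures leave roughly half of the error budget.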

Configuration allows you to identify metrics with tag configurations or additional percentile aggregations.
Percentiles shows you which distributions have percentiles enabled.
Metric Type helps you identify distributions and non-distributions (counts, gauges, and rates).
Distribution Metric Origin quickly identifies which Datadog component the distribution metrics have originated from.

Under Metric Type, click on Distributions to only show the metrics that are of the distribution type. Distributions provide enhanced query functionality and configuration options that aren’t offered with other metric types.

Under Distribution Metric Origin, notice the different Datadog components the metrics originated from.

JayCesar commented 4 months ago

Metric Types

Count: Adds up the values received within a specified time interval. For example, 2000 HTTP requests.

Rate: Divides the count by the duration of the time interval. Using the same example mentioned above, 0.566 HTTP requests per second.

Gauge: Reports the last value received during the specific time interval. This metric type would be appropriate for monitoring the usage of RAM or CPU, since the last value gives an accurate representation of the host’s behavior during the timeframe: 2097152 bytes of RAM.

Histogram: Summarizes the submitted values into five different values: the mean, count, median, 95th percentile, and maximum. This generates five distinct timeseries. For example, this metric type is useful for measuring latency, where it is inadequate to only know the average value. Histograms enable you to understand how the data is distributed without recording every single data point.

Distribution: Summarizes the values submitted within a time interval across all the hosts in your environment. Distributions provide enhanced query functionality and configuration options that aren’t offered with other metric types.
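
The first four types can be sketched as plain aggregations over the values received in one interval. This is a simplified illustration: the 10-second interval, the sample values, and the nearest-rank percentile method are my assumptions and may not match how the Agent computes them. Distributions are left out because they aggregate across hosts on the Datadog side.

```python
import math
from statistics import mean, median


def count(values):
    """Count: add up the values received in the interval."""
    return sum(values)


def rate(values, interval_seconds):
    """Rate: the count divided by the interval length in seconds."""
    return sum(values) / interval_seconds


def gauge(values):
    """Gauge: the last value received in the interval."""
    return values[-1]


def histogram(values):
    """Histogram: five summary values for the interval."""
    ordered = sorted(values)
    # Nearest-rank 95th percentile (an assumed method for this sketch).
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "avg": mean(ordered),
        "count": len(ordered),
        "median": median(ordered),
        "95percentile": ordered[rank - 1],
        "max": ordered[-1],
    }


requests_seen = [500, 700, 800]          # made-up request counts
print(count(requests_seen))              # total requests in the interval
print(rate(requests_seen, 10))           # requests per second over 10 s
print(gauge([1048576, 2097152]))         # last reported RAM value in bytes
print(histogram([10, 20, 30, 40, 100]))  # latency summary in ms
```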

JayCesar commented 4 months ago

Introduction to Integrations

The Datadog Agent is software that runs on your hosts. It collects process-level events and metrics and sends them to Datadog, where you can analyze your monitoring and performance data. For this course, the Datadog Agent has already been installed for you in all the labs.

So the Datadog Agent runs on my host, collects the data, and sends it to Datadog so that I can analyze it.

There are three main types of integrations: Agent-based, authentication-based, and library. You can even build your own integration!

- Agent-based integrations are installed with the Datadog Agent (on your host or in containers) and use a Python class method called check to define the metrics to collect.

- Authentication (crawler) based integrations are set up in Datadog where you provide credentials to obtain metrics and data from APIs. These include popular integrations like Slack, AWS, Azure, and PagerDuty.

- Library integrations use the Datadog API to allow you to monitor applications based on the language they’re written in, like Node.js or Python.
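
To picture the `check` method used by Agent-based integrations, here is a self-contained sketch of its shape. In a real Agent check the base class would be `AgentCheck` from `datadog_checks.base`, which provides `gauge`. The `StandInCheck` class below is a hypothetical replacement so the example runs without the Agent, and the metric names are invented.

```python
import shutil


class StandInCheck:
    """Hypothetical stand-in for the Agent's base class; keeps gauges in memory."""

    def __init__(self):
        self.submitted = []

    def gauge(self, name, value, tags=None):
        # A real check would hand this to the Agent; here it is only recorded.
        self.submitted.append((name, value, list(tags or [])))


class DiskFreeCheck(StandInCheck):
    """Example check that reports disk usage for one path."""

    def check(self, instance):
        # `instance` carries per-instance configuration, here just a path.
        path = instance.get("path", ".")
        usage = shutil.disk_usage(path)
        tags = [f"path:{path}"]
        self.gauge("example.disk.free_bytes", usage.free, tags=tags)
        if usage.total > 0:
            self.gauge("example.disk.used_fraction", usage.used / usage.total, tags=tags)


disk_check = DiskFreeCheck()
disk_check.check({"path": "."})
for name, value, tags in disk_check.submitted:
    print(name, value, tags)
```

The Agent calls `check` once per configured instance on each collection run, which is why the configuration is passed in as an argument rather than stored globally.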

JayCesar commented 4 months ago

System checks

Check status

docker compose exec datadog agent status

docker compose exec datadog agent status | grep '^\s*disk' -A 11

Run a check


docker compose exec datadog agent check disk

JayCesar commented 4 months ago

Authentication and Library Integrations

Datadog's authentication-based integrations connect to third-party platforms to collect metrics, logs, and events. Generally, they will either pull data from these platforms on your behalf or authorize those platforms to push data to Datadog.

Library integrations, often referred to as client libraries, are software packages that you import into your application code. They use Datadog's tracing API to collect performance, profiling, and debugging metrics from your applications at runtime.

JayCesar commented 4 months ago

Introduction to Dashboards

With Dashboards, you can easily track and monitor critical metrics that are crucial to the health of your system.

The grid-based dashboards are commonly used as status boards or storytelling views. These update in real-time and can represent fixed points in the past.

A Screenboard layout is similar except that it’s free-form instead of a grid layout.

Timeboards provide an automatic layout and represent a single moment in time, whether fixed or real-time, for the entire dashboard. These are often used for troubleshooting, correlation analysis, and general exploration of data.

JayCesar commented 4 months ago

Resources

While this course covered a lot of material, in reality, you’ve only scratched the surface of what you can achieve with Datadog. To continue your learning, you’re encouraged to take other courses in the Datadog Learning Center that dive further into each topic. Some of these courses may be an extension of what you’ve learned here while others will introduce you to completely different aspects of Datadog. Here are some recommended courses you can take:

The Datadog Documentation site is also an excellent reference for day-to-day Datadog use and to discover new features of familiar products. For more information about the topics covered in this course, you can read the following docs:

The Datadog GitHub account contains repositories of code for integrations and more.

The Agent repository
The Agent core integrations repository

Lastly, you can follow Datadog’s Blog and YouTube channel for the latest news!

JayCesar / cloud

Datadog Foundation #2
