Introduction to Application Performance Monitoring

Intro

Modern applications are distributed systems composed of numerous services that handle high volumes of requests. Often, multiple services are involved in handling a single request, and when a request fails or is handled poorly, it is difficult to pinpoint the root cause. Distributed tracing is a method for tracking how a request is handled by a distributed system. Datadog Application Performance Monitoring (APM) uses distributed tracing to monitor application performance and provides a variety of features so that you can investigate and troubleshoot at all levels of your applications.

Distributed Traces and Spans

In distributed tracing, when a request is made to an application, a trace is created that collects performance data from each service involved in processing the request. The trace is made up of spans. Each span represents an operation or logical unit of work performed by a service in response to the request. Services that perform more than one operation to handle the request produce a span for each operation.
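The relationship between traces and spans can be sketched as a small data model. This is only an illustration, not Datadog's internal representation: the `Span` fields, the `group_into_traces` helper, and the service and operation names are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record: one operation performed by one service."""
    trace_id: str             # shared by every span that belongs to the same request
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str
    name: str                 # the operation this span represents


def group_into_traces(spans):
    """Group spans by trace_id so that each trace lists the spans it is made of."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


# Illustrative data: service and operation names are made up for this sketch.
spans = [
    Span("t1", "a", None, "web-store", "rack.request"),
    Span("t1", "b", "a", "web-store", "controller.action"),
    Span("t1", "c", "b", "web-store-mongo", "mongo.query"),
    Span("t2", "d", None, "web-store", "rack.request"),
]

traces = group_into_traces(spans)
services_in_t1 = sorted({s.service for s in traces["t1"]})
print(services_in_t1)
```

Here the `web-store` service contributes two spans to trace `t1` because it performs two operations for that request, while `web-store-mongo` contributes one.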

With Datadog APM, you can solve the following issues and more:

Find traces matching a bug report for a customer, status code, and endpoint
Live view your application trace data to gain performance insights at a glance
Identify the slowest SQL queries on a database for a specific host and shard
View slow and cold-starting serverless functions

Distributed tracing allows you to track the progress and status of a request as it is handled by your application’s distributed system. You can also query all traces live and keep only the ones you need.

A trace shows how a request moves through a system. Traces are visualized as flame graphs, which lay out the operations performed when a request is made to a specific endpoint of a web application.

Consider a specific example: a request to update a product in an online store. The flame graph starts at the moment the request reaches the endpoint where products are managed. This starting point is the top span of the graph.

Below it, the graph shows two operations. One is a database query that reads or changes data stored in a MongoDB database (the "web-store-mongo" service). The other is an operation that updates the product in the store's web application itself (handled by the "Admin::ProductsController#update" endpoint).

The flame graph therefore shows how the different parts of the system work together. It exposes dependencies, such as the web application relying on the database to manage and update product information.

The length of each span represents the duration of each operation involved in completing the request. Each span has a name that describes the operation, a start time, a duration, and span tags.
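To make the link between span duration and bar length concrete, here is a minimal sketch that prints a text-only flame graph. It is not how Datadog renders the Trace View; the `render_flame_graph` function, the dictionary keys, the timings, and the "render response" span are assumptions made up for the illustration.

```python
def render_flame_graph(spans, ms_per_char=10):
    """Return one text row per span: bar offset = start time, bar length = duration."""
    children = {}
    for span in spans:
        children.setdefault(span["parent"], []).append(span)

    rows = []

    def walk(parent_id):
        # Visit child spans in start-time order, parents before their children.
        for span in sorted(children.get(parent_id, []), key=lambda s: s["start_ms"]):
            offset = " " * (span["start_ms"] // ms_per_char)
            bar = "#" * max(1, span["duration_ms"] // ms_per_char)
            rows.append(f"{offset}{bar} {span['name']} ({span['duration_ms']} ms)")
            walk(span["id"])

    walk(None)
    return rows


# Hypothetical timings for the product-update request.
example_spans = [
    {"id": 1, "parent": None, "name": "Admin::ProductsController#update",
     "start_ms": 0, "duration_ms": 120},
    {"id": 2, "parent": 1, "name": "web-store-mongo query",
     "start_ms": 10, "duration_ms": 40},
    {"id": 3, "parent": 1, "name": "render response",
     "start_ms": 60, "duration_ms": 50},
]

rows = render_flame_graph(example_spans)
print("\n".join(rows))
```

Each row is one span. The widest bar is the root span, and the shorter bars beneath it are the operations that ran while the request was being handled.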

Span tags are labels made up of key-value pairs that provide detailed information about different parts of a system, such as its infrastructure, applications, and business aspects. In Datadog, these tags can be added automatically by the system or manually by users.

- Automatically assigned tags: This happens through integrations, instrumentation, or inheritance from other parts of the system.
- Manually assigned tags: Users can add custom tags to provide specific information.

Tags can be specific to certain parts of the system (spans) or apply to everything within a service or application. For instance, in an ecommerce application, you might use tags like "customer" and "merchant" to label and organize the data related to those entities.
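Because tags are key-value pairs, searching by tag amounts to matching on those pairs. The sketch below is a plain-Python illustration of that idea, not Datadog's query syntax or API; `find_spans`, the span dictionaries, and the tag values are hypothetical.

```python
def find_spans(spans, required_tags):
    """Return the spans whose tags contain every requested key-value pair."""
    return [
        span for span in spans
        if all(span["tags"].get(key) == value for key, value in required_tags.items())
    ]


# Hypothetical spans carrying "customer" and "merchant" tags.
spans = [
    {"name": "checkout", "tags": {"customer": "acme", "merchant": "shoe-shop", "status": "500"}},
    {"name": "checkout", "tags": {"customer": "acme", "merchant": "book-shop", "status": "200"}},
    {"name": "checkout", "tags": {"customer": "globex", "merchant": "shoe-shop", "status": "500"}},
]

failed_for_acme = find_spans(spans, {"customer": "acme", "status": "500"})
print(failed_for_acme)
```

This mirrors the kind of question a bug report raises: which requests for a given customer ended with a given status.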

Traces in Datadog

In Datadog APM, you can search your application traces in Traces Search and Live Search, and you can view the details of traces in the Trace View.

Traces Search - Traces Search allows you to search the traces that have been sampled and indexed by the Datadog backend based on the span tags. Datadog employs a trace sampling strategy that keeps the traces that matter most, including the following:

Distributed traces

Traces from low queries-per-second (qps) services

Traces corresponding to bad user experiences such as errors

Traces corresponding to high latencies or failed requests

A variety of traces that is representative of the distributed system
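The criteria above can be read as a set of checks applied to each trace. The following sketch only illustrates that reading. It is not Datadog's actual sampling algorithm, and the `retention_reasons` function, the field names, and both thresholds are assumptions invented for the example.

```python
def retention_reasons(trace, latency_threshold_ms=500, low_qps_threshold=1.0):
    """Return the (hypothetical) reasons a trace would be worth keeping."""
    reasons = []
    # More than one service involved: the trace is distributed.
    if len({span["service"] for span in trace["spans"]}) > 1:
        reasons.append("distributed")
    # Services with little traffic produce few traces, so each one matters more.
    if trace["service_qps"] < low_qps_threshold:
        reasons.append("low-qps service")
    if any(span["error"] for span in trace["spans"]):
        reasons.append("error")
    if trace["duration_ms"] > latency_threshold_ms:
        reasons.append("high latency")
    return reasons


# Made-up traces for illustration.
slow_failing_trace = {
    "spans": [
        {"service": "web-store", "error": False},
        {"service": "web-store-mongo", "error": True},
    ],
    "duration_ms": 900,
    "service_qps": 50.0,
}
healthy_trace = {
    "spans": [{"service": "web-store", "error": False}],
    "duration_ms": 40,
    "service_qps": 50.0,
}

print(retention_reasons(slow_failing_trace))
print(retention_reasons(healthy_trace))
```

A trace that matches none of the checks is not necessarily discarded in practice, since the last criterion in the list calls for a representative variety of traces. That part is not modeled here.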

Datadog APM employs distributed tracing and offers a variety of features to provide you with an in-depth look into application performance.

Traces Search lists sampled traces that have been indexed and retained so that you can drill down on trace data that is representative of application performance.
Live Search lists all trace spans ingested by Datadog in near real time.
Trace View visualizes each trace as a Flame Graph with a breakdown of the data related to each span—that is, each operation performed by each service involved in handling a request.
Service Catalog lists the services instrumented for APM in an application environment.
Service Page provides a breakdown of trace statistics (Requests, Latency, Errors, and Time Spent), resources (specifically, endpoints for web services and queries for databases), and any runtime metrics for a service.
Resource Page provides a breakdown of trace statistics, span summary, and traces for a resource.
Service Map visualizes the organization of services and service dependencies in the app architecture.
APM Monitors are monitors designed specifically for application monitoring and can be linked to Service Map, Services, and related Service and Resources Pages using monitor tags to keep you alerted on key performance metrics.

Before monitoring application performance, you need to instrument your application for Datadog APM. Datadog provides tracing libraries for a variety of languages: Java, Python, Ruby, C++, Go, Node.js, .NET, .NET Core, PHP, Envoy, Nginx, and Istio.
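To give a feel for what instrumentation does, the sketch below wraps a function so that every call records a span-like entry with a name, a duration, and an error flag. It is a hand-rolled illustration that uses only the Python standard library. It does not use or imitate the API of Datadog's tracing libraries, and `traced`, `recorded_spans`, and `update_product` are hypothetical names.

```python
import functools
import time

# Stand-in for the span data a real tracing library would send to a backend.
recorded_spans = []


def traced(operation_name):
    """Decorator that records one span-like dict per call of the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = False
            try:
                return func(*args, **kwargs)
            except Exception:
                error = True
                raise
            finally:
                # Runs on both success and failure, so failed calls are recorded too.
                recorded_spans.append({
                    "name": operation_name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })
        return wrapper
    return decorator


@traced("product.update")
def update_product(product_id, price):
    return {"id": product_id, "price": price}


result = update_product(42, 9.99)
print(recorded_spans)
```

A real tracing library does considerably more, such as propagating trace context between services and attaching tags automatically, but the principle of wrapping operations to time them is the same.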

Go deeper on all of the topics you learned about in this course! Explore the Learning Center catalog for more self-guided learning on these products and more.

The Datadog Documentation site is an excellent reference for day-to-day Datadog use. Use it to discover new features of familiar products. To learn more about Datadog APM, you can view APM and Distributed Tracing in the Datadog documentation. You can also search for articles related to APM and tracing in the Datadog blog.

If you’re more of a visual learner, you may appreciate the regular video content posted to Datadog’s YouTube page.
