JayCesar / cloud


Introduction to Application Performance Monitoring #4

Open JayCesar opened 1 month ago

JayCesar commented 1 month ago

Intro

Modern applications are distributed systems composed of numerous services that together handle high volumes of requests. Often, multiple services are involved in handling a single request, and when a request fails or is handled poorly, it’s difficult to pinpoint the root cause. Distributed tracing is a method for tracking how a request is handled across a distributed system. Datadog Application Performance Monitoring (APM) uses distributed tracing to monitor application performance and provides a variety of features so that you can investigate and troubleshoot at all levels of your applications.

Distributed Traces and Spans

In distributed tracing, when a request is made to an application, a trace is created that collects performance data from each service involved in processing the request. The trace is made up of spans. Each span represents an operation or logical unit of work performed by a service in response to the request. Services that perform more than one operation to handle the request produce a span for each operation.
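The relationship between a trace and its spans can be sketched as a small data model. This is only an illustration, not Datadog's internal representation: the class names, field names, the `web-store` service, and the operation names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One operation performed by one service (hypothetical model)."""
    span_id: int
    parent_id: Optional[int]  # None marks the root span of the trace
    service: str
    operation: str


@dataclass
class Trace:
    """All spans produced while handling a single request."""
    trace_id: int
    spans: list[Span] = field(default_factory=list)

    def services(self) -> set[str]:
        # Every service that took part in handling the request.
        return {span.service for span in self.spans}


# One request handled by two services; "web-store" performs two
# operations, so it contributes two spans to the same trace.
trace = Trace(trace_id=1)
trace.spans.append(Span(1, None, "web-store", "http.request"))
trace.spans.append(Span(2, 1, "web-store", "render.template"))
trace.spans.append(Span(3, 1, "web-store-mongo", "mongo.query"))
```

A service that performs more than one operation appears in several spans but only once in `trace.services()`.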

JayCesar commented 1 month ago

With Datadog APM, you can solve the following issues and more:

  1. Find traces matching a bug report for a customer, status code, and endpoint
  2. Live view your application trace data to gain performance insights at a glance
  3. Identify the slowest SQL queries on a database for a specific host and shard
  4. View slow and cold-starting serverless functions

Datadog APM uses distributed tracing to monitor your application's performance. Distributed tracing lets you track the progress and status of a request as it is handled by your application’s distributed system. You can also query all traces live and keep only the ones you need.

A trace records the operations that take place while a system handles a request. Traces are visualized as flame graphs, which map out what happens when a request is made to a specific endpoint of a web application.

In this case, let's focus on a specific example: a request made to update a product in an online store. The graph starts at the moment the request reaches the endpoint where products are managed. This starting point is shown as the top part of the graph.

From there, the graph shows two things that happen next: one is a database query to get or change data stored in a MongoDB database (this is the "web-store-mongo" service). The other thing happening is another operation related to updating the product in the store's web application itself (specifically managed by the "Admin::ProductsController#update" endpoint).

So, this graph helps us see how these different parts of the system work together. It shows dependencies, like how the web application relies on the database to manage and update product information.

The length of each span represents the duration of each operation involved in completing the request. Each span has a name that describes the operation, a start time, a duration, and span tags.
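To make the idea of span length concrete, here is a rough text rendering of a flame graph. It is a sketch only: the `Span` fields mirror the attributes listed above, while the start times, durations, the `render` helper, and the child span names are made up for illustration and do not come from a real trace.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str          # describes the operation
    start_ms: int      # start time, relative to the start of the trace
    duration_ms: int   # how long the operation took
    tags: dict = field(default_factory=dict)


def render(spans, ms_per_char=10):
    """Draw each span as a bar whose offset and length follow its timing."""
    lines = []
    for span in sorted(spans, key=lambda s: s.start_ms):
        offset = " " * (span.start_ms // ms_per_char)
        bar = "#" * max(1, span.duration_ms // ms_per_char)
        lines.append(f"{offset}{bar} {span.name} ({span.duration_ms} ms)")
    return "\n".join(lines)


# Hypothetical timings for the product-update request described above.
spans = [
    Span("Admin::ProductsController#update", 0, 120),
    Span("web-store-mongo query", 20, 40),
    Span("update product", 70, 30),
]
print(render(spans))
```

Longer bars correspond to longer operations, and a bar's horizontal offset shows when the operation started relative to the request.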

Span tags are labels made up of key-value pairs that provide detailed information about different parts of a system, such as its infrastructure, applications, and business aspects. In Datadog, these tags can be added automatically by the system or manually by users.

- Automatically assigned tags: These are added through integrations, instrumentation, or inheritance from other parts of the system.
- Manually assigned tags: Users can add custom tags to provide specific information.

Tags can be specific to certain parts of the system (spans) or apply to everything within a service or application. For instance, in an ecommerce application, you might use tags like "customer" and "merchant" to label and organize the data related to those entities.
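The difference between service-wide tags and span-specific tags can be sketched with plain dictionaries. This is not how Datadog stores or merges tags; the `effective_tags` helper, the tag values, and the rule that span-level tags win on a key collision are all assumptions made for this example.

```python
def effective_tags(service_tags, span_tags):
    """Combine tags that apply to a whole service with tags set on one span.

    Assumption for this sketch: a span-level tag overrides a
    service-level tag that uses the same key.
    """
    merged = dict(service_tags)
    merged.update(span_tags)
    return merged


# Tags that would typically be assigned automatically (hypothetical values).
service_tags = {"env": "prod", "service": "web-store"}

# Custom tags added manually for an ecommerce application (hypothetical values).
span_tags = {"customer": "acme", "merchant": "shop-42"}

tags = effective_tags(service_tags, span_tags)
```

Every span of the service would carry `env` and `service`, while `customer` and `merchant` appear only on the spans where they were set.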

JayCesar commented 1 month ago

Traces in Datadog

In Datadog APM, you can search your application traces in Traces Search and Live Search, and you can view the details of traces in the Trace View.

Traces Search - Traces Search allows you to search the traces that have been sampled and indexed by the Datadog backend based on the span tags. Datadog employs a trace sampling strategy that keeps the traces that matter most, including the following:

  1. Distributed traces
  2. Traces from low queries-per-second (QPS) services
  3. Traces corresponding to bad user experiences such as errors
  4. Traces corresponding to high latencies or failed requests
  5. The variety of traces representative of the distributed system
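
Searching indexed traces by span tags amounts to filtering on key-value pairs. The sketch below illustrates that idea only; it does not use Datadog's query syntax or API, and the `find_trace_ids` helper, the tag keys, and the sample data are hypothetical.

```python
def find_trace_ids(indexed_spans, wanted_tags):
    """Return the IDs of traces that have a span matching every wanted tag."""
    matches = []
    for span in indexed_spans:
        tags = span["tags"]
        if all(tags.get(key) == value for key, value in wanted_tags.items()):
            if span["trace_id"] not in matches:
                matches.append(span["trace_id"])
    return matches


# Hypothetical indexed spans with customer, status code, and endpoint tags.
indexed_spans = [
    {"trace_id": 1, "tags": {"customer": "acme", "status_code": 500, "endpoint": "/products"}},
    {"trace_id": 2, "tags": {"customer": "acme", "status_code": 200, "endpoint": "/products"}},
    {"trace_id": 3, "tags": {"customer": "globex", "status_code": 500, "endpoint": "/cart"}},
]

# Find the traces behind a bug report: one customer, one status code, one endpoint.
failing = find_trace_ids(
    indexed_spans,
    {"customer": "acme", "status_code": 500, "endpoint": "/products"},
)
```

Only spans that were sampled and indexed can be found this way, which is why the retention strategy described above matters.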
JayCesar commented 1 month ago

Datadog APM employs distributed tracing and offers a variety of features to provide you with an in-depth look into application performance.

Go deeper on all of the topics you learned about in this course! Explore the Learning Center catalog for more self-guided learning on these products and more.

The Datadog Documentation site is an excellent reference for day-to-day Datadog use. Use it to discover new features of familiar products. To learn more about Datadog APM, you can view APM and Distributed Tracing in the Datadog documentation. You can also search for articles related to APM and tracing in the Datadog blog.

If you’re more of a visual learner, you may appreciate the regular video content posted to Datadog’s YouTube page.
