JayCesar / cloud


Introduction to Observability #1

Open JayCesar opened 5 days ago

JayCesar commented 5 days ago

image image

What is Monitoring

Monitoring is the process of gathering data to understand what's going on inside your infrastructure.

image

What is Observability

Observability is taking the same data that you have collected and moving beyond "What is happening?" to "Why is it happening?"

Monitoring is the tip of the iceberg. This means that if I want to see the whole picture, I need both monitoring and observability.

image

image image image image

JayCesar commented 5 days ago

What is a Metric?

In a nutshell, it is a number and a timestamp!

Metrics are numerical values that can track anything about your environment over time, from latency to error rates to user signups.
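
To make the "number and a timestamp" idea concrete, here is a minimal sketch in Python. The `MetricPoint` class, the `record` helper and the metric name `request.latency_ms` are all made up for illustration; they are not part of any monitoring tool's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MetricPoint:
    """One sample of a metric: a name, a numeric value, and when it was taken."""
    name: str
    value: float
    timestamp: datetime


def record(name: str, value: float, when: Optional[datetime] = None) -> MetricPoint:
    """Build a point, stamping it with the current UTC time unless one is given."""
    return MetricPoint(name, float(value), when or datetime.now(timezone.utc))


# A tiny time series: the same (hypothetical) metric sampled once per minute.
series = [
    record("request.latency_ms", 120, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)),
    record("request.latency_ms", 135, datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc)),
    record("request.latency_ms", 118, datetime(2024, 1, 1, 12, 2, tzinfo=timezone.utc)),
]

for point in series:
    print(point.timestamp.isoformat(), point.name, point.value)
```

A single point says very little on its own; the value comes from collecting many points of the same metric over time.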

image

In practice, we need to collect a lot of them for the data to make sense!

image

This peak can indicate an anomaly.
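
One simple way to spot a peak like this in code is to flag samples that sit far above the average of the series. The sketch below is only an illustration under assumptions: the latency numbers are invented, and the z-score rule with a cutoff of 2.0 is one of many possible techniques, not how any particular monitoring product detects anomalies.

```python
from statistics import mean, pstdev
from typing import List


def find_peaks(values: List[float], z_threshold: float = 2.0) -> List[int]:
    """Return the indexes of samples more than z_threshold standard deviations above the mean."""
    spread = pstdev(values)
    if spread == 0:
        return []  # a perfectly flat series has no peaks
    average = mean(values)
    return [i for i, v in enumerate(values) if (v - average) / spread > z_threshold]


# Invented latency samples in milliseconds; the last one is the spike.
latency_ms = [100, 102, 98, 101, 99, 103, 100, 400]
print(find_peaks(latency_ms))  # [7]
```

A flagged sample is only a hint that something unusual happened; deciding whether it is a real problem still needs context.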

Reasons to collect metrics


By using metrics I can spend less money, get woken up less often while on call, have fewer fire drills, keep customers happier, etc.

JayCesar commented 5 days ago

Metric Walk Through

image

JayCesar commented 5 days ago

What is Monitoring?

- Meaning: Monitoring is the act of paying attention to the patterns that your metrics are showing you. It's about analyzing your data and acting on it.

image

What do we Monitor?

- Performance: By watching performance, we can see how our architecture and applications are using the resources that are available.

- Security: Is something going wrong in our environment? Creating monitors around security metrics can stop incidents in their tracks.

- Usage: How application code is actually functioning

image

Whom do we alert?

image

Threshold = the point at which something starts. For a monitor, it is the metric value at which an alert is triggered.

image

- It's important to only alert team members when something actionable needs to be done.
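
The threshold and the "only alert when actionable" rule can be sketched together. In this hypothetical example, a monitor only produces an alert when the metric stays above its threshold for several consecutive samples, so a single blip does not wake anyone up. The `Monitor` class, its fields and the recipient address are assumptions made for illustration, not a real tool's configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Monitor:
    name: str
    threshold: float   # value above which the metric counts as unhealthy
    min_breaches: int  # consecutive breaching samples required before alerting
    notify: str        # hypothetical recipient of the alert


def evaluate(monitor: Monitor, samples: List[float]) -> Optional[str]:
    """Return an alert message if the latest samples all breach the threshold, else None."""
    recent = samples[-monitor.min_breaches:]
    if len(recent) < monitor.min_breaches:
        return None  # not enough data to decide yet
    if all(value > monitor.threshold for value in recent):
        return (f"ALERT {monitor.name}: last {monitor.min_breaches} samples "
                f"above {monitor.threshold}; notify {monitor.notify}")
    return None


cpu = Monitor(name="cpu.usage_percent", threshold=90, min_breaches=3,
              notify="oncall@example.com")

print(evaluate(cpu, [80, 95, 70]))      # one-off spike: no alert
print(evaluate(cpu, [80, 92, 95, 97]))  # sustained breach: alert message
```

Requiring a sustained breach is one design choice for cutting noise; the right number of samples depends on how quickly someone really needs to act.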

JayCesar commented 5 days ago

Monitoring Walk Through

I can set up the metric from the dashboard:

image

Tip: think of the alert as software

And I can set up an e-mail notification for it:

image

JayCesar commented 5 days ago

What is a Log?

A log is usually a bulky piece or length of a cut or fallen tree.

image

Just kidding...

A log is a computer-generated file that contains information about the usage of a system. This gives you insight into the behaviour of the resource.

image

📍It is a file filled with the history of what that computer / application / resource has been doing.
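
As a small illustration of that "history", the sketch below uses Python's standard `logging` module to write two entries. The logger name `checkout` and the order details are invented, and the entries go to an in-memory buffer instead of a real file so the example stays self-contained.

```python
import io
import logging

# In-memory stand-in for a log file, so the example needs no disk access.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("checkout")  # hypothetical application name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the entries out of the root logger

# Each call appends one line describing what the application just did.
logger.info("order created order_id=%s", 1234)
logger.error("payment failed order_id=%s reason=%s", 1234, "card_declined")

history = stream.getvalue().splitlines()
for line in history:
    print(line)
```

Each line carries a timestamp, a severity level and a message, which is what makes it possible to reconstruct later what happened and in which order.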

Why do we collect logs?

Practical uses for logs

Note: Computer troubleshooting is the process of diagnosing and solving computer errors or technical problems.

JayCesar commented 5 days ago

Storing our Logs

What kind of services generate Logs?

image

Everything done in the cloud is tracked!

How long do we store Logs?

There are three guidelines:

How do we consolidate our Logs?

image

Curiosity: (From ChatGPT) Datadog, a monitoring and analytics platform for cloud applications, is named to evoke the idea of a vigilant, loyal, and reliable guard dog. The name "Datadog" suggests the platform's role in keeping a watchful eye on data, ensuring that systems are running smoothly, and alerting users to any issues. Just as a guard dog is trusted to protect and notify its owner of potential problems, Datadog is designed to provide comprehensive monitoring and alerting for IT infrastructure and applications.

JayCesar commented 5 days ago

Logging Walk Through

image

image

It is important to know that logs cannot always tell me how to fix a problem, but they can alert me to the problem I need to investigate.

By using Datadog I can extract the context from a specific log.

JayCesar commented 4 days ago

What is a Trace?

A trace is used to track the time spent by an application processing a request along with the execution path taken.

What is a Span?

A Span is the individual unit of work that the code is doing.
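
The relationship between a trace and its spans can be sketched in a few lines of Python. This is a toy model under assumptions, not a real tracing library: the `Trace` and `Span` classes and the operation names are invented, and `time.sleep` stands in for real work.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]  # name of the enclosing span, None for the root
    duration_ms: float = 0.0


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)
    _stack: List[str] = field(default_factory=list)

    @contextmanager
    def span(self, name: str):
        """Time one unit of work and remember which span it ran inside."""
        current = Span(name, self._stack[-1] if self._stack else None)
        self.spans.append(current)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


# One request = one trace; each step of the request = one span.
trace = Trace()
with trace.span("GET /checkout"):
    with trace.span("load_cart"):
        time.sleep(0.01)  # stand-in for real work
    with trace.span("charge_card"):
        time.sleep(0.02)

for s in trace.spans:
    print(f"{s.name:15} parent={s.parent} {s.duration_ms:.1f} ms")
```

The root span covers the whole request, and the nested spans show where the time inside it went and which path the execution took.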

image

Why do we collect Traces?

- Microservices: As businesses migrate away from monolithic architecture, tracing is needed to figure out what all of the microservices are up to.

- Optimization: Tracing allows you to optimize the performance of your applications by identifying bottlenecks in the calls being made.

- Troubleshooting: When something goes wrong, we need insight into the actual application code. Tracing can help us track down errors in the code.
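
The optimization point above can be illustrated with a short sketch: once each span has a duration, finding the bottleneck is a matter of sorting. The service names and timings below are invented, and each duration is assumed to be the time spent in that span alone (not including the calls it makes).

```python
from typing import Dict, List

# Invented spans from one request that crosses several microservices.
spans: List[Dict] = [
    {"service": "api-gateway", "operation": "GET /checkout", "duration_ms": 12.0},
    {"service": "cart-service", "operation": "load_cart", "duration_ms": 35.0},
    {"service": "payment-service", "operation": "charge_card", "duration_ms": 480.0},
    {"service": "email-service", "operation": "send_receipt", "duration_ms": 60.0},
]


def slowest(all_spans: List[Dict], top: int = 1) -> List[Dict]:
    """Return the `top` spans with the largest duration, slowest first."""
    return sorted(all_spans, key=lambda s: s["duration_ms"], reverse=True)[:top]


total_ms = sum(s["duration_ms"] for s in spans)
for s in slowest(spans, top=2):
    share = s["duration_ms"] / total_ms
    print(f'{s["service"]}: {s["duration_ms"]} ms ({share:.0%} of the request)')
```

In this made-up request, the payment call dominates the total time, so that is where optimization effort would pay off first.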

JayCesar commented 4 days ago

Tracing Walk Through

image

Datadog can tie traces, metrics and logs together!
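
A common way to link these signals is to stamp log entries with the ID of the trace they belong to. The sketch below only illustrates that general idea with invented data and a hypothetical `logs_for_trace` helper; it is not Datadog's implementation or API.

```python
from typing import Dict, List

# Invented log entries; each carries the ID of the request (trace) that produced it.
logs: List[Dict] = [
    {"trace_id": "abc123", "level": "INFO", "message": "order created"},
    {"trace_id": "def456", "level": "INFO", "message": "cart loaded"},
    {"trace_id": "abc123", "level": "ERROR", "message": "payment failed"},
]


def logs_for_trace(entries: List[Dict], trace_id: str) -> List[Dict]:
    """Return only the log entries written while handling the given trace."""
    return [entry for entry in entries if entry["trace_id"] == trace_id]


# Starting from a slow or failing trace, pull up the logs from that same request.
for entry in logs_for_trace(logs, "abc123"):
    print(entry["level"], entry["message"])
```

With a shared ID, I can jump from a slow trace straight to the log lines written during that request instead of searching by time range.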

JayCesar commented 4 days ago

Summary:

image

image