HowFast / roadmap

Open roadmap for HowFast
https://www.howfast.tech/

Weekly reports on performance #2

MickaelBergem opened 6 years ago

MickaelBergem commented 6 years ago

Need: to get a periodic overview of the general performance and/or major events on a given set of monitors.

This may include:

Display channels:

Any comment from the community is welcome! 📣

MickaelBergem commented 4 years ago

Here is the draft visual of this weekly/monthly report. @jpcaruana @pascalandy, would the information in this report be useful to you? (You had upvoted this feature.)

Now is the best time to add / remove / improve the information that goes into this report :)

pascalandy commented 4 years ago

I would add titles like:

Weekly stats

bla bla ...

monthly stats

bla bla ...

pascalandy commented 4 years ago

Not sure the green and red little triangles add value.

jpcaruana commented 4 years ago

I like it (I like the green/red triangles: if everything is green, I don't have to read).

It would be great if you could also add a list of the worst routes (from the APM part), the top 5 by impact.

jpcaruana commented 4 years ago

As a reference, I like the weekly email from Sentry:


It gives me a good sense of "everything is good/bad" at a glance, and I can delve into the details (not shown in my screenshot).

MickaelBergem commented 4 years ago

Thank you both for the quick answer!

@pascalandy, just to make sure I understand what you mean: you would like to receive the weekly+monthly report every week, so that you also get some longer-term stats on performance?

@jpcaruana, thanks! The Sentry report is indeed quite useful. Adding APM data to the performance report will come next; I can definitely see the value.

pascalandy commented 4 years ago

you would like to receive the weekly+monthly report every week

Correct, it's a two-for-one :-P

MickaelBergem commented 4 years ago

Update: I've been thinking about how best to design this email (in terms of content more than UI), and here are my thoughts.

User experience

As a user of HowFast, I have limited time, so I will just trash the email if I don't care. To make it easier for users to know whether they should care, the email subject contains cues such as:

If all monitors are up but there were incidents in the past week, I'm still unsure what is most useful. We could go with the total number of minutes spent down ("45 minutes of cumulative downtime this week"), with the maximum ("longest incident lasted 6h"), or with something else. Cumulative downtime becomes much less meaningful as soon as you have 3+ monitors: if all three went down at the same time for 45 minutes, summing their downtime reports 2h15 (3 × 45 minutes) even though the service was only down for 45 minutes.
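The overcounting problem can be made concrete. Summing per-monitor incident durations counts a shared outage once per monitor, whereas merging overlapping intervals counts only the wall-clock time during which at least one monitor was down. A minimal Python sketch, assuming each incident is a hypothetical (start, end) pair of datetimes; the function names are illustrative and not part of HowFast:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical data layout: one (start, end) pair per incident, any monitor.
Incident = Tuple[datetime, datetime]


def summed_downtime(incidents: List[Incident]) -> timedelta:
    """Naive total: a shared outage is counted once per affected monitor."""
    return sum((end - start for start, end in incidents), timedelta())


def merged_downtime(incidents: List[Incident]) -> timedelta:
    """Union of intervals: time during which at least one monitor was down."""
    total = timedelta()
    current_start = current_end = None
    for start, end in sorted(incidents):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it, then open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping: extend the running interval if this one ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def longest_incident(incidents: List[Incident]) -> timedelta:
    """The "longest incident lasted 6h" style of metric."""
    return max((end - start for start, end in incidents), default=timedelta())


# Three monitors that all went down during the same 45 minutes.
t0 = datetime(2020, 5, 4, 10, 0)
same_outage = [(t0, t0 + timedelta(minutes=45))] * 3
print(summed_downtime(same_outage))  # 2:15:00
print(merged_downtime(same_outage))  # 0:45:00
```

Merging is one way to avoid the inflated figure; it still says nothing about which monitors were affected, so it would complement rather than replace per-monitor rows.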

Based on this information, the user can decide to archive/trash the email (especially if there were no incidents), or to read it.

If the user only spends 10 seconds scanning the email, what are the most important metrics? See "Head metrics" below; I'm not sure about this part yet.


The current design highlights the monitors that are currently down (the most important information) along with a short explanation of what is happening. If the user needs more information, the rows are clickable and open the monitor in HowFast.

Head metrics

This is the "Sentry-like" report. I'm in favor of adding those big numbers at the top of the email to give a quick summary of what happened, but I need to figure out the details. What makes sense in the context of HowFast?

For now I am only considering the number of monitors currently down, the number of incidents in the past week, and maybe the slowest average response time. Which metrics would you be interested in seeing?
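The three candidate head metrics above are simple aggregates over the per-monitor rows of the report. A minimal Python sketch, assuming a hypothetical `MonitorWeek` record per monitor; the field names and sample monitors are illustrative, not HowFast's schema:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MonitorWeek:
    # Hypothetical per-monitor weekly summary; field names are assumptions.
    name: str
    is_down: bool
    incident_count: int
    avg_response_ms: float


@dataclass
class HeadMetrics:
    monitors_down: int
    incidents_last_week: int
    slowest_monitor: Optional[str]
    slowest_avg_response_ms: Optional[float]


def head_metrics(monitors: List[MonitorWeek]) -> HeadMetrics:
    # The monitor with the highest average response time, or None if there are none.
    slowest = max(monitors, key=lambda m: m.avg_response_ms, default=None)
    return HeadMetrics(
        monitors_down=sum(m.is_down for m in monitors),  # True counts as 1
        incidents_last_week=sum(m.incident_count for m in monitors),
        slowest_monitor=slowest.name if slowest else None,
        slowest_avg_response_ms=slowest.avg_response_ms if slowest else None,
    )


monitors = [
    MonitorWeek("api", False, 2, 180.0),
    MonitorWeek("website", True, 1, 420.0),
    MonitorWeek("docs", False, 0, 95.0),
]
print(head_metrics(monitors))
```

Returning `None` for the slowest monitor keeps an empty team (no monitors yet) from crashing the report.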

Weekly AND monthly metrics in a single mail

Some teams using HowFast have many monitors, so two tables would make the email very long and harder to read; I'm not convinced this would add value. I will look into adding an extra column, "uptime over the last 30 days", while making it clear that the existing one is "uptime over the last 7 days". Those numbers might also be easier to show directly inside HowFast than in an email.
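The 7-day and 30-day columns would be the same computation over two window lengths: clip each incident to the window, add up the downtime, and divide by the window length. A minimal Python sketch, assuming a single monitor's incidents are available as hypothetical non-overlapping (start, end) datetime pairs:

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def uptime_percent(
    incidents: List[Tuple[datetime, datetime]], now: datetime, days: int
) -> float:
    """Uptime over the window [now - days, now] for one monitor.

    Assumes the monitor's incidents do not overlap each other.
    """
    window = timedelta(days=days)
    window_start = now - window
    down = timedelta()
    for start, end in incidents:
        # Clip each incident to the window; incidents outside it contribute nothing.
        clipped_start, clipped_end = max(start, window_start), min(end, now)
        if clipped_end > clipped_start:
            down += clipped_end - clipped_start
    return 100.0 * (1 - down / window)


now = datetime(2020, 5, 11)
incidents = [
    (now - timedelta(days=3), now - timedelta(days=3) + timedelta(hours=6)),
    (now - timedelta(days=20), now - timedelta(days=20) + timedelta(hours=12)),
]
print(f"{uptime_percent(incidents, now, 7):.2f}%")   # 96.43%
print(f"{uptime_percent(incidents, now, 30):.2f}%")  # 97.50%
```

The older incident only shows up in the 30-day column, which is exactly the longer-term signal the extra column is meant to carry.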

Next up

The implementation is almost ready and will be rolled out in the next few days. If you are interested, you can opt in and start receiving the reports in your mailbox, so that you can see what they look like with your own numbers. I would love to hear your feedback!

MickaelBergem commented 4 years ago

The first batch of weekly reports was sent this Monday, with some very good results:

Overall, several teams were able to get more value out of HowFast thanks to this report.

The next batch will include extra data about certificates expiring soon (in less than two weeks). This way, even if no notification is configured for the affected monitors, the team can still learn about it.
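The "expiring in less than two weeks" check reduces to comparing a certificate's `notAfter` date with the current time. A minimal Python sketch, assuming the date string comes from the `notAfter` entry of the dict returned by Python's `ssl.SSLSocket.getpeercert()`; the `expires_soon` helper and the threshold constant are hypothetical names, not HowFast's code:

```python
import ssl
from datetime import datetime, timedelta, timezone

# "Less than two weeks", as described for the next batch of reports.
EXPIRY_THRESHOLD = timedelta(days=14)


def expires_soon(
    not_after: str, now: datetime, threshold: timedelta = EXPIRY_THRESHOLD
) -> bool:
    """True if the certificate expires within `threshold` of `now`.

    `not_after` uses the getpeercert() format, e.g. "Jun 10 00:00:00 2020 GMT".
    `now` must be timezone-aware. Already-expired certificates also return True.
    """
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return expiry - now < threshold


now = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(expires_soon("Jun 10 00:00:00 2020 GMT", now))  # True: 9 days left
print(expires_soon("Jul 10 00:00:00 2020 GMT", now))  # False: 39 days left
```

Keeping the date check separate from fetching the certificate makes it easy to run on data the monitors already collect.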

Feel free to share your feedback :)

jpcaruana commented 4 years ago

Hi,

I like these emails and I'm looking forward to seeing them get better and better.

Would it be possible to choose the order of monitors in the email? I have a lot of monitors, and for this kind of weekly digest production monitors are my main focus (the rest is more informational for me).

Thanks!

MickaelBergem commented 4 years ago

Thank you for your feedback @jpcaruana! Currently, the monitors are ordered by:

  1. status (monitors that are down first)
  2. increasing uptime (so that you can see the problematic monitors first)
  3. decreasing response time (if all your monitors have 100%, you probably want to focus on the slower ones first)

I'm trying to think of a way to make this work for you in this context. I assume we could somehow add a flag for "production" monitors, and display those first, do you see another way to make it work in your case?
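The three ordering criteria above map onto a tuple sort key, and the proposed "production" flag would simply become a new first element of that tuple. A minimal Python sketch, assuming a hypothetical `MonitorRow` record; `is_production` stands for the flag under discussion and is not an existing HowFast field:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonitorRow:
    # Hypothetical row of the weekly report; field names are assumptions.
    name: str
    is_down: bool
    uptime_percent: float
    avg_response_ms: float
    is_production: bool = False  # the proposed flag, not an existing field


def report_order(rows: List[MonitorRow], production_first: bool = False) -> List[MonitorRow]:
    """Down first, then increasing uptime, then decreasing response time."""

    def key(r: MonitorRow):
        # False sorts before True, so "not is_down" puts down monitors first;
        # negating the response time gives a decreasing order.
        base = (not r.is_down, r.uptime_percent, -r.avg_response_ms)
        # With the flag, production monitors form a first group, itself
        # ordered by the same three criteria.
        return ((not r.is_production,) + base) if production_first else base

    return sorted(rows, key=key)


rows = [
    MonitorRow("docs", False, 100.0, 95.0),
    MonitorRow("api", False, 100.0, 180.0, is_production=True),
    MonitorRow("staging", True, 98.2, 420.0),
    MonitorRow("shop", False, 99.5, 150.0, is_production=True),
]
print([r.name for r in report_order(rows)])                         # ['staging', 'shop', 'api', 'docs']
print([r.name for r in report_order(rows, production_first=True)])  # ['shop', 'api', 'staging', 'docs']
```

Because the flag only prepends a key, the existing ordering is preserved within each group, so teams that never set the flag see no change.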

jpcaruana commented 4 years ago

I assume we could somehow add a flag for "production" monitors, and display those first, do you see another way to make it work in your case?

This seems like a perfect solution for my use case. I guess you could also use this flag in the web UI.

MickaelBergem commented 4 years ago

@jpcaruana, you mentioned having the most impactful endpoints listed in the email. Would that work if it's based on all the APMs in your team, including the non-prod ones, or would the result be largely useless? I started working on this and could probably send you the results for your team so that you can double-check, but maybe you already know.

MickaelBergem commented 4 years ago

Here is a draft of the APM summary:

v0.1 [image]

v0.2 [image: application-performance-monitoring-howfast]

I think measuring impact in ms per minute (the milliseconds a worker spends on this endpoint during an average minute) makes sense and adds value.
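The "ms per minute" impact is the total time spent serving an endpoint divided by the number of minutes in the reporting window, which weighs both how often an endpoint is hit and how slow it is. A minimal Python sketch of that metric plus the "top 5 by impact" list requested earlier in the thread, assuming a hypothetical stream of (endpoint, duration_ms) pairs; the function names and sample routes are illustrative:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def impact_ms_per_minute(
    requests: Iterable[Tuple[str, float]], window_minutes: float
) -> Dict[str, float]:
    """Milliseconds of worker time spent per endpoint during an average minute.

    `requests` is a hypothetical stream of (endpoint, duration_ms) pairs
    collected over `window_minutes` minutes.
    """
    total_ms: Dict[str, float] = defaultdict(float)
    for endpoint, duration_ms in requests:
        total_ms[endpoint] += duration_ms
    return {endpoint: ms / window_minutes for endpoint, ms in total_ms.items()}


def top_endpoints(impact: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """The n endpoints with the highest impact, most impactful first."""
    return sorted(impact.items(), key=lambda item: item[1], reverse=True)[:n]


# One hour of hypothetical traffic.
requests = (
    [("GET /search", 250.0)] * 1200
    + [("GET /health", 2.0)] * 6000
    + [("POST /checkout", 900.0)] * 30
)
impact = impact_ms_per_minute(requests, window_minutes=60)
print(top_endpoints(impact, n=2))  # [('GET /search', 5000.0), ('POST /checkout', 450.0)]
```

With this metric a slow but rarely called route can still outrank a fast one that is hit constantly, as `POST /checkout` does over `GET /health` here.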

jpcaruana commented 3 years ago

The current weekly report works great: I think you can close this issue, @MickaelBergem :)