SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

How to correctly display LongCounter metric in dashboard? #3240

Open quintonn opened 1 year ago

quintonn commented 1 year ago

Bug description

I can't seem to show the correct values for a LongCounter in the SigNoz dashboards.

I have set up a Java application to report on the number of requests it processes.
This works pretty well.

The problem happens if my Java application restarts for any reason.
Then the counter restarts from zero.

I asked this StackOverflow question: https://stackoverflow.com/questions/76788903/which-metric-to-use-for-open-telemetry-for-basic-counter.

And it seems the LongCounter is the correct metric and it should be up to the tool I use to report it correctly.

Surely a resetting device is a normal scenario?

The default dashboards don't seem to offer a solution.

I am also trying a raw ClickHouse query.
I found the metric I am testing in the signoz_metrics.samples_v2 table.

This is the query I am now trying:

SELECT max(value)
FROM signoz_metrics.samples_v2
WHERE metric_name = 'requestCount'
  AND fromUnixTimestamp64Milli(toInt64(timestamp_ms)) > (now() - INTERVAL 30 MINUTE)

But it will show the max value before my application reset and from that point on, I can't figure out what to do because the counter starts from 0 again.
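To illustrate why max(value) stops working after a restart, here is a minimal Python sketch of a reset-aware total. It is not SigNoz's implementation; the function name `total_increase`, the `count_first_from_zero` flag, and the sample values are hypothetical. It assumes that any drop in a cumulative counter means the process restarted from zero.

```python
def total_increase(values, count_first_from_zero=True):
    """Sum the increases of a cumulative counter, treating any drop as a restart."""
    total = 0
    previous = None
    for value in values:
        if previous is None:
            # First sample: optionally assume the counter started at zero.
            total += value if count_first_from_zero else 0
        elif value >= previous:
            # Normal case: add the growth since the previous sample.
            total += value - previous
        else:
            # The value dropped, so assume the process restarted from zero.
            total += value
        previous = value
    return total


# Hypothetical samples: the application restarts between 8 and 2.
samples = [3, 3, 8, 8, 2, 2]
print(max(samples))             # 8
print(total_increase(samples))  # 10
```

Whether the first sample counts towards the total is a design choice: with `count_first_from_zero=False` the same samples give 7, because the initial value 3 is treated as a baseline rather than as an increase.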

Am I expected to store the latest count value locally in my application?

Any help will be much appreciated.

Update:
I was reading about cumulative and delta temporality, and when I try to set this to "deltaPreferred", the metrics don't seem to show up in the signoz_metrics database at all!

Expected behavior

I expect to be able to see the increasing value of my counter.

How to reproduce

  1. Create a LongCounter and start adding values.
  2. Create a panel in SigNoz dashboard showing the metric.
  3. Restart the application sending the metric
  4. Notice that the dashboard doesn't show the combined count from before and after the restart.

Version information

srikanthccv commented 1 year ago

When you work with Counter, it doesn't make sense to look at the absolute value but the rate of change. Please use one of the {SUM, AVG, MIN, MAX}_RATE operators.
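As a rough illustration of what "rate of change" means for a cumulative counter, the following Python sketch computes a per-second rate between consecutive samples. It is an assumption-laden sketch, not necessarily how SigNoz's *_RATE operators are computed; the function name `per_second_rates` and the sample points are hypothetical.

```python
def per_second_rates(points):
    """points: list of (timestamp_seconds, cumulative_value), sorted by time.

    Returns a list of (timestamp_seconds, rate_per_second) for each
    consecutive pair of samples.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        # A drop is assumed to be a restart from zero, so the new value
        # itself is the increase.
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, increase / (t1 - t0)))
    return rates


# Hypothetical samples reported every 5 minutes (300 seconds).
points = [(0, 0), (300, 12), (600, 14), (900, 16)]
for timestamp, rate in per_second_rates(points):
    print(timestamp, rate)
```

Because the rate only depends on differences between samples, a restart shows up as a small blip rather than as a large negative jump.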

quintonn commented 1 year ago

Hi @srikanthccv,

Thanks for the response. But could you expand on that?

If my counter is, for example, number of messages my system processes, how do I use it?

I have noticed that my app keeps sending the last known counter value periodically. And that value is being saved in the signoz metrics database.

I would like to be able to show the correct total number of messages my system has processed for all time, as well as the total over the time interval selected in the dashboard. I would also like to plot the number on a graph that shows when messages were processed, e.g. at 12:00:00 it processed 12 messages, at 12:02:00 it processed another 4 messages, etc.

I can't seem to get these done.

srikanthccv commented 1 year ago

> If my counter is, for example, number of messages my system processes, how do I use it?
>
> I have noticed that my app keeps sending the last known counter value periodically. And that value is being saved in the signoz metrics database.

Your app is sending a monotonically increasing counter value. It sends the number of messages processed since the process started. For instance, assume that the process started at 12:00 and reports the value every 5 minutes; it could look something like this. v(cumulative) is the value your app currently sends, i.e. the messages processed so far since the process began. v(delta) is the increase since the previous report.

service_name   t       v(cumulative)   v(delta)
my-service     12.00   0               0
my-service     12.05   12              12
my-service     12.10   14              2
my-service     12.15   16              2
my-service     12.20   22              6
my-service     12.25   25              3
my-service     12.30   32              7
my-service     12.35   37              5
my-service     12.40   45              8
my-service     12.45   49              4
my-service     12.50   59              10
my-service     12.55   66              7
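The v(delta) column can be derived from v(cumulative) by subtracting each value from the next one. The short Python sketch below reproduces the column from the table; the function name `to_deltas` is hypothetical, and the sketch deliberately ignores restarts because the table contains none.

```python
# v(cumulative) values copied from the table above.
cumulative = [0, 12, 14, 16, 22, 25, 32, 37, 45, 49, 59, 66]


def to_deltas(values):
    """Convert cumulative counter values into per-interval deltas."""
    # The first delta is the first value itself (counter assumed to start at zero).
    deltas = [values[0]] if values else []
    # Every later delta is the difference between neighbouring samples.
    deltas += [b - a for a, b in zip(values, values[1:])]
    return deltas


print(to_deltas(cumulative))
# [0, 12, 2, 2, 6, 3, 7, 5, 8, 4, 10, 7]
print(sum(to_deltas(cumulative)) == cumulative[-1])  # True
```

Summing the deltas over any time window gives the number of messages processed in that window, which is why the absolute cumulative value matters less than the differences.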

Please use SUM_RATE from the operators' list

quintonn commented 1 year ago

Hi @srikanthccv,
I still can't seem to get it working.
I have made a test app. Code here: https://github.com/quintonn/otlp-counter-test

This is what I'm trying:

  1. Call addCounter with value of 3
  2. Wait a couple of minutes, so the value is sent a few times.
  3. Call addCounter with value of 5
  4. Wait a few minutes
  5. In SigNoz, the only way to get the correct count of 8 is using the NOOP metric with the REDUCE TO setting set to "Latest of values in timeframe". The SUM_RATE just shows 0.
  6. Stop my app and re-run it.
  7. Call addCounter with value of 2
  8. Wait a few minutes
  9. The NOOP counter now shows 2, instead of 10, which is what I am trying to achieve.
  10. The SUM_RATE still shows 0.

I have added screenshots and the clickhouse query in the Notes folder in the repository linked above.

I run my app in Eclipse. I can then type a number in the console and hit Enter, and it calls addCounter with that value.
This keeps the app running, so the counter values are constantly sent to SigNoz.
To stop the app, I type "exit" and hit Enter.

How did you get the table above?
Did you just make it in markdown or is it from clickhouse?
Because that is exactly what I'm trying to achieve.

In which tables of the ClickHouse (or other) database are the counter metrics stored?
I am looking at signoz_metrics.distributed_samples_v2, and the only columns in this table are:

  1. metric_name
  2. fingerprint
  3. timestamp_ms
  4. value

And based on these values it is impossible for any system to know when a counter value has reset.
So either there is another table that's got more info or it's just not possible to track counters the way I want?
Where are the delta values stored?
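For context, one common heuristic (used, for example, by Prometheus's rate() and increase() functions) is to treat any decrease in a monotonic cumulative counter as a reset, which needs only the value column. The Python sketch below shows that heuristic and one of its limits. Whether SigNoz applies the same rule to this table is not confirmed in this thread; the function name `find_resets` and the sample values are hypothetical.

```python
def find_resets(values):
    """Return the indexes at which a cumulative counter appears to have restarted."""
    # A monotonic counter should never decrease, so a lower value than the
    # previous sample is taken as evidence of a restart.
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


# Restart between 8 and 2: the drop is visible, so the reset is detected.
print(find_resets([3, 8, 8, 2, 2]))  # [3]

# If the restarted counter climbs past the old value before the next sample
# is recorded, no drop is visible and the reset goes unnoticed.
print(find_resets([3, 8, 8, 9]))     # []
```

The second case is why value-only detection is a heuristic: increments made between the restart and the next recorded sample can be partly miscounted.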

sne11ius commented 3 weeks ago

Any news on this one?