Closed nappa85 closed 1 year ago
Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.
@srikanthccv Would you have any insights on this?
@nappa85 Please provide a fully reproducible example. I tried a basic OTLP example with an HTTP counter and could see the metric name http_hits.
You can find an MVP at https://github.com/nappa85/otlp-test/ (start the service, then call it on port 3000 to populate metrics).
@nappa85 the 2nd image is not from SigNoz's UI? How are you visualizing the data?
Second image is from another product ingesting the very same OTLP data (I've an otlp-collector exporting to different products to make a comparison)
Ah, I see you are using delta temporality in the code. We currently support only cumulative temporality, to remain compatible with Prometheus, its query language, and its ecosystem. So if possible, I would suggest you use cumulative (which is also the default in OTEL). Otherwise, there needs to be an intermediate step that converts delta points to cumulative, which is stateful and therefore difficult to scale horizontally.
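To illustrate why a delta-to-cumulative step has to keep state, here is a minimal Rust sketch. It is not SigNoz's or the collector's implementation; the `DeltaToCumulative` type and the string series key are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Running totals per series. The key is a hypothetical series identifier,
/// for example the metric name plus its sorted label set.
#[derive(Default)]
pub struct DeltaToCumulative {
    totals: HashMap<String, f64>,
}

impl DeltaToCumulative {
    /// Adds one delta point to the running total of its series and
    /// returns the cumulative value that would be stored instead.
    pub fn push(&mut self, series: &str, delta: f64) -> f64 {
        let total = self.totals.entry(series.to_string()).or_insert(0.0);
        *total += delta;
        *total
    }
}
```

Every point of a given series must reach the instance that holds that series' running total, which is the horizontal scaling difficulty described above.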
I ended up using delta because of this issue: https://github.com/open-telemetry/opentelemetry-rust/issues/677
I'll try other configs and let you know.
I went with the default values (aggregator selector exact and export kind stateless), and I see the same behavior.
Some additional information: with export_kind cumulative it works, but it accumulates values over time. In this dashboard we can see http.hits going up in time, but the request rate was constant. At half chart I switched back to export_kind stateless and the value disappeared.
> In this dashboard we can see http.hits going up in time, but the request rate was constant

Is it constant? or is it not big enough to tell the difference in the same chart because y axis is skewed by noop line?

> At half chart I switched back to export_kind stateless and the value disappeared

What is export kind stateless?
> > In this dashboard we can see http.hits going up in time, but the request rate was constant
>
> Is it constant? or is it not big enough to tell the difference in the same chart because y axis is skewed by noop line?

I'm sending a request every 2 seconds, so it's constant.

> > At half chart I switched back to export_kind stateless and the value disappeared
>
> What is export kind stateless?

It's an opentelemetry crate concept; I thought it was also standard outside Rust.
Just to be fully transparent: I've made an MVP with opentelemetry 0.18.0, the latest version. I wasn't using it because it's a big breaking change from 0.17.0 and it lacks documentation about metrics. You can find the code here: https://github.com/nappa85/otlp-test/tree/v0.18.0

It fires every possible metric type, (u64/f64)((observable)counter|histogram|observablegauge|(observable)up_down_counter), and for every metric type it aggregates in 3 different ways: sum, last_value and histogram. In total, 30 different metrics are sent: all the integer metrics send a constant value of 1, and all the float metrics send the same random value between 0 and 1 on every call.

The export kind is stateless, the default, so I'm no longer using delta, which you don't support (it was a limitation of 0.17.0).
I've configured SigNoz and another product with a graph for every metric, grouping the 3 different aggregations.
Data is generated by calling my program with curl using watch, which makes a call every 2 seconds:
watch curl 127.0.0.1:3000/a
In SigNoz I find histogram metrics split into 3 entries: _bucket, _count and _sum, so there are 5 lines per graph. All metrics in SigNoz appear to be monotonically increasing, even though I specified NOOP for all values, except for the observable_up_down_counter, where I really can't say what's going on.
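As background on why the _bucket, _count and _sum series only ever grow, here is a minimal Rust sketch of a Prometheus-style cumulative histogram. It is an illustration under assumptions, not the opentelemetry crate's or SigNoz's code; `CumulativeHistogram` and its methods are hypothetical names.

```rust
/// Prometheus-style cumulative histogram: every stored value only grows.
pub struct CumulativeHistogram {
    /// Upper bounds ("le") of the finite buckets, sorted ascending.
    bounds: Vec<f64>,
    /// buckets[i] counts observations <= bounds[i]; the last entry is +Inf.
    buckets: Vec<u64>,
    count: u64,
    sum: f64,
}

impl CumulativeHistogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let n = bounds.len() + 1;
        Self { bounds, buckets: vec![0; n], count: 0, sum: 0.0 }
    }

    /// Records one observation in every bucket whose bound covers it.
    pub fn observe(&mut self, value: f64) {
        for (i, bound) in self.bounds.iter().enumerate() {
            if value <= *bound {
                self.buckets[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        let last = self.buckets.len() - 1;
        self.buckets[last] += 1;
        self.count += 1;
        self.sum += value;
    }

    /// Returns what would be exported as (_bucket, _count, _sum).
    pub fn snapshot(&self) -> (Vec<u64>, u64, f64) {
        (self.buckets.clone(), self.count, self.sum)
    }
}
```

Plotted raw, each of these series rises with every observation, so a meaningful graph has to be derived from the increase between scrapes rather than from the stored values.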
With the exact same data, the other product gives a correct output
I don't know if I'm doing something wrong with SigNoz, let me know
Just to be clear: someone pointed out that it sounds like I'm saying SigNoz doesn't work. No, I'm just pointing out that some things are unclear or unfriendly. I wrote this report at the end of my work day and I'm quite exhausted, so maybe I didn't choose the best words... As a developer, I think this kind of feedback is really helpful.
@nappa85 I am not sure I really understand what you are trying to convey. We appreciate any and all feedback. Our underlying data model closely follows OpenMetrics (~Prometheus) exposition format. So just using NOOP gives you the raw data, which is not very helpful. You need to change the aggregate operator depending on the metric type (usually RATE or SUM_RATE for counters, combined with histogram_quantile for histogram types etc.). Please go through these docs https://signoz.io/docs/userguide/create-a-custom-query/ and try to use the appropriate operator to plot the graphs. You also mentioned some rust SDK-specific things I am not fully aware of. If you are unsure about something and need help, please join our Slack channel and ask questions https://signoz.io/docs/community/#slack.
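To make the difference between the raw (NOOP) view and a rate operator concrete, here is a minimal Rust sketch of a per-second rate over cumulative counter samples. It is not SigNoz's RATE implementation; the `Sample` alias and `rate` function are hypothetical, and the sketch assumes strictly increasing timestamps and treats any drop in value as a counter reset.

```rust
/// One sample of a cumulative counter: (timestamp in seconds, value).
pub type Sample = (f64, f64);

/// Per-second increase between consecutive cumulative samples.
/// Assumption: timestamps are strictly increasing, and a drop in value
/// means the counter was reset, so the new value is the whole increase.
pub fn rate(samples: &[Sample]) -> Vec<f64> {
    samples
        .windows(2)
        .map(|w| {
            let (t0, v0) = w[0];
            let (t1, v1) = w[1];
            let increase = if v1 >= v0 { v1 - v0 } else { v1 };
            increase / (t1 - t0)
        })
        .collect()
}
```

For a counter incremented once every 2 seconds, the raw values keep climbing while the rate stays flat at 0.5 per second.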
An example of what I'm trying to express:
From my application I produce this histogram. It's variable data (response timing), so I need to see something like the average time, not only the last value for the time frame. I'm producing a random float value between 0 and 1, so, with a uniform distribution, the average will be near 0.5, and I'm sending the exact same data to both products.
With the other product, I simply put the histogram data in the graph; it forces me to choose an aggregation and uses p50 by default, and the result is already acceptable.
With SigNoz I find 3 different possible metrics: histogram_bucket, histogram_count and histogram_sum. I'm discarding count and sum because they aren't what I want. NOOP is obviously wrong, but plotting it already raises a warning sign: why are all values growing as if it were a count?
Even with P50 I get a growing graph.
With RATE I see all the individual values separately.
Strangely, they are all under 0.5, but if I look at histogram_last, I see many values over 0.5.
Maybe I'm using the wrong aggregator?
SUM_RATE goes too high.
RATE_SUM goes even higher.
RATE_AVG is suspiciously always under 0.5.
I can accept that I'm using SigNoz the wrong way: I've been testing it for only a few days and I'm not an expert in metrics either, but it doesn't seem to be user-friendly.
I think I understand the challenge you are facing, especially since you don't have prior experience with Prometheus or PromQL. You were expecting the aggregation to be chosen directly from the metric type, instead of having to figure out what to do with the underlying raw data.
Can you elaborate? Given that the data is the same, how can I reach the same result I get with the other product using SigNoz?
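As a rough illustration of how an average or a p50 can be recovered from cumulative histogram series, here is a minimal Rust sketch in the spirit of Prometheus' histogram_quantile. It is not the query SigNoz runs; `window_mean` and `quantile` are hypothetical helpers, the inputs are assumed to be increases over one time window, and the lowest bucket is assumed to start at 0.

```rust
/// Mean of the observations recorded between two scrapes, computed
/// from the increase of `_sum` divided by the increase of `_count`.
pub fn window_mean(sum0: f64, count0: f64, sum1: f64, count1: f64) -> Option<f64> {
    let dc = count1 - count0;
    if dc > 0.0 { Some((sum1 - sum0) / dc) } else { None }
}

/// Estimates quantile `q` (0..=1) from per-window bucket counts.
/// `buckets` holds (upper bound, cumulative count), sorted by bound and
/// ending with the +Inf bucket. Values are linearly interpolated inside
/// the matching bucket; the lowest bucket is assumed to start at 0.
pub fn quantile(q: f64, buckets: &[(f64, f64)]) -> Option<f64> {
    let total = buckets.last()?.1;
    if total <= 0.0 {
        return None;
    }
    let rank = q * total;
    let mut lower_bound = 0.0;
    let mut lower_count = 0.0;
    for &(upper, count) in buckets {
        if count >= rank {
            if upper.is_infinite() {
                // No finite upper edge to interpolate towards.
                return Some(lower_bound);
            }
            let in_bucket = count - lower_count;
            if in_bucket <= 0.0 {
                return Some(upper);
            }
            return Some(lower_bound + (upper - lower_bound) * (rank - lower_count) / in_bucket);
        }
        lower_bound = upper;
        lower_count = count;
    }
    None
}
```

With uniformly distributed values between 0 and 1, both the window mean and the estimated p50 should come out near 0.5, provided they are computed from per-window increases and not from the raw cumulative values.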
I'm testing SigNoz 0.11.2 using docker-compose clickhouse environment (minus hotrod containers), along with other products, firing metrics to a single otel-collector instance with different exporters.
In my Rust code I've got those metrics:
On SigNoz I can find the mysql.duration (called mysql_duration) and service.duration (called service_duration) metrics, but I can't find the http.hits metric, while on other products I can find all of them.
Is there some kind of filter on metric names? Am I doing something wrong?
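The dot-to-underscore renaming seen above (mysql.duration shown as mysql_duration) matches the common Prometheus naming convention. A minimal Rust sketch of that kind of normalization follows; `prometheus_name` is a hypothetical helper, and the exact rules SigNoz applies are an assumption here.

```rust
/// Replaces every character outside [a-zA-Z0-9_:] with an underscore,
/// following the usual Prometheus metric-name character set.
/// Assumption: names starting with a digit are not handled specially.
pub fn prometheus_name(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == ':' { c } else { '_' })
        .collect()
}
```

Under this convention the counter would be expected to appear as http_hits, not http.hits.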