Closed — cpuga closed this issue 4 years ago
Thanks for pointing this out! I'm surprised no one else (including me) noticed. I'll work on fixing this.
Hello everyone, I have the same problem in my Grafana. Any idea how I can fix this issue? I'm getting a doubled e-mail count.
@RedJohn14 @seanthegeek Change the "min interval" to 1d to resolve this. I'll be publishing a new dashboard soon, but I'm trying to include some additional info and some panels to make parts of it clearer. I have also changed the min interval where required to fix the issue.
You can get the fixed Grafana dashboard here: https://github.com/bhozar/grafana-dashboards/tree/master/parsedmarc
It requires Grafana 7.1, and has quite a few other changes.
I experience the same issue in Kibana when creating custom visualizations or using the Discover view. Would it be possible to split the begin and end dates out into their own fields, while also leaving the date range field for those who need it? Then we could use the begin date as the timestamp for the index pattern and avoid double counting regardless of interval.
Maybe an option that could be made available in the config file?
I found that even with a daily interval, double counting is still possible, since some reports are sent with a range greater than one day. This can be worked around in visualizations by using a sum of the message count field instead of a count of all records, which shows a more accurate count of emails passing/failing anyway.
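As a rough illustration (plain Python, not parsedmarc's actual query code; the record shape and field names are assumptions), summing the per-record message count gives the real e-mail volume, while counting records only tells you how many aggregate rows exist:

```python
# Hypothetical aggregate-report records: each row can represent many
# messages, so the per-record "count" field carries the real volume.
records = [
    {"source_ip": "203.0.113.5", "count": 120},  # 120 messages from one sender
    {"source_ip": "198.51.100.9", "count": 3},   # 3 messages from another
]

doc_count = len(records)                          # number of records: 2
message_count = sum(r["count"] for r in records)  # actual messages: 123

print(doc_count, message_count)
```

A "Count" metric in Kibana behaves like `doc_count` here; a "Sum of message count" metric behaves like `message_count`.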
But the problem can be avoided entirely by using the start or end time of the range as the index pattern timestamp instead of a range with two values. I made the changes below to add the fields, then re-created my index pattern and used date_end as the timestamp field. Old reports would have to be re-indexed to make use of it.
/usr/local/lib/python3.6/site-packages/parsedmarc/elastic.py
```python
class _AggregateReportDoc(Document):
    class Index:
        name = "dmarc_aggregate"

    xml_schema = Text()
    org_name = Text()
    org_email = Text()
    org_extra_contact_info = Text()
    report_id = Text()
    date_range = Date()
    date_begin = Date()  # add date begin
    date_end = Date()  # add date end
```

```python
    for record in aggregate_report["records"]:
        agg_doc = _AggregateReportDoc(
            xml_schema=aggregate_report["xml_schema"],
            org_name=metadata["org_name"],
            org_email=metadata["org_email"],
            org_extra_contact_info=metadata["org_extra_contact_info"],
            report_id=metadata["report_id"],
            date_range=date_range,
            date_begin=aggregate_report["begin_date"],  # add date begin
            date_end=aggregate_report["end_date"],  # add date end
            # ... remaining keyword arguments unchanged
```
Using date_range as the timestamp field, instead of separate begin_date and end_date fields, leads Elasticsearch to include each document twice in the buckets of date aggregation queries: once in the bucket corresponding to the begin date, and again in the bucket for the end date.
This makes all date histogram visualizations wrong, as they show about twice as many messages as there actually are.
You can see this in the sample grafana screenshot provided by @bhozar (3557 emails sent on "Email Count" vs 1308+471+1 on "Message Volume by Header From"). https://grafana.com/api/dashboards/11227/images/7191/image
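A minimal simulation of the effect (plain Python standing in for Elasticsearch's date histogram behavior; the dates and message counts are made up): a document whose range touches two days lands in two daily buckets, so totals roughly double, while bucketing on the begin date alone gives the correct total:

```python
from collections import defaultdict
from datetime import date, timedelta

# Two hypothetical aggregate reports, each covering one 24h window that
# touches two calendar days (begin on May 1, end on May 2).
reports = [
    {"begin": date(2020, 5, 1), "end": date(2020, 5, 2), "messages": 100},
    {"begin": date(2020, 5, 1), "end": date(2020, 5, 2), "messages": 50},
]

range_buckets = defaultdict(int)  # bucketing on the range: every day it touches
begin_buckets = defaultdict(int)  # bucketing on the begin date only
for r in reports:
    day = r["begin"]
    while day <= r["end"]:
        range_buckets[day] += r["messages"]
        day += timedelta(days=1)
    begin_buckets[r["begin"]] += r["messages"]

print(sum(range_buckets.values()))  # 300 -- doubled
print(sum(begin_buckets.values()))  # 150 -- correct total
```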
Keeping date_range if you want, but adding a begin_date field, would solve the problem.