distributed-system-analysis / pbench

A benchmarking and performance analysis framework
http://distributed-system-analysis.github.io/pbench/
GNU General Public License v3.0

Account for "dateless" datasets #3623

dbutenhof closed 5 months ago

dbutenhof commented 5 months ago

It's been a bit annoying that --statistics=creation reports a total count lower than the actual number of datasets. When we report by creation date, we rely on a JOIN between Dataset and Metadata, and the creation date comes from the metadata.log pbench.date field. That field is missing from some datasets (a bit over 4,000 on the production server), so those datasets were left out of the total.

This PR changes --statistics=creation to count every row returned by the SQL query, and to report separately how many of those rows are empty because the metadata was missing: e.g.,

[pbench@n002 /]$ pbench-report-generator --statistics=creation
Dataset statistics by creation date:
 154,737 from 2012-04-13 19:21 to 2024-06-13 11:26
  (count includes 4,019 datasets without a date)
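
For reviewers who want the counting rule spelled out, here is a minimal sketch of the idea, not the actual pbench code. It assumes the query yields one creation date per dataset, with None where pbench.date is missing; the function name summarize_by_creation and the input shape are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def summarize_by_creation(rows: Iterable[Optional[datetime]]) -> str:
    """Count every row, tracking separately the rows that have no date.

    Hypothetical helper: `rows` stands in for the result of the
    Dataset/Metadata query, one creation date (or None) per dataset.
    """
    total = 0
    dateless = 0
    first: Optional[datetime] = None
    last: Optional[datetime] = None

    for created in rows:
        total += 1  # every returned row counts toward the total
        if created is None:
            dateless += 1  # missing date: counted, but excluded from the range
            continue
        if first is None or created < first:
            first = created
        if last is None or created > last:
            last = created

    lines = ["Dataset statistics by creation date:"]
    if first is None or last is None:
        # No row carried a date, so there is no range to report.
        lines.append(f" {total:,} datasets, none with a date")
        return "\n".join(lines)

    lines.append(f" {total:,} from {first:%Y-%m-%d %H:%M} to {last:%Y-%m-%d %H:%M}")
    if dateless:
        lines.append(f"  (count includes {dateless:,} datasets without a date)")
    return "\n".join(lines)
```

The date range is computed only from rows that carry a date, while the total covers all rows, which is why the second line of the report is needed to reconcile the two.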