It's been a bit annoying that --statistics=creation reports a total count that's less than the actual number of datasets. This is because when we report by creation date, we rely on a JOIN between Dataset and Metadata, where the creation date comes from the metadata.log pbench.date field, which is missing from some datasets (a bit over 4,000 on the production server).
This PR changes --statistics=creation to count all rows returned by the SQL query, but to separately report the number of empty rows where the metadata was missing: e.g.,
[pbench@n002 /]$ pbench-report-generator --statistics=creation
Dataset statistics by creation date:
154,737 from 2012-04-13 19:21 to 2024-06-13 11:26
(count includes 4,019 datasets without a date)
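
For illustration only, here is a minimal sketch of the counting approach described above: every row returned by the query is counted, and rows with no date are tallied separately and left out of the date range. It is not the actual pbench-report-generator code. The `datasets` and `metadata` tables, their columns, the `creation_stats` helper, and the use of an in-memory SQLite database with a LEFT JOIN are all assumptions made for the example.

```python
import sqlite3


def creation_stats(rows):
    """Summarize (name, date) rows, where date is None if metadata is missing.

    Returns (total, missing, first, last). Hypothetical helper for this sketch.
    """
    total = 0
    missing = 0
    first = last = None
    for _name, date in rows:
        total += 1  # count every row, dated or not
        if date is None:
            missing += 1  # tally separately; excluded from the date range
            continue
        if first is None or date < first:
            first = date
        if last is None or date > last:
            last = date
    return total, missing, first, last


# Hypothetical schema: dataset 'b' has no metadata row, so its date is NULL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metadata (dataset_id INTEGER, date TEXT);
    INSERT INTO datasets VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO metadata VALUES (1, '2012-04-13 19:21'), (3, '2024-06-13 11:26');
    """
)
rows = conn.execute(
    """
    SELECT d.name, m.date
    FROM datasets d
    LEFT JOIN metadata m ON m.dataset_id = d.id
    """
).fetchall()

total, missing, first, last = creation_stats(rows)
print(f"{total:,} from {first} to {last}")
if missing:
    print(f"(count includes {missing:,} datasets without a date)")
```

The dates are compared as strings here, which works only because the sample values use a sortable year-first format; real code would more likely compare datetime values.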