exclude report outage periods

lboeman commented 2 years ago

- [x] closes #730
- [x] I am familiar with the contributing guidelines.
- [x] Tests added.
- [x] Updates entries to docs/source/api.rst for API changes.
- [x] Adds descriptions to appropriate "what's new" file in docs/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
- [x] New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
- [x] Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

Adds handling for excluding outage periods from a report. Counterpart to API PR https://github.com/SolarArbiter/solarforecastarbiter-api/pull/325.

Converts a collection of (start, end) pairs representing system outages into a collection of (start, end) pairs covering the forecast values associated with any forecast submission that falls within an outage. The resulting pairs are then used to mask forecast and observation values out of the report before quality flags or other validation are applied, so that values dropped because of an outage are not mistakenly reported as missing.
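
For illustration only, the conversion and masking described above could be sketched roughly as follows. This is not the code in this PR: the helper names `outage_forecast_windows` and `drop_windows` are hypothetical, and the sketch assumes that a submission "falls within" an outage when its issue time lies inside the outage (endpoints inclusive) and that each forecast covers `[issue + lead_time, issue + lead_time + run_length)`.

```python
import numpy as np
import pandas as pd


def outage_forecast_windows(outages, issue_times, lead_time, run_length):
    """Hypothetical helper: map outage periods to the data periods of
    forecasts whose issue time falls inside an outage."""
    windows = []
    for issue in issue_times:
        # Assumption: the submission is tied to the issue time, and the
        # outage endpoints are inclusive.
        if any(start <= issue <= end for start, end in outages):
            data_start = issue + lead_time
            windows.append((data_start, data_start + run_length))
    return windows


def drop_windows(series, windows):
    """Hypothetical helper: drop values whose timestamps fall in any
    [start, end) window, before any quality flags are applied."""
    keep = np.ones(len(series), dtype=bool)
    for start, end in windows:
        keep &= ~((series.index >= start) & (series.index < end))
    return series[keep]
```

The same windows would be applied to both the forecast and the observation series so that the two stay aligned.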

lboeman commented 2 years ago

While working on using the existing utility functions here, I encountered a few issues. reference_forecasts.get_issue_times always starts with midnight, then the issue time, and then timestamps spaced run_length apart until midnight the next day (inclusive). This results in losing any issue times before the first issue time of the day while getting duplicate midnight values.

I don't think that issue will be too hard to fix for run lengths that evenly divide 24 hours. But for odd-interval forecasts, I don't think we have a defined behavior for the spill-over/overlap. For instance, a 5 hour run length forecast will always overlap its own first issue time. @wholmgren, do you think that times that fall before issue_time_of_day should be a continuation of the previous day's forecasts?
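
To make the overlap concrete, here is a minimal sketch. It is not `reference_forecasts.get_issue_times`; the helpers `daily_issue_times` and `spills_into_next_day` are hypothetical, timestamps are timezone-naive, and the first issue time of the day is expressed as an offset from midnight.

```python
import pandas as pd


def daily_issue_times(day, first_issue_offset, run_length):
    """Hypothetical helper: issue times for one day, starting at the first
    issue time and stepping by run_length until the next midnight."""
    day = pd.Timestamp(day).normalize()
    next_day = day + pd.Timedelta(days=1)
    times = []
    t = day + first_issue_offset
    while t < next_day:
        times.append(t)
        t += run_length
    return times


def spills_into_next_day(day, first_issue_offset, run_length):
    """True if the last run of the day extends past the next day's
    first issue time."""
    times = daily_issue_times(day, first_issue_offset, run_length)
    next_first = (pd.Timestamp(day).normalize()
                  + pd.Timedelta(days=1) + first_issue_offset)
    return times[-1] + run_length > next_first
```

With a first issue time of midnight, a 5 hour run length gives issue times at 00:00, 05:00, 10:00, 15:00 and 20:00, and the last run ends at 01:00 the next day, an hour past that day's first issue time. A 6 hour run length ends exactly at the next midnight, so there is no overlap.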

wholmgren commented 2 years ago

Oof. I don't recall discussing this ~3 years ago when formulating the datamodel, but I'm now thinking that we should enforce that forecasts are issued at the same times every day. Our current reference forecasts would fail to do anything sensible with e.g. 5 hour run lengths, right?

lboeman commented 2 years ago

The remaining test failures are due to the dev API tests and data mismatch; everything else is passing and ready for review.

SolarArbiter / solarforecastarbiter-core

exclude report outage periods #752