Default report timing should take into account knowledge horizon function

FlexMeasures / flexmeasures

The intelligent & developer-friendly EMS to support real-time energy flexibility apps, rapidly and scalable.

https://flexmeasures.io

Apache License 2.0


Default report timing should take into account knowledge horizon function #733

Open Flix6x opened 1 year ago

Flix6x commented 1 year ago

When calling a report via CLI (flexmeasures add report), the default end time is now. This implicitly assumes an ex-post knowledge horizon, which does not yield a sensible end time for e.g. day-ahead markets. The default end time should pass server_now() through the sensor's knowledge time function. Possibly the inverse function is needed, which may involve a loop (bounded by the knowledge horizon bounds) or an implementation of an analytical solution of the reverse function for each knowledge time function.
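A minimal sketch of the bounded-loop idea, under stated assumptions: `knowledge_time_day_ahead` is a simplified, UTC-only stand-in for a day-ahead knowledge time function (it is not the timely-beliefs implementation), and `default_report_end` and `max_lookahead` are hypothetical names. The loop steps through future event starts and keeps the end of the last event whose knowledge time has already passed.

```python
from datetime import datetime, timedelta, timezone


def knowledge_time_day_ahead(event_start: datetime, x: int = 1, y: int = 12) -> datetime:
    """Simplified stand-in: an event becomes known x days before its date, at y o'clock."""
    midnight = event_start.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=x) + timedelta(hours=y)


def default_report_end(
    now: datetime,
    knowledge_time,
    max_lookahead: timedelta,
    resolution: timedelta = timedelta(hours=1),
) -> datetime:
    """Return the end of the last event that could already be known at `now`.

    The search is bounded by `max_lookahead`, which plays the role of the
    upper knowledge horizon bound.
    """
    event_start = now.replace(minute=0, second=0, microsecond=0)
    end = event_start  # fallback if no event is known yet
    while event_start < now + max_lookahead:
        if knowledge_time(event_start) <= now:
            end = event_start + resolution
        event_start += resolution
    return end


# After the noon gate closure, all of tomorrow is known.
after_noon = datetime(2023, 6, 21, 13, 0, tzinfo=timezone.utc)
end_after_noon = default_report_end(after_noon, knowledge_time_day_ahead, timedelta(days=3))

# Before noon, only today is known.
before_noon = datetime(2023, 6, 21, 11, 0, tzinfo=timezone.utc)
end_before_noon = default_report_end(before_noon, knowledge_time_day_ahead, timedelta(days=3))
```

The loop does not stop at the first unknown event, so it does not rely on the knowledge time being monotonic in the event start; an analytical inverse per knowledge time function would avoid the loop altogether.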

Flix6x commented 1 year ago

Specifically, I'm calling flexmeasures add report, without parameters that would set a time window, for computing a (day-ahead retail) dynamic tariff based on the day-ahead wholesale prices. The tariffs are then calculated from the last known tariff until now, instead of until the end of tomorrow.

I guess a workaround could be to set the --end-offset option to 2D,DB and to make sure that the command is only called on D-1. Setting it to more than needed would currently not be a good workaround, because the command would save NaN values to the database, and that would trip up the next daily run. (Saving NaN values to the database is a separate problem, see https://github.com/FlexMeasures/flexmeasures/pull/735.)

victorgarcia98 commented 1 year ago

As an alternative, we could apply the function get_timerange to each of the sensors involved and take end_date = min(end_date_sensor_1, end_date_sensor_2, ...), or even just take the maximum (latest) end time obtained by applying get_timerange to the whole list of sensors.

This will clearly make the computation and data fetching take longer.
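As a rough illustration of the first variant, assuming the per-sensor time ranges have already been fetched into a plain mapping (the helper name, the mapping and the sensor names below are hypothetical, and no FlexMeasures call is shown):

```python
from datetime import datetime, timezone


def default_end_from_timeranges(timeranges: dict) -> datetime:
    """Latest time up to which every input sensor has data.

    `timeranges` maps a sensor name to a (start, end) tuple, e.g. as one
    might collect by querying the time range of each input sensor.
    """
    if not timeranges:
        raise ValueError("at least one input sensor is required")
    return min(end for _start, end in timeranges.values())


utc = timezone.utc
example_timeranges = {
    "wholesale price": (datetime(2023, 1, 1, tzinfo=utc), datetime(2023, 6, 23, tzinfo=utc)),
    "tax rate": (datetime(2023, 1, 1, tzinfo=utc), datetime(2023, 6, 22, tzinfo=utc)),
}
end_date = default_end_from_timeranges(example_timeranges)
```

Taking the minimum means the report only covers the period for which all inputs have some data; taking the maximum instead would extend the window to the latest data of any sensor.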

Flix6x commented 1 year ago

It's an alternative worthy of consideration. The data fetching would indeed involve a second query, but only in case no end (offset) was explicitly passed. It also means we'd default to computing a report on the latest available complete set of input data, as opposed to on the latest set of input data that could have been known at that time (but without actually checking whether the input data is complete).

Two considerations that come up (in any case):

Would the report still run successfully if just one of the input sensors is missing data?
Are we warned in case that happens?

victorgarcia98 commented 1 year ago

I guess a workaround could be to set the --end-offset option to 2D,DB and making sure that the command is only called on D-1.

Actually, we would also need to make sure that the command is run after 12pm, so that the data is available.

Would the report still run successfully if just one of the input sensors is missing data?

It might. A 'silly' case is when we list the sensor in the beliefs_search_configs but don't use it.

Are we warned in case that happens?

Regarding the use of knowledge functions, I don't think we should have to implement both the direct and the inverse functions just to have a default.

In some cases,

Using the bounds is an option. For instance, in the x_days_ago_at_y_oclock there are the following bounds:

 timedelta(days=x, hours=-y - 2), timedelta(days=x + 1, hours=-y + 2)

In case of the function at_date, the bound is timedelta.max.

With that, an idea is to set the default end_date = server_now + min(bound[1], timedelta(days=2)), where we cap the extra time window at 2 days, for example. This could be a FM parameter.
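A small sketch of this default, assuming the upper bound is available as a timedelta (the function name `default_end` and the `cap` parameter are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def default_end(
    server_now: datetime,
    horizon_bounds: tuple,
    cap: timedelta = timedelta(days=2),
) -> datetime:
    """Default report end: now plus the upper horizon bound, capped at `cap`."""
    upper_bound = horizon_bounds[1]
    # The cap matters for unbounded cases: adding timedelta.max to a
    # datetime would overflow.
    return server_now + min(upper_bound, cap)


now = datetime(2023, 6, 21, 13, 0, tzinfo=timezone.utc)

# Bounds of x_days_ago_at_y_oclock as quoted above, with x=1 and y=12.
x, y = 1, 12
day_ahead_bounds = (timedelta(days=x, hours=-y - 2), timedelta(days=x + 1, hours=-y + 2))
end_day_ahead = default_end(now, day_ahead_bounds)

# An unbounded case (upper bound timedelta.max) falls back to the cap.
end_unbounded = default_end(now, (timedelta(0), timedelta.max))
```

With x=1 and y=12 the upper bound is 38 hours, so the cap of 2 days is not reached; with timedelta.max the cap decides.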

Flix6x commented 1 year ago

The workaround of --end-offset 2D,DB is also not perfect. For example, running it just now shows:

end: 2023-06-23 00:00:00+00:00

Which doesn't respect the timezone of the output sensor. Given the "Europe/Amsterdam" timezone, I was expecting:

end: 2023-06-23 00:00:00+02:00

Running this in a timezone with a positive offset leads to the time window of the report "overshooting", and then generating and saving some NaN values. This can be alleviated by also setting the start offset, to make sure the NaN values are "overwritten" with new values the next day the reporter runs. For example, by setting --start-offset -2D,DB. Actually, -1D,DB or even DB might also work, but it doesn't hurt to try and fill some older data gaps while we are at it.
Running this in a timezone with a negative offset would lead to the time window of the report "falling short", so the report would not be computed until local midnight. Workaround: accept some NaN values will be saved, and make sure the next day they will be overwritten, using: --end-offset 3D,DB --start-offset -2D,DB.
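To illustrate the overshoot, here is a sketch that mimics my reading of 2D,DB (add two days, then floor to the beginning of the day) using only the standard library. The helper name is hypothetical, and a fixed +02:00 offset stands in for Europe/Amsterdam in summer; it is not how FlexMeasures parses offsets.

```python
from datetime import datetime, timedelta, timezone


def add_two_days_then_day_begin(now: datetime, tz: timezone) -> datetime:
    """Add two days in the given timezone, then floor to local midnight."""
    local = now.astimezone(tz) + timedelta(days=2)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2023, 6, 21, 13, 0, tzinfo=timezone.utc)
cest = timezone(timedelta(hours=2))  # stand-in for Europe/Amsterdam in summer

end_utc = add_two_days_then_day_begin(now, timezone.utc)  # 2023-06-23 00:00:00+00:00
end_local = add_two_days_then_day_begin(now, cest)        # 2023-06-23 00:00:00+02:00

# The UTC-based end lies two hours past local midnight.
overshoot = end_utc - end_local
```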

Flix6x commented 1 year ago

Oh, actually one could also just use the --timezone Europe/Amsterdam option to solve that problem. Nevertheless, extending the time window on both ends is probably useful, even after 0.14.1 lands (which prevents the reporter from saving NaN values), for filling gaps.

Flix6x commented 1 year ago

With that, an idea is to set the default end_date = server_now + min(bound[1], timedelta(days=2)), where we cap the extra time window at 2 days, for example. This could be a FM parameter.

I agree, this is a good solution. Instead of introducing a new setting, I wouldn't mind also using the FLEXMEASURES_MAX_PLANNING_HORIZON setting for this purpose, so that it is effectively used in a broader sense: to set a limit on how far into the future FlexMeasures does calculations.