NREL / rdtools

PV Analysis Tools in Python
https://rdtools.readthedocs.io/
MIT License

`energy_from_power` returns incorrect index for shifted hourly data #370

Open kandersolar opened 1 year ago

kandersolar commented 1 year ago

TrendAnalysis fails on hourly data whose timestamps are not at the top of the hour. Here is a toy example where it raises an error when given an input time series timestamped at the half-hour mark (minute 30) instead of the usual minute 00:

import pandas as pd
import numpy as np
import rdtools

# toy hourly dataset, note that minute=30
times = pd.date_range('2019-01-01 00:30:00', periods=8760*3, freq='H')
df = pd.DataFrame({'pv': 1 - 0.005*np.arange(len(times))/8760,
                   'poa_global': 1000,
                   'temperature_cell': 25},
                  index=times)

ta = rdtools.TrendAnalysis(**df)
ta.filter_params.pop('clip_filter')  # clipping filter doesn't like this toy dataset
ta.sensor_analysis()  # ValueError: Less than two years of data left after filtering

Inspection of the object's attributes reveals that pv_energy's index is misaligned with the original data's index (the minutes have been truncated to 00):

In [181]: ta.pv_energy[:3]
Out[181]: 
2019-01-01 01:00:00    0.500000
2019-01-01 02:00:00    0.999999
2019-01-01 03:00:00    0.999999
Freq: H, Name: energy_Wh, dtype: float64

Index misalignment between energy and the irradiance data messes up the filtering, resulting in the "less than two years" error. Of course, the index difference traces back to energy_from_power:

In [189]: rdtools.normalization.energy_from_power(df['pv'])[:3]
Out[189]: 
2019-01-01 01:00:00    0.500000
2019-01-01 02:00:00    0.999999
2019-01-01 03:00:00    0.999999
Freq: H, Name: energy_Wh, dtype: float64
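To illustrate why a shifted index is enough to break downstream steps, here is a minimal pandas-only sketch. The two series are hypothetical stand-ins for the energy and irradiance data, and the division is just a generic label-aligned operation, not necessarily the exact step rdtools performs internally:

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Hypothetical stand-ins: energy stamped on the hour, irradiance on the half hour
energy = pd.Series(1.0, index=pd.date_range("2019-01-01 01:00", periods=6, freq=hour))
poa = pd.Series(1000.0, index=pd.date_range("2019-01-01 00:30", periods=6, freq=hour))

# pandas aligns on index labels before dividing; no labels match here,
# so the result spans the union of both indexes and every value is NaN
ratio = energy / poa
print(len(ratio), int(ratio.notna().sum()))
```

With no overlapping timestamps, nothing usable survives the alignment, which is consistent with the filters ending up with too little data.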

Within energy_from_power, I think the truncation comes from the call to _aggregate and the resample it performs:

https://github.com/NREL/rdtools/blob/d19fa83dd2b17a32129383976eb47d94be5c9ae8/rdtools/normalization.py#L615-L618
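For reference, the snapping behavior can be reproduced with pandas alone. This is a sketch under assumptions: the `closed='right', label='right'` arguments are my guess at a right-labeled hourly resample and may not match the linked lines exactly, and the `offset` variant only shows one way pandas can keep the half-hour alignment, not necessarily how the library should or does fix it:

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Toy hourly power series stamped at minute 30
times = pd.date_range("2019-01-01 00:30", periods=6, freq=hour)
power = pd.Series(1.0, index=times)

# Default resample: bin edges are anchored to midnight, so the
# right-edge labels land on the top of the hour (01:00, 02:00, ...)
snapped = power.resample(hour, closed="right", label="right").sum()

# Shifting the bin edges by 30 minutes keeps the original timestamps
kept = power.resample(hour, closed="right", label="right",
                      offset=pd.Timedelta(minutes=30)).sum()

print(snapped.index[:3])
print(kept.index[:3])
```

The first index starts at 01:00 like the output above, while the second starts at 00:30 and matches the input index.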

martin-springer commented 1 week ago

Addressed in PR #437.