NREL / rdtools

PV Analysis Tools in Python
https://rdtools.readthedocs.io/
MIT License

Mismatch between input and output data size of normalize_with_expected_power() #341

Open Matammanjunath opened 2 years ago

Matammanjunath commented 2 years ago

Describe the bug: I am performing a simple degradation analysis using rdtools. I get a data size error when assigning the output of normalize_with_expected_power() back to the input DataFrame.

Full error message and traceback:

normalized, insolation = rdtools.normalize_with_expected_power(df[pwr_col],
                                                                modeled_power,
                                                                df[poa_col])
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-67-e6496a97d32b> in <module>()
----> 1 df['normalized'] = normalized.values
      2 df['insolation'] = insolation.values

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/core/common.py in require_length_match(data, index)
    530     if len(data) != len(index):
    531         raise ValueError(
--> 532             "Length of values "
    533             f"({len(data)}) "
    534             "does not match length of index "

ValueError: Length of values (1578239) does not match length of index (1578240)
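The traceback comes from a positional assignment (`.values`) of a series that is one element shorter than the DataFrame. The following is a minimal sketch on toy data, not the NIST dataset, and the timezone is omitted for brevity. It reproduces the same kind of `ValueError` and shows that assigning the Series itself lets pandas align on the index instead.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: 5 one-minute samples instead of 1578240.
index = pd.date_range("2015-01-01 00:00", periods=5, freq="min")
df = pd.DataFrame({"power": np.arange(5, dtype=float)}, index=index)

# Hypothetical result that lacks the first timestamp, like the returned series.
normalized = pd.Series(np.ones(4), index=index[1:])

# Positional assignment fails because the lengths differ (4 vs 5).
try:
    df["normalized"] = normalized.values
    raised = False
except ValueError:
    raised = True

# Index-aligned assignment succeeds; the missing timestamp becomes NaN.
df["normalized"] = normalized
```

This only works around the assignment error; it does not explain why the first timestamp is missing from the output.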

To Reproduce: In this case I used the NIST dataset of a PV plant. Below is a copy of the function that I modified for debugging. The added print statements show how the data size changes at each step.


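  # Helpers used below, imported from rdtools.normalization
  # (module path assumed; some of these are private functions).
  from rdtools.normalization import (_check_series_frequency, _delta_index,
                                     energy_from_power, interpolate)

  def normalize_with_expected_power(pv, power_expected, poa_global,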
                                    pv_input='power'):
      '''
      Normalize PV power or energy based on expected PV power.

      Parameters
      ----------
      pv : pandas.Series
          Right-labeled time series PV energy or power. If energy, should *not*
          be cumulative, but only for preceding time step. Type (energy or power)
          must be specified in the ``pv_input`` parameter.
      power_expected : pandas.Series
          Right-labeled time series of expected PV power. (Note: Expected energy
          is not supported.)
      poa_global : pandas.Series
          Right-labeled time series of plane-of-array irradiance associated with
          ``expected_power``
      pv_input : str, {'power' or 'energy'}
          Specifies the type of input used for ``pv`` parameter. Default: 'power'

      Returns
      -------
      energy_normalized : pandas.Series
          Energy normalized based on ``power_expected``
      insolation : pandas.Series
          Insolation associated with each normalized point

      '''
      print("input pv shape is %s"%(pv.shape))
      print("input power_expected shape is %s"%(power_expected.shape))
      print("input POA shape is %s"%(poa_global.shape))
      freq = _check_series_frequency(pv, 'pv')
      print(pv.shape)
      print(power_expected.shape)
      if pv_input == 'power':
          energy = energy_from_power(pv, freq, power_type='right_labeled')
          print("Energy shape is %s"%(energy.shape))
      elif pv_input == 'energy':
          energy = pv.copy()
          energy.name = 'energy_Wh'
      else:
          raise ValueError("Unexpected value for pv_input. pv_input should be 'power' or 'energy'.")

      model_tds, mean_model_td = _delta_index(power_expected)
      print("Model TDS shape is %s"%(model_tds.shape))
      measure_tds, mean_measure_td = _delta_index(energy)
      print("Measure TDS shape is %s"%(measure_tds.shape))

      # Case in which the model less frequent than the measurements
      if mean_model_td > mean_measure_td:
          power_expected = interpolate(power_expected, pv.index)
          print("Power expected shape is %s"%(power_expected.shape))
          poa_global = interpolate(poa_global, pv.index)
          print("POA shape is %s"%(poa_global.shape))

      energy_expected = energy_from_power(power_expected, freq, power_type='right_labeled')
      print("Energy expected shape is %s"%(energy_expected.shape))
      insolation = energy_from_power(poa_global, freq, power_type='right_labeled')
      print("Insolation shape is %s"%(insolation.shape))

      energy_normalized = energy / energy_expected
      print("Energy normalized shape is %s"%(energy_normalized.shape))

      index_union = energy_normalized.index.union(insolation.index)
      print("index_union shape is %s"%(index_union.shape))
      energy_normalized = energy_normalized.reindex(index_union)
      print("energy_normalized shape is %s"%(energy_normalized.shape))
      insolation = insolation.reindex(index_union)
      print("insolation shape is %s"%(insolation.shape))

      return energy_normalized, insolation

I reran the above function with the required data:

normalized, insolation = normalize_with_expected_power(df[pwr_col], modeled_power, df[poa_col])

The sizes of the input and output data of the above function are:

**input data size 1578240**
df.index 
DatetimeIndex(['2015-01-01 00:00:00-05:00', '2015-01-01 00:01:00-05:00',
               ....
               '2017-12-31 23:58:00-05:00', '2017-12-31 23:59:00-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', length=**1578240**, freq='T')
modeled_power.index
DatetimeIndex(['2015-01-01 00:00:00-05:00', '2015-01-01 00:01:00-05:00',
               ....
               '2017-12-31 23:58:00-05:00', '2017-12-31 23:59:00-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', length=**1578240**, freq='T')

**output data size 1578239**
normalized.index
DatetimeIndex(['2015-01-01 00:01:00-05:00', '2015-01-01 00:02:00-05:00',
               ....
               '2017-12-31 23:58:00-05:00', '2017-12-31 23:59:00-05:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-300)]', length=**1578239**, freq='T')
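Comparing the printed indexes by eye suggests that only the first timestamp is missing from the output. A quick way to confirm this is `Index.difference`. The sketch below uses two small hypothetical indexes in place of `df.index` and `normalized.index`.

```python
import pandas as pd

# Hypothetical stand-ins for df.index and normalized.index.
input_index = pd.date_range("2015-01-01 00:00", periods=5, freq="min")
output_index = input_index[1:]

# Timestamps present in the input but absent from the output.
missing = input_index.difference(output_index)
print(missing)
```

On the real data the equivalent call would be `df.index.difference(normalized.index)`.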

Change of data size inside the normalize_with_expected_power() function: as the debug output below shows, the energy series is one element shorter than the input data. The loss comes from the energy_from_power() function. Digging further into that function, I found that the root cause is the _aggregate() function.

input pv shape is 1578240
input power_expected shape is 1578240
input POA shape is 1578240
(1578240,)
(1578240,)
Energy shape is 1578239
Model TDS shape is 1578240
Measure TDS shape is 1578239
Energy expected shape is 1578239
Insolation shape is 1578239
Energy normalized shape is 1578239
index_union shape is 1578239
energy_normalized shape is 1578239
insolation shape is 1578239
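One plausible reason an interval-based energy calculation loses the first sample is that a right-labeled value describes the interval ending at its timestamp, and the first timestamp has no preceding interval. The sketch below illustrates this with plain pandas and a trapezoidal rule. It is an assumption made for illustration, not the rdtools implementation of `energy_from_power()` or `_aggregate()`.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2015-01-01 00:00", periods=5, freq="min")
power = pd.Series([0.0, 60.0, 120.0, 60.0, 0.0], index=index)  # W

# Hours elapsed since the previous timestamp; NaN for the first sample.
hours = index.to_series().diff().dt.total_seconds() / 3600.0

# Trapezoidal energy (Wh) over the interval that ends at each timestamp.
# The first timestamp has no preceding interval, so it drops out.
energy = ((power + power.shift(1)) / 2.0 * hours).dropna()

# Reindexing restores the original length, with NaN at the first timestamp.
energy_full = energy.reindex(power.index)
```

Under this reading, an output that is one element shorter than the input would be a consequence of the labeling convention, and reindexing onto the input index is one way to keep the lengths equal.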
