IHCantabria / NEOPRENE

Neyman-Scott Process Rainfall Emulator library
GNU General Public License v3.0

Bug in Example 2. Disaggregation of daily data to hourly data #22

Closed monicasantamaria closed 4 months ago

monicasantamaria commented 1 year ago

Hello!

I wanted to use NEOPRENE to disaggregate daily to hourly rainfall. I'm running the NSRP_test notebook with your sample data, using Version 1.0.1. But in Example 2, when calling the disaggregate_rainfall function, I get the following error:

Cell In[16], line 2
      1 # Daily-to-hourly disaggregation
----> 2 Analysis_results.disaggregate_rainfall(x_series, y_series)
      3 hourly_disaggregation = Analysis_results.hourly_disaggregation

File ~/AppData/Roaming/Python/Python39/site-packages/NEOPRENE/NSRP/Analysis.py:244, in Analysis.disaggregate_rainfall(self, x_series, y_series)
    243 def disaggregate_rainfall(self,x_series, y_series):
--> 244     self.hourly_disaggregation = disaggregation_rainfall(x_series,y_series)

File ~/AppData/Roaming/Python/Python39/site-packages/NEOPRENE/NSRP/Analysis.py:137, in disaggregation_rainfall(x_series, y_series)
    122 """
    123 Dissagregation function from: A spatial–temporal point process model of rainfall
    124 for the Thames catchment, UK (Cowpertwait 2005). Eq:15
   (...)
    131 results: x_series disaggregated from daily-to-hourly
    132 """
    134 #y_series_daily = y_series.resample('D').agg(pd.Series.sum, min_count=1)
    135 #results=x_series.resample('h').agg(pd.Series.sum, min_count=1)*np.nan
--> 137 y_series_daily = pd.DataFrame(y_series.groupby([(y_series.index.year),(y_series.index.month),(y_series.index.day)]).sum().values,index=pd.period_range(start=y_series.index[0],end=y_series.index[-1],freq='D'),columns=['Rain'])
    139 #y_series_daily = y_series.resample('D').agg(pd.Series.sum, min_count=1)
    140 dti = pd.date_range(start=x_series.index[0], end=x_series.index[-1] + timedelta(hours=23), freq="H")

File c:\Program Files\Python39\lib\site-packages\pandas\core\frame.py:672, in DataFrame.__init__(self, data, index, columns, dtype, copy)
...
    389 passed = values.shape
    390 implied = (len(index), len(columns))
--> 391 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (29221, 0), indices imply (29221, 1)

It looks like the sum() call after the grouping returns a result with no columns (shape (29221, 0)), which causes the mismatch when the DataFrame is built with the single column 'Rain'?
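The shape mismatch can be reproduced in isolation. The following is a minimal sketch, not the library's code: it assumes a hypothetical three-day period index and a hypothetical array with zero columns standing in for the grouped sums. Passing that array to pd.DataFrame together with a single column label should raise the same kind of ValueError as in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the grouped sums: 3 rows but zero columns.
values = np.empty((3, 0))

# Hypothetical daily index, built with pd.period_range as in the failing line.
index = pd.period_range(start="2000-01-01", periods=3, freq="D")

try:
    # One column label is requested, but the data has no columns.
    pd.DataFrame(values, index=index, columns=["Rain"])
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Why the grouped sum ends up with no columns in the notebook is not established here; the sketch only shows how the constructor reacts once it does.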

monicasantamaria commented 1 year ago

After reviewing the disaggregation_rainfall function, I noticed that this line only resamples the synthetic series from hourly to daily, which can be done more directly with:

y_series_daily = y_series.resample('D').agg(pd.Series.sum, min_count=1)

This avoids the shape mismatch when building the DataFrame.
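As a rough illustration of that resampling, here is a minimal sketch on made-up data. The three-day hourly series and its values are assumptions for the example, not the notebook's sample data, and it uses the built-in Resampler.sum with min_count=1, which should behave the same way as the agg call above.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic hourly rainfall: 3 days, one column named 'Rain'.
index = pd.date_range("2000-01-01", periods=72, freq="h")
y_series = pd.DataFrame({"Rain": np.ones(72)}, index=index)

# Make the second day entirely missing to show the effect of min_count=1.
y_series.loc["2000-01-02", "Rain"] = np.nan

# Hourly-to-daily totals; a day with no valid values stays NaN instead of 0.
y_series_daily = y_series.resample("D").sum(min_count=1)

print(y_series_daily)
```

The result keeps the 'Rain' column and has one row per day, so no separate DataFrame construction with an explicit index and column list is needed.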

JavierDiezSierra commented 1 year ago

@monicasantamaria many thanks for reporting the bug. I have just reopened the issue to check it in the source code.

manueldeljesus commented 10 months ago

Thank you @monicasantamaria for the report, and sorry for the delay in taking care of the issue.

I have pushed a new commit 94670a2 into branch issue_22 that makes the Jupyter notebook work again. It would be great if you could install this branch and check if it also solves the issue for you.

manueldeljesus commented 4 months ago

We have pushed a new version of the library in which everything should be working again. I will close this issue, but please do not hesitate to reopen it if something does not work.