SantanderMetGroup / downscaleR

An R package for climate data bias correction and downscaling (part of the climate4R bundle)
https://github.com/SantanderMetGroup/climate4R
GNU General Public License v3.0

Data has wrong time dimension when using delta biasCorrection #31

Closed by matteodefelice 6 years ago

matteodefelice commented 8 years ago

Here, I have an observational dataset (obs) and a forecast (fcst).

> str(obs)
List of 4
 $ Variable:List of 2
  ..$ varName: chr "var167"
  ..$ level  : NULL
  ..- attr(*, "is_standard")= logi FALSE
  ..- attr(*, "units")= chr "undefined"
  ..- attr(*, "longname")= chr "undefined"
  ..- attr(*, "daily_agg_cellfun")= chr "none"
  ..- attr(*, "monthly_agg_cellfun")= chr "none"
  ..- attr(*, "verification_time")= chr "none"
 $ Data    : num [1:72, 1:35, 1:50] 292 294 295 294 295 ...
  ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
 $ xyCoords:List of 2
  ..$ x: num [1:50] -12 -11.25 -10.5 -9.75 -9 ...
  ..$ y: num [1:35] 32.2 33 33.7 34.5 35.2 ...
  ..- attr(*, "projection")= chr "LatLonProjection"
  ..- attr(*, "resX")= num 0.75
  ..- attr(*, "resY")= num 0.75
  ..- attr(*, "interpolation")= chr "nearest"
 $ Dates   :List of 2
  ..$ start: chr [1:72] "1984-06-30 18:00:00 GMT" "1984-07-31 18:00:00 GMT" "1984-08-31 18:00:00 GMT" "1985-06-30 18:00:00 GMT" ...
  ..$ end  : chr [1:72] "1984-06-30 18:00:00 GMT" "1984-07-31 18:00:00 GMT" "1984-08-31 18:00:00 GMT" "1985-06-30 18:00:00 GMT" ...
 - attr(*, "dataset")= chr "/opt/data/ERAIN-T2M/ERAIN-t2m-1983-2012.mon.EUROPE.nc"
> str(fcst)
List of 6
 $ Variable           :List of 2
  ..$ varName: chr "tas"
  ..$ level  : NULL
  ..- attr(*, "use_dictionary")= logi TRUE
  ..- attr(*, "description")= chr "2 metre temperature @ Ground or water surface"
  ..- attr(*, "units")= chr "degrees Celsius"
  ..- attr(*, "longname")= chr "2-meter air temperature"
  ..- attr(*, "daily_agg_cellfun")= chr "none"
  ..- attr(*, "monthly_agg_cellfun")=function (x, ...)  
  ..- attr(*, "verification_time")= chr "none"
 $ Data               : num [1:72, 1:35, 1:50] 21.8 23.3 23.2 19.1 21.1 ...
  ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
 $ xyCoords           :List of 2
  ..$ x: num [1:50] -12 -11.25 -10.5 -9.75 -9 ...
  ..$ y: num [1:35] 32.2 33 33.7 34.5 35.2 ...
  ..- attr(*, "projection")= chr "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
 $ Dates              :List of 2
  ..$ start: chr [1:72(1d)] "1984-06-01 00:00:00 GMT" "1984-07-01 00:00:00 GMT" "1984-08-01 00:00:00 GMT" "1985-06-01 00:00:00 GMT" ...
  ..$ end  : chr [1:72(1d)] "1984-06-30 18:00:00 GMT" "1984-07-31 18:00:00 GMT" "1984-08-31 18:00:00 GMT" "1985-06-30 18:00:00 GMT" ...
 $ InitializationDates: chr [1:26] "1984-05-01 00:00:00 GMT" "1985-05-01 00:00:00 GMT" "1986-05-01 00:00:00 GMT" "1987-05-01 00:00:00 GMT" ...
 $ Members            : chr "Member_1"
 - attr(*, "dataset")= chr "System4_seasonal_15"
 - attr(*, "source")= chr "ECOMS User Data Gateway"
 - attr(*, "URL")= chr "<http://meteo.unican.es/trac/wiki/udg/ecoms>"

When I apply a delta bias correction with the command:

cal <- biasCorrection(y = obs,
                      x = fcst,
                      newdata = fcst,
                      cross.val = 'loocv',
                      method = "delta")

I get a grid with the wrong time dimension (1656 time steps in Data instead of 72, and the "dimensions" attribute is missing):

> str(cal)
List of 6
 $ Variable           :List of 2
  ..$ varName: chr "var167"
  ..$ level  : NULL
  ..- attr(*, "is_standard")= logi FALSE
  ..- attr(*, "units")= chr "undefined"
  ..- attr(*, "longname")= chr "undefined"
  ..- attr(*, "daily_agg_cellfun")= chr "none"
  ..- attr(*, "monthly_agg_cellfun")= chr "none"
  ..- attr(*, "verification_time")= chr "none"
 $ Data               : num [1:1656, 1:35, 1:50] 295 296 297 294 296 ...
 $ xyCoords           :List of 2
  ..$ x: num [1:50] -12 -11.25 -10.5 -9.75 -9 ...
  ..$ y: num [1:35] 32.2 33 33.7 34.5 35.2 ...
  ..- attr(*, "projection")= chr "LatLonProjection"
  ..- attr(*, "resX")= num 0.75
  ..- attr(*, "resY")= num 0.75
  ..- attr(*, "interpolation")= chr "nearest"
 $ Dates              :List of 2
  ..$ start: chr [1:72(1d)] "1984-06-01 00:00:00 GMT" "1984-07-01 00:00:00 GMT" "1984-08-01 00:00:00 GMT" "1985-06-01 00:00:00 GMT" ...
  ..$ end  : chr [1:72(1d)] "1984-06-30 18:00:00 GMT" "1984-07-31 18:00:00 GMT" "1984-08-31 18:00:00 GMT" "1985-06-30 18:00:00 GMT" ...
 $ InitializationDates: chr [1:26] "1984-05-01 00:00:00 GMT" "1985-05-01 00:00:00 GMT" "1986-05-01 00:00:00 GMT" "1987-05-01 00:00:00 GMT" ...
 $ Members            : chr "Member_1"
 - attr(*, "dataset")= chr "/opt/data/ERAIN-T2M/ERAIN-t2m-1983-2012.mon.EUROPE.nc"

This does not happen with the other bias correction methods. It looks like a bug related to cross-validation: without it, the delta method works fine.

miturbide commented 8 years ago

Thanks for reporting! As you said, there was a bug when applying the "delta" method with cross-validation. It stems from a particularity of the "delta" correction, which in the general case (cross.val = "none") is applied as: y + (mean(newdata) - mean(x))

so the resulting object has the same time dimension as the observation data (y).
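
As a rough illustration of the formula above, here is a minimal base-R sketch for a single grid cell. The helper name delta_correct and the toy numbers are made up for this example; this is not the downscaleR implementation, which works on whole grids.

```r
# Hypothetical helper (not part of downscaleR): delta correction for one grid cell.
# The mean change between newdata and x is added to every observed value.
delta_correct <- function(y, x, newdata) {
  y + (mean(newdata) - mean(x))
}

y       <- c(10, 12, 11, 13)  # toy observations
x       <- c(8, 9, 10, 9)     # toy model data for the training period (mean 9)
newdata <- c(11, 12, 13)      # toy model data to correct (mean 12)

out <- delta_correct(y, x, newdata)
out          # 13 15 14 16, i.e. y shifted by 12 - 9 = 3
length(out)  # 4: same length as y, not as newdata
```

The output length follows y because the correction is a constant shift added to the observations.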

When cross.val = "loocv"/"kfold", this formula was applied once per test partition (one per year or fold): y[train years/folds] + (mean(x[test year/fold]) - mean(x[train years/folds])),

so the time dimension of the output was (number of years/folds) * (number of time steps in each training set).
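
The numbers in the report are consistent with this. Below is a minimal base-R sketch for one grid cell, assuming the 72 time steps are 24 years of 3 monthly values each. The year labels and the random data are placeholders, and the loop only mimics the behaviour described above; it is not the package code.

```r
# Assumed layout: 24 years x 3 monthly steps = 72 time steps (year labels are hypothetical).
years <- rep(1984:2007, each = 3)
set.seed(1)
y <- rnorm(72, mean = 294)  # placeholder observations
x <- rnorm(72, mean = 22)   # placeholder forecasts

# Behaviour described above: each leave-one-year-out fold returns the
# corrected TRAINING observations (69 values), and the folds are bound together.
buggy <- unlist(lapply(unique(years), function(yr) {
  test <- years == yr
  y[!test] + (mean(x[test]) - mean(x[!test]))
}))

length(buggy)  # 1656 = 24 folds * 69 training time steps
```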

We have made some changes in this respect. Now, when the "delta" method is applied with cross-validation, "y" is subset to the time series of the test data (the daily series of one year for "loocv", or of one fold for "kfold"). When cross.val != "none", the "delta" method is therefore applied as:

y[test year/fold] + (mean(x[test year/fold]) - mean(x[train years/folds])),

so binding the outputs for all years/folds gives the correct time dimension.
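
A minimal base-R sketch of the corrected scheme for one grid cell, again assuming 24 years of 3 monthly values. The year labels and the random data are placeholders, and the loop illustrates the formula above rather than the actual downscaleR code.

```r
# Assumed layout: 24 years x 3 monthly steps = 72 time steps (year labels are hypothetical).
years <- rep(1984:2007, each = 3)
set.seed(1)
y <- rnorm(72, mean = 294)  # placeholder observations
x <- rnorm(72, mean = 22)   # placeholder forecasts

# Corrected scheme: each fold returns only the TEST-year observations,
# shifted by (mean of test-year forecasts - mean of training forecasts).
fixed <- unlist(lapply(unique(years), function(yr) {
  test <- years == yr
  y[test] + (mean(x[test]) - mean(x[!test]))
}))

length(fixed)  # 72, the same time dimension as y and x
```

Because the years are processed in order, the bound output lines up with the original time axis.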

matteodefelice commented 8 years ago

When are you going to release a new master version? Or can I switch to the devel branch?

jbedia commented 8 years ago

Hi Matteo, we are about to move to downscaleR 2.0-0. The data transformation/manipulation tools have been moved to transformeR, which is now a dependency of downscaleR. We still have to update all the documentation, but the package is ready for testing, so you can switch to the devel version now. Your feedback will be welcome!

devtools::install_github(c("SantanderMetGroup/transformeR",
                           "SantanderMetGroup/downscaleR@devel"))