USGS-R / protoloads

Prototyping and exploring options for broad-scale load forecasting

check units in new NWM data #68

Closed aappling-usgs closed 6 years ago

aappling-usgs commented 6 years ago

I just finished re-running all the models (about 30 hours); see #67.

Now I'm rebuilding the old figures. I think there has been a 100x increase in the NWM forecast magnitudes from the last round of model outputs to this one. It could be happening anywhere in the pipeline, from NWM pull to aggregation to modeling, because we had to modify each of those steps a bit this week.
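One quick way to bound where the 100x enters is to compare magnitudes between the old and new aggregated inputs before any modeling. This is a hedged sketch; the file paths and the `flow` column name are hypothetical stand-ins, not the repo's actual targets:

```r
library(dplyr)

# Hypothetical: an archived copy of the previous aggregated NWM inputs
# alongside the freshly rebuilt ones
old_inputs <- readRDS('2_munge/out/nwm_agg_old.rds')
new_inputs <- readRDS('2_munge/out/nwm_agg.rds')

# If this ratio is ~100, the jump happens at or before aggregation,
# which would exonerate the modeling step
median(new_inputs$flow, na.rm = TRUE) / median(old_inputs$flow, na.rm = TRUE)
```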

Forecasts plot from today (for WRTDS): fig_preds_v_time_wrtds

Forecasts plot from last time (also for WRTDS): fig_preds_v_time

Inputs plot from today: fig_input_data

Inputs plot from last time: fig_input_data

Errors plot from today: fig_error_v_leadtime

Errors plot from last time: fig_error_v_leadtime

dblodgett-usgs commented 6 years ago

Noooo!!! Scale Factor. Looking into it.

aappling-usgs commented 6 years ago

Agreed, I think scale factor is a good candidate for explaining this. Thanks for looking into it, @dblodgett-usgs .

aappling-usgs commented 6 years ago

Dave found it :+1:

https://github.com/USGS-R/protoloads/blob/master/2_munge/src/aggregate_inputs.R#L77 That line causes the data to come in without the scale factor applied, which corrected the old data but breaks the new data.
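For context on this class of bug (a sketch, not the repo's actual code): ncdf4's `ncvar_get()` applies `scale_factor`/`add_offset` automatically, but requesting the packed values skips that, so the numbers come out 100x too large when `scaleFact` is 0.01 and no manual rescale is applied:

```r
library(ncdf4)

nc <- nc_open('1_data/out/nwm_ana.nc')

# ncvar_get applies the variable's scale_factor and add_offset by default
scaled <- ncvar_get(nc, 'streamflow')

# raw_datavals=TRUE returns the packed integers instead; they must then be
# rescaled by hand (here scaleFact = 0.01) or they are 100x too large
raw <- ncvar_get(nc, 'streamflow', raw_datavals = TRUE)
rescaled <- raw * nc$var$streamflow$scaleFact

nc_close(nc)
```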

aappling-usgs commented 6 years ago

The "solid hack" for this week:

1. confirm that the scale factor in the .nc files is always 100
2. change the aggregate_inputs code and the aggregated NWM output, BUT
3. do not update the models; instead apply a temporary divide-by-100 in gather_forecasts so that the outputs collected in 3_forecast/out/preds_loadest.rds and 3_forecast/out/preds_wrtds.rds appear to be correct
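Step 3 could look something like this inside gather_forecasts() (a hypothetical sketch; the real function's data and column names may differ):

```r
library(dplyr)

# Hypothetical stand-in for predictions from models that were fit on
# inputs 100x too large
preds <- tibble(date = as.Date('2018-05-01') + 0:2,
                pred = c(12000, 9800, 15000))

# Temporary divide-by-100 so the saved outputs (preds_loadest.rds,
# preds_wrtds.rds) look correct; remove once the models are re-fit on
# properly scaled inputs
TEMP_SCALE_CORRECTION <- 1/100
preds <- mutate(preds, pred = pred * TEMP_SCALE_CORRECTION)
```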

aappling-usgs commented 6 years ago
1. confirm that the scale factor in the .nc files is always 100 (got the division direction wrong, but yes, it's 0.01):

```r
library(ncdf4) # for nc_open/nc_close
library(dplyr) # for %>% and data_frame

raw_nwm_files <- c(
  '1_data/out/nwm_ana.nc',
  '1_data/out/nwm_retro.nc',
  '1_data/out/nwm_med.nc',
  '1_data/out/nwm_long1.nc',
  '1_data/out/nwm_long2.nc',
  '1_data/out/nwm_long3.nc',
  '1_data/out/nwm_long4.nc')

vapply(setNames(nm=raw_nwm_files), function(raw_nwm_file) {
  input_raw <- nc_open(raw_nwm_file)
  scaleFact <- input_raw$var$streamflow$scaleFact
  nc_close(input_raw)
  return(scaleFact)
}, FUN.VALUE=numeric(1)) %>% {data_frame(file=names(.), scaleFact=.)}

# A tibble: 7 x 2
#   file                    scaleFact
#   <chr>                       <dbl>
# 1 1_data/out/nwm_ana.nc     0.01000
# 2 1_data/out/nwm_retro.nc   0.01000
# 3 1_data/out/nwm_med.nc     0.01000
# 4 1_data/out/nwm_long1.nc   0.01000
# 5 1_data/out/nwm_long2.nc   0.01000
# 6 1_data/out/nwm_long3.nc   0.01000
# 7 1_data/out/nwm_long4.nc   0.01000
```

aappling-usgs commented 6 years ago

OK, I think all we need to do from here (after my upcoming PR) is avoid running `scmake('3_forecast', '3_forecast.yml')` until we're ready to wait 30 hours.

Aggregated inputs are now correct and correctly derived from the nc files (using scaleFact as intended). Aggregated model outputs are correct but incorrectly derived (using the hack of dividing by 100 in gather_forecasts()).