USGS-R / protoloads

Prototyping and exploring options for broad-scale load forecasting
0 stars 4 forks source link

Pulls from subsetted NWM nc files produce unexpected hydrographs #42

Closed jzwart closed 6 years ago

jzwart commented 6 years ago

Using the new nc files created with PR #33 , the retro file has hydrographs that are unexpected given the sites (e.g. Mississippi R. is 0 across the time series). There might be a lookup table issue - I checked the coordinates in the lookup table and those align with the location on google maps, but I haven't checked out the comids or site_no.

image

code used to generate fig from aggregate_inputs.R:


  comids_lookup = readr::read_delim(sc_retrieve(comids_file, remake_file = remake_file), delim='\t')

  input_raw = nc_open(sc_retrieve(raw_ind_file, remake_file = remake_file))

  site_inds <- match(comids_lookup$COMID, input_raw$dim$feature_id$vals) # indices into original nc file

  new_feature_id <- input_raw$dim$feature_id$vals[site_inds] # comid list

  dimids <- input_raw$var$streamflow$dimids

  if(length(dimids) == 2) {
    streamflow <- matrix(nrow=input_raw$dim$time$len, ncol=length(site_inds))
  } else if (length(dimids) == 3) {
    if(!all(input_raw$var$streamflow$dimids == c(0,2,1))) stop("dimids now as expected")
    streamflow <- array(dim = c(length(site_inds), input_raw$dim$time$len, input_raw$dim$reference_time$len))
  }

  time <- convert_time_nc2posix(input_raw$var$streamflow$dim[[2]]) # time value converted to posix

  for(s in 1:length(site_inds)) {
    if(length(dimids) == 2) {
      streamflow[,s] <- ncvar_get(input_raw, input_raw$var$streamflow,
                start = c(site_inds[s], 1),
                count = c(1, -1),
                raw_datavals = TRUE)

    } else if(length(dimids) == 3) {
      for(r in 1:input_raw$dim$reference_time$len) {
        streamflow[s,,r] <- ncvar_get(input_raw, input_raw$var$streamflow,
                                      start = c(site_inds[s], 1, r),
                                      count = c(1, -1, 1))
      }
    }
  }
  # stream flow is [site, valid time, reference time]

  #need streamflow, site_no, dateTime, for retro
  # need streamflow, site_no, refTime, validTime, for forecast
  info = read.table('1_data/out/site_info.tsv', sep='\t',header=T)

  windows()
  par(mfrow = c(2,2))
  for(s in 1:length(site_inds)){
    plot(streamflow[,s], main = info$station_nm[match(input_raw$dim$feature_id$vals[site_inds[s]], info$COMID)],
         type='l')
  }
dblodgett-usgs commented 6 years ago

I can verify that the NHDPlus Catchment is on the Mississippi main stem and theoretically should have flow from the NWM. So, the match of NWIS site to NHDPlus appears to be good.

I'll have to look into the possibility that things got out of order in the subsetting or something. I used "%in%" in the original subset and match() was used in subset_nwm()... Not sure how that would have gone wrong, but I have a suspicion that there are duplicate COMIDs in the NWM files that could be causing a problem...

dblodgett-usgs commented 6 years ago

Yeah... that's the issue. I'm going to redo the subset to include the duplicates. The Mississippi flow will be included, but a munge(?) step will need to pick which one to actually use. I'll get the issue fixed at the source later, but this will give you data for now.

dblodgett-usgs commented 6 years ago

I guess it'll need to get handled in: aggregate_nwm() -- I'll try and get to that, but may not have time right now.

jzwart commented 6 years ago

Ahh ok, thanks for checking and running this @dblodgett-usgs. I'll take look at how to deal with them in aggregate_nwm

jzwart commented 6 years ago

Somewhat fixed in PR #46, but is only dealt with through visual inspection right now. Not flexible to adding more sites, so leaving this issue still open

aappling-usgs commented 6 years ago

I believe this is fully fixed now