DOI-USGS / ds-pipelines-targets-example-wqp

An example targets pipeline for pulling data from the Water Quality Portal (WQP)

Omit site/query attributes from the downloaded data? #95

Closed lekoenig closed 1 year ago

lekoenig commented 2 years ago

This issue follows up on one that @padilla410 introduced in #87. Although the attributes don't appear to have caused the behavior Julie originally documented, I have wondered whether they are likely to trigger unnecessary or unexpected rebuilds, and whether we should do anything with the site or query (url, queryTime) attributes that get appended in the call to dataRetrieval.

Perhaps targets drops attributes when combining the mapped data frames, because I don't see any attributes on the full data frame.

> tar_load(p2_wqp_data_aoi)
> str(p2_wqp_data_aoi)
'data.frame':   20266 obs. of  65 variables:
 $ OrganizationIdentifier                           : chr  "21PA_WQX" "21PA_WQX" "21PA_WQX" "21PA_WQX" ...
 ..
 $ ActivityEndDateTime                              : chr  NA NA NA NA ...

However, site/query attributes are there when I load any given branch:

> tar_load(p2_wqp_data_aoi_6c22aef5)
> str(p2_wqp_data_aoi_6c22aef5)
'data.frame':   6995 obs. of  65 variables:
 $ OrganizationIdentifier                           : chr  "USGS-PA" "21PA_WQX" "21PA_WQX" "21PA_WQX" ...
...
 $ ActivityEndDateTime                              : chr  "2020-10-22 20:05:59" NA NA NA ...
 - attr(*, "siteInfo")='data.frame':    199 obs. of  43 variables:
  ..$ station_nm                                     : chr [1:199] "AT 13094 PINE CREEK PA" "AT 13093 UNNAMED TRIBUTARY TO PINE CREEK PA" "AT 13092 BEAR CREEK PA" "AT 11088.3 UNNAMED TRIBUTARY TO LIZARD CREEK PA" ...
  ..$ agency_cd                                      : chr [1:199] "USGS-NY" "USGS-NY" "USGS-NY" "USGS-NY" ...
  ..$ site_no                                        : chr [1:199] "USGS-403707075572101" "USGS-403818075573301" "USGS-403924075573601" "USGS-404144075504001" ...
  ..$ dec_lat_va                                     : chr [1:199] "40.61863056000000" "40.63841667000000" "40.65680556000000" "40.69556110000000" ...
  ..$ dec_lon_va                                     : chr [1:199] "-75.9558417000000" "-75.9593722000000" "-75.9602277800000" "-75.8445833000000" ...
  ..$ hucCd                                          : chr [1:199] "02040203" "02040203" "02040203" "02040106" ...
  ..$ OrganizationIdentifier                         : chr [1:199] "USGS-NY" "USGS-NY" "USGS-NY" "USGS-NY" ...
  ..$ OrganizationFormalName                         : chr [1:199] "USGS New York Water Science Center" "USGS New York Water Science Center" "USGS New York Water Science Center" "USGS New York Water Science Center" ...
  ..$ MonitoringLocationIdentifier                   : chr [1:199] "USGS-403707075572101" "USGS-403818075573301" "USGS-403924075573601" "USGS-404144075504001" ...
  ..$ MonitoringLocationName                         : chr [1:199] "AT 13094 PINE CREEK PA" "AT 13093 UNNAMED TRIBUTARY TO PINE CREEK PA" "AT 13092 BEAR CREEK PA" "AT 11088.3 UNNAMED TRIBUTARY TO LIZARD CREEK PA" ...
  ..$ MonitoringLocationTypeName                     : chr [1:199] "Stream" "Stream" "Stream" "Stream" ...
  ..$ MonitoringLocationDescriptionText              : chr [1:199] NA NA NA NA ...
  ..$ HUCEightDigitCode                              : chr [1:199] "02040203" "02040203" "02040203" "02040106" ...
  ..$ DrainageAreaMeasure.MeasureValue               : num [1:199] NA NA NA NA NA NA NA NA 5.35 46.5 ...
  ..$ DrainageAreaMeasure.MeasureUnitCode            : chr [1:199] NA NA NA NA ...
  ..$ ContributingDrainageAreaMeasure.MeasureValue   : num [1:199] NA NA NA NA NA NA NA NA NA NA ...
  ..$ ContributingDrainageAreaMeasure.MeasureUnitCode: chr [1:199] NA NA NA NA ...
  ..$ LatitudeMeasure                                : chr [1:199] "40.61863056000000" "40.63841667000000" "40.65680556000000" "40.69556110000000" ...
  ..$ LongitudeMeasure                               : chr [1:199] "-75.9558417000000" "-75.9593722000000" "-75.9602277800000" "-75.8445833000000" ...
  ..$ SourceMapScaleNumeric                          : chr [1:199] "24000" "24000" "24000" "24000" ...
  ..$ HorizontalAccuracyMeasure.MeasureValue         : num [1:199] 0.1 0.1 0.01 0.01 0.1 0.01 0.1 0.01 0.5 1 ...
  ..$ HorizontalAccuracyMeasure.MeasureUnitCode      : chr [1:199] "seconds" "seconds" "seconds" "seconds" ...
  ..$ HorizontalCollectionMethodName                 : chr [1:199] "Interpolated from Digital MAP." "Interpolated from Digital MAP." "Interpolated from Digital MAP." "Interpolated from Digital MAP." ...
  ..$ HorizontalCoordinateReferenceSystemDatumName   : chr [1:199] "NAD83" "NAD83" "NAD83" "NAD83" ...
  ..$ VerticalMeasure.MeasureValue                   : num [1:199] 673 560 1226 1317 699 ...
  ..$ VerticalMeasure.MeasureUnitCode                : chr [1:199] "feet" "feet" "feet" "feet" ...
  ..$ VerticalAccuracyMeasure.MeasureValue           : num [1:199] 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1.6 1 NA ...
  ..$ VerticalAccuracyMeasure.MeasureUnitCode        : chr [1:199] "feet" "feet" "feet" "feet" ...
  ..$ VerticalCollectionMethodName                   : chr [1:199] "Interpolated from Digital Elevation Model" "Interpolated from Digital Elevation Model" "Interpolated from Digital Elevation Model" "Interpolated from Digital Elevation Model" ...
  ..$ VerticalCoordinateReferenceSystemDatumName     : chr [1:199] "NAVD88" "NAVD88" "NAVD88" "NAVD88" ...
  ..$ CountryCode                                    : chr [1:199] "US" "US" "US" "US" ...
  ..$ StateCode                                      : chr [1:199] "42" "42" "42" "42" ...
  ..$ CountyCode                                     : num [1:199] 11 11 107 107 107 107 25 89 25 25 ...
  ..$ AquiferName                                    : chr [1:199] NA NA NA NA ...
  ..$ LocalAqfrName                                  : chr [1:199] NA NA NA NA ...
  ..$ FormationTypeText                              : chr [1:199] NA NA NA NA ...
  ..$ AquiferTypeName                                : chr [1:199] NA NA NA NA ...
  ..$ ConstructionDateText                           : chr [1:199] NA NA NA NA ...
  ..$ WellDepthMeasure.MeasureValue                  : num [1:199] NA NA NA NA NA NA NA NA NA NA ...
  ..$ WellDepthMeasure.MeasureUnitCode               : chr [1:199] NA NA NA NA ...
  ..$ WellHoleDepthMeasure.MeasureValue              : num [1:199] NA NA NA NA NA NA NA NA NA NA ...
  ..$ WellHoleDepthMeasure.MeasureUnitCode           : chr [1:199] NA NA NA NA ...
  ..$ ProviderName                                   : chr [1:199] "NWIS" "NWIS" "NWIS" "NWIS" ...
 - attr(*, "variableInfo")='data.frame':    6995 obs. of  3 variables:
  ..$ characteristicName: chr [1:6995] "Specific conductance" "Specific conductance" "Specific conductance" "Specific conductance" ...
  ..$ param_units       : chr [1:6995] "uS/cm @25C" "umho/cm" "umho/cm" "umho/cm" ...
  ..$ valueType         : chr [1:6995] "Total" NA NA NA ...
 - attr(*, "queryTime")= POSIXct[1:1], format: "2022-09-21 16:56:43"
 - attr(*, "url")= chr "https://www.waterqualitydata.us/data/Result/search?sampleMedia=Water%3Bwater&siteType=Stream&minresults=1&start"| __truncated__
>
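Since the str() output gets long, a small helper that lists only the non-standard attributes makes it easier to compare the combined target against a single branch. This is just a sketch: custom_attributes() is a hypothetical name, and the data frame below is a toy stand-in, not real WQP output.

```r
# Hypothetical helper: list attributes beyond the ones every data frame carries.
custom_attributes <- function(x) {
  setdiff(names(attributes(x)), c("names", "row.names", "class"))
}

# Toy stand-in for one downloaded branch (made-up values, not real WQP data).
branch <- data.frame(OrganizationIdentifier = c("USGS-PA", "21PA_WQX"))
attr(branch, "siteInfo") <- data.frame(site_no = "USGS-403707075572101")
attr(branch, "queryTime") <- Sys.time()

# Names of the extra attributes attached to the toy branch.
custom_attributes(branch)

# A plain data frame has no extra attributes.
custom_attributes(data.frame(a = 1))
```

In the pipeline, the same call could be run on p2_wqp_data_aoi and on an individual branch after tar_load() to confirm where the attributes disappear.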

We could omit the site attributes by setting ignore_attributes to TRUE in dataRetrieval::readWQPdata(). I'm guessing some folks will want the site information, but perhaps we could add a target that retrieves that info on its own (so users could join the site info with the data if desired).
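If we wanted to drop the attributes after the download instead (for example, to also remove url and queryTime), something like the following might work. This is a rough sketch under assumptions: drop_wqp_attributes() is a hypothetical helper that is not part of dataRetrieval or this pipeline, and the attribute names are taken from the str() output above.

```r
# Hypothetical helper: remove the metadata attributes seen in the str() output.
# The default attribute names are assumptions based on that output.
drop_wqp_attributes <- function(x,
                                drop = c("siteInfo", "variableInfo",
                                         "queryTime", "url")) {
  for (nm in drop) {
    attr(x, nm) <- NULL  # assigning NULL removes the attribute if present
  }
  x
}

# Toy stand-in for a downloaded branch (made-up values, not real WQP data).
wqp_data <- data.frame(ResultMeasureValue = c(1.2, 3.4))
attr(wqp_data, "siteInfo") <- data.frame(site_no = "site-1")
attr(wqp_data, "url") <- "https://example.org/query"

clean <- drop_wqp_attributes(wqp_data)
names(attributes(clean))
```

The rows and columns are left untouched; only the listed attributes are removed, so the helper could be applied to each branch before it is stored.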

lekoenig commented 1 year ago

> I'm guessing some folks will want the site information, but perhaps we could add a target that retrieves that info on its own (so users could join the site info with the data if desired).

I've been asked about this, so I'm going to go ahead and pull out the siteInfo attribute from each of the individual branches in p2_wqp_data_aoi and bind the site metadata into a single table that will be represented by a new target, p2_wqp_site_info.
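Roughly, the extract-and-bind step could look like the sketch below. It is not the final implementation: combine_site_info() is a hypothetical function name, the input is assumed to be a list of branch data frames, and every siteInfo table is assumed to share the same columns so that rbind() works.

```r
# Hypothetical helper: pull the siteInfo attribute from each branch and
# bind the results into one de-duplicated table.
combine_site_info <- function(branches) {
  site_list <- lapply(branches, attr, which = "siteInfo")
  # Skip branches that have no siteInfo attribute.
  site_list <- Filter(Negate(is.null), site_list)
  if (length(site_list) == 0) {
    return(NULL)
  }
  # Assumes all siteInfo tables share the same columns.
  sites <- do.call(rbind, unname(site_list))
  sites <- unique(sites)  # a site can appear in more than one branch
  rownames(sites) <- NULL
  sites
}

# Toy branches with overlapping sites (made-up values, not real WQP data).
b1 <- data.frame(value = 1:2)
attr(b1, "siteInfo") <- data.frame(site_no = c("A", "B"))
b2 <- data.frame(value = 3:4)
attr(b2, "siteInfo") <- data.frame(site_no = c("B", "C"))

site_info <- combine_site_info(list(b1, b2))
site_info
```

Users could then join the resulting table to the data on a shared identifier such as MonitoringLocationIdentifier if they want the site metadata alongside the results.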