USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

Retired units handling in TADA #229

Closed: ehinman closed this issue 1 year ago.

ehinman commented 1 year ago

Is your feature request related to a problem? Please describe.
Retired units in WQP data pulls are not caught by the WQXCharValRef table, which is generated by pulling the QAQC Characteristic Validation domain table. As a result, results reported in retired units, such as mg/L as N, are not flagged as valid or invalid and often cannot be assessed against national thresholds. At some point these results need to be converted to current target units, but where in the workflow that should happen has not been widely discussed or established.
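
A minimal sketch of the gap described above, assuming a simplified validation table made only of (characteristic, unit) pairs. The table contents and the function name `find_unvalidated_units` are hypothetical and do not reflect the actual WQXCharValRef schema; the point is only that a unit string absent from the reference cannot be judged valid or invalid.

```python
# Hypothetical, simplified stand-in for the characteristic/unit validation table.
# The real WQXCharValRef table has more columns; only the pairs matter here.
VALID_PAIRS = {
    ("Nitrate", "mg/L"),
    ("Phosphorus", "mg/L"),
}


def find_unvalidated_units(results):
    """Return results whose (characteristic, unit) pair is missing from VALID_PAIRS.

    These rows can be neither flagged as valid/invalid nor compared against a
    national threshold, e.g. results reported as "mg/L as N".
    """
    return [
        r for r in results
        if (r["characteristic"], r["unit"]) not in VALID_PAIRS
    ]


results = [
    {"characteristic": "Nitrate", "unit": "mg/L", "value": 1.2},
    {"characteristic": "Nitrate", "unit": "mg/L as N", "value": 0.8},
]
print(find_unvalidated_units(results))
```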

Describe the solution you'd like
This step could most easily be handled in the data harmonization table, which will become more robust as the WQX domains are developed further. Characteristics with retired units are not necessarily a QC problem (i.e., retired units are not grounds for removal); they are a translation/harmonization problem.

Describe alternatives you've considered
Another option is to add retired units to the QAQC validation table with the correct target unit populated, but it is unclear how to ensure that the "as N" portion is retained in the correct column (speciation).

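Additional context
The following R code reproduces the problem with Utah nitrate data:

today = Sys.Date()
# Start date roughly two years before today
twoago = as.character(today - 2*365)
# Pull nitrate results in water for Utah from the WQP
testdat = TADAdataRetrieval(statecode = "UT", startDate = twoago, characteristicName = c("Nitrate"), sampleMedia = "Water")
# Convert result values to target units
testdat = ConvertResultUnits(testdat, transform = TRUE)
# Flag results against national WQX upper thresholds
testdat = AboveNationalWQXUpperThreshold(testdat, clean = FALSE, errorsonly = FALSE)
# Results whose threshold flag is NA could not be assessed
testcase = subset(testdat, is.na(testdat$TADA.ResultValueAboveUpperThreshold.Flag))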

cristinamullin commented 1 year ago

@ehinman Another option I just remembered, which we've considered previously, is to address this within the special characters function (or the unit conversions function) for these very common issues where the metadata is in the unit, as we do with specific result value issues (>, <, etc.). "mg/L as N" is very common, and "mg N/L" may be common as well. This might be an easier solution. Thoughts?
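
The pattern-based cleaning suggested above could look roughly like the sketch below, which splits speciation metadata out of the two unit forms named in the comment. The regular expressions and the function name `split_unit_speciation` are hypothetical illustrations, not TADA's actual special-character handling; unrecognized strings are returned unchanged.

```python
import re

# "mg/L as N" or "mg/l asNO3" -> unit "mg/L", speciation "as N" / "as NO3"
AS_PATTERN = re.compile(
    r"^\s*(?P<unit>\S+)\s+as\s*(?P<species>[A-Za-z0-9]+)\s*$", re.IGNORECASE
)
# "mg N/L" -> unit "mg/L", speciation "as N"
EMBEDDED_PATTERN = re.compile(
    r"^\s*(?P<num>[A-Za-z]+)\s+(?P<species>[A-Z][A-Za-z0-9]*)\s*/\s*(?P<den>[A-Za-z]+)\s*$"
)


def split_unit_speciation(unit):
    """Return (unit, speciation); speciation is None if the string carries none."""
    m = AS_PATTERN.match(unit)
    if m:
        return m.group("unit"), "as " + m.group("species")
    m = EMBEDDED_PATTERN.match(unit)
    if m:
        return f"{m.group('num')}/{m.group('den')}", "as " + m.group("species")
    # No speciation metadata found: leave the unit untouched.
    return unit, None


for raw in ["mg/L as N", "mg N/L", "mg/l asNO3", "mg/L", "deg C"]:
    print(raw, "->", split_unit_speciation(raw))
```

Because unmatched strings pass through unchanged, a unit such as "deg C" is not misread as carrying a species.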

ehinman commented 1 year ago

This seems like a solid idea if users do not wish to be involved in how retired units are handled. How extensive do you think this translation needs to be (e.g., should we make a translation table for all of them or just the most common ones)? Do we need to flag this conversion somehow? Will they all be 1:1?
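
One way to answer the flagging and 1:1 questions at the same time is a translation table that stores a conversion factor per unit string and marks every translated record. The sketch below assumes a hypothetical table limited to the most common strings; the class, column names (`unit_translated`, `original_unit`), and entries are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnitTranslation:
    """One row of a hypothetical unit translation table."""
    target_unit: str
    speciation: Optional[str]
    factor: float  # multiply the result value by this to express it in target_unit


# Moving "as N" or "as P" out of the unit string does not change the number,
# so these factors are 1.0.
TRANSLATIONS = {
    "mg/L as N": UnitTranslation("mg/L", "as N", 1.0),
    "mg N/L": UnitTranslation("mg/L", "as N", 1.0),
    "mg/L as P": UnitTranslation("mg/L", "as P", 1.0),
}


def translate(record):
    """Return a copy of record with its unit translated and a flag saying so."""
    t = TRANSLATIONS.get(record["unit"])
    if t is None:
        # Unknown string: leave the record as-is, but make that visible.
        return {**record, "unit_translated": False}
    return {
        **record,
        "value": record["value"] * t.factor,
        "unit": t.target_unit,
        "speciation": t.speciation,
        "unit_translated": True,
        "original_unit": record["unit"],
    }


# A translation is 1:1 exactly when its factor is 1.0.
non_one_to_one = [unit for unit, t in TRANSLATIONS.items() if t.factor != 1.0]
```

Keeping the original unit string alongside the flag lets users audit or reverse the translation.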

jbousquin commented 1 year ago

In harmonize-wq there is a unit_basis_dict lookup for these that updates (appends to) the basis column from the units string. Here are the ones we saw in Gulf of Mexico estuaries:

{'Phosphorus': {'as P': {'mg/l': ['mg/l as P', 'mg/l P'],
                         'mg/kg': ['mg/kg as P', 'mg/kg P']},
                'as PO4': {'mg/l': ['mg/l as PO4', 'mg/l PO4'],
                           'mg/kg': ['mg/kg as PO4', 'mg/kg PO4']}},
 'Nitrogen': {'as N': {'mg/l': ['mg/l as N', 'mg/l N']}},
 'Carbon': {},
}

Right now it doesn't flag it; my only concern would be if the new basis conflicted with the previous speciation.
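
A minimal sketch of how a lookup shaped like the unit_basis_dict above could be inverted and applied, with the conflict check raised in the comment. This is not harmonize-wq's actual code: the functions `build_lookup` and `apply_basis` are hypothetical, and on a conflict this sketch keeps the existing basis and reports the conflict rather than appending.

```python
# Same shape as the unit_basis_dict excerpt above.
unit_basis_dict = {
    'Phosphorus': {'as P': {'mg/l': ['mg/l as P', 'mg/l P'],
                            'mg/kg': ['mg/kg as P', 'mg/kg P']},
                   'as PO4': {'mg/l': ['mg/l as PO4', 'mg/l PO4'],
                              'mg/kg': ['mg/kg as PO4', 'mg/kg PO4']}},
    'Nitrogen': {'as N': {'mg/l': ['mg/l as N', 'mg/l N']}},
    'Carbon': {},
}


def build_lookup(basis_dict):
    """Invert {characteristic: {basis: {unit: [raw strings]}}}
    into {(characteristic, raw string): (unit, basis)}."""
    lookup = {}
    for char, bases in basis_dict.items():
        for basis, units in bases.items():
            for unit, raw_strings in units.items():
                for raw in raw_strings:
                    lookup[(char, raw)] = (unit, basis)
    return lookup


LOOKUP = build_lookup(unit_basis_dict)


def apply_basis(characteristic, raw_unit, existing_basis=None):
    """Return (unit, basis, conflict) for one result."""
    hit = LOOKUP.get((characteristic, raw_unit))
    if hit is None:
        # Not a known unit+basis string: nothing to split out.
        return raw_unit, existing_basis, False
    unit, basis = hit
    if existing_basis is not None and existing_basis != basis:
        # Keep the recorded basis, but report the disagreement.
        return unit, existing_basis, True
    return unit, basis, False
```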

ehinman commented 1 year ago

After a discussion with @cristinamullin yesterday, I believe these issues are isolated to USGS data (they are not retired units), where units such as mg/l as P and mg/l as NO3 are accepted and pushed into the WQP. I think the best way to proceed right now is to create a USGS-specific unit table that is appended to the MeasureUnit reference table (within TADA) used to convert to the target unit. Target units for USGS data will be unified with WQX target units, and any speciation data will be added to the MethodSpeciation column in the TADA dataset. I agree, @jbousquin: we need to account for situations where the new speciation may "overwrite" (in a new column) the original. I am unsure how prevalent this situation will be. Hopefully this will be resolved as USGS migrates toward the WQX framework.
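
The approach above can be sketched as follows, assuming hypothetical reference rows with a code, target unit, conversion factor, and speciation. The table contents, the `UnitSpeciation` column, and the function names are illustrative assumptions; TADA's MeasureUnit reference table uses its own schema. The ug/l to mg/L factor of 0.001 is a standard conversion; the speciation-bearing USGS rows convert 1:1.

```python
# Hypothetical rows standing in for the MeasureUnit reference table.
WQX_UNIT_REF = [
    {"code": "mg/l", "target_unit": "mg/L", "factor": 1.0, "speciation": None},
    {"code": "ug/l", "target_unit": "mg/L", "factor": 0.001, "speciation": None},
]
# Hypothetical USGS-specific rows whose unit strings carry speciation.
USGS_UNIT_REF = [
    {"code": "mg/l as N", "target_unit": "mg/L", "factor": 1.0, "speciation": "as N"},
    {"code": "mg/l as P", "target_unit": "mg/L", "factor": 1.0, "speciation": "as P"},
    {"code": "mg/l as NO3", "target_unit": "mg/L", "factor": 1.0, "speciation": "as NO3"},
]


def build_reference(*tables):
    """Concatenate reference tables into one lookup keyed by unit code."""
    ref = {}
    for table in tables:
        for row in table:
            if row["code"] in ref:
                raise ValueError(f"duplicate unit code: {row['code']}")
            ref[row["code"]] = row
    return ref


REF = build_reference(WQX_UNIT_REF, USGS_UNIT_REF)


def convert(result):
    """Convert one result to its target unit and move unit-borne speciation
    into MethodSpeciation without overwriting a different existing value."""
    out = dict(result)
    row = REF.get(result["unit"])
    if row is None:
        return out  # unknown unit: leave unchanged
    out["value"] = result["value"] * row["factor"]
    out["unit"] = row["target_unit"]
    new_spec = row["speciation"]
    if new_spec is not None:
        old_spec = result.get("MethodSpeciation")
        if old_spec and old_spec != new_spec:
            # Conflict: keep the original, record the unit-derived value separately.
            out["UnitSpeciation"] = new_spec
        else:
            out["MethodSpeciation"] = new_spec
    return out
```

Rejecting duplicate codes when the tables are combined keeps a USGS row from silently shadowing a WQX row with the same code.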

ehinman commented 1 year ago

This issue was (hopefully) resolved in this PR: https://github.com/USEPA/TADA/pull/308/files