USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

Create function to find paired data (T, pH, hardness dependent criteria) #392

Closed. cristinamullin closed this issue 1 week ago

cristinamullin commented 7 months ago

Is your feature request related to a problem? Please describe.

Water quality assessment criteria that depend on temperature (T), pH, hardness, etc. (e.g., criteria for metals) require either a site-specific default value or, preferably, a paired T, pH, hardness, etc. sample collected on the same day and at the same time and location as the pollutant sample (e.g., metals).

Describe the solution you'd like

Use logic similar to the functions that identify duplicate results between/within orgs and to the TADA_PairReplicates function. Create a new function, TADA_FindPairedData.

  1. Group data by org, site, date, and characteristics of interest (the function would require a site input and the characteristics of interest)
  2. Define the main characteristic (a pollutant such as a metal; only one) and the paired characteristics of interest (pH, T, DO, hardness, etc.; select multiple)
  3. Look within a user-defined time window (default: 10 minutes). See the example logic in TADA_PairReplicates
  4. Look within the site (default) or include nearby sites (user-defined area/radius). Leverage TADA_FindNearbySites
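
To make the steps above concrete, here is a minimal base R sketch of the same-site, time-window pairing. It is only an illustration under stated assumptions, not the TADA implementation: the function name `find_paired_data`, its arguments, and the simplified columns (`org`, `site`, `datetime`, `char`, `value`) are all hypothetical and do not match the real WQP/TADA column names. Nearby-site matching (step 4) is not covered.

```r
# Hypothetical sketch: pair each main-characteristic result with candidate
# results from the same org and site that fall within a time window.
# Column names are simplified assumptions, not real WQP/TADA columns.
find_paired_data <- function(df, main_char, paired_chars, window_mins = 10) {
  main <- df[df$char == main_char, ]
  candidates <- df[df$char %in% paired_chars, ]

  # All main/candidate combinations that share an org and site
  pairs <- merge(main, candidates,
    by = c("org", "site"),
    suffixes = c("_main", "_paired")
  )

  # Absolute time difference in minutes between the two results
  pairs$diff_mins <- abs(as.numeric(
    difftime(pairs$datetime_paired, pairs$datetime_main, units = "mins")
  ))

  # Keep only pairs inside the user-defined window
  pairs[pairs$diff_mins <= window_mins, ]
}

# Made-up example data (placeholder names and values)
df <- data.frame(
  org = "ORG1",
  site = c("S1", "S1", "S1", "S2"),
  datetime = as.POSIXct(
    c(
      "2024-05-01 10:00:00", "2024-05-01 10:05:00",
      "2024-05-01 11:00:00", "2024-05-01 10:02:00"
    ),
    tz = "UTC"
  ),
  char = c("ZINC", "PH", "HARDNESS", "PH"),
  value = c(12, 7.4, 110, 6.9),
  stringsAsFactors = FALSE
)

# The 11:00 hardness result is outside the 10-minute window, and the
# S2 pH result is at a different site, so neither is paired with zinc.
find_paired_data(df, main_char = "ZINC", paired_chars = c("PH", "HARDNESS"))
```

Because the time window already constrains how far apart two results can be, this sketch does not group by date separately. A real implementation might still do so for performance on large data sets.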

hillarymarler commented 7 months ago

I have done similar work on Illinois data in the past (finding paired T, pH, hardness, etc. samples from the same day, time and location), but only from the same MonitoringLocation/date. I like the idea of leveraging TADA_FindNearbySites().

hillarymarler commented 2 months ago

Once we've created TADA.MonitoringLocationIdentifier (which would then be modified after site review if the decision is made to treat two Monitoring Locations as one?), maybe we will not need to leverage TADA_FindNearbySites() in this function?

hillarymarler commented 1 month ago

There needs to be an option to substitute a default value for hardness, pH, and temperature when no paired value is available.
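
One possible shape for this, sketched in base R. The helper name `apply_default`, its arguments, and the `source` flag are hypothetical and are not part of TADA. The sketch assumes a missing paired value is represented as `NA`.

```r
# Hypothetical helper: use the paired value where one exists, otherwise
# fall back to a user-supplied default, and record which one was used.
apply_default <- function(paired_value, default) {
  used_default <- is.na(paired_value)
  data.frame(
    value = ifelse(used_default, default, paired_value),
    source = ifelse(used_default, "default", "paired"),
    stringsAsFactors = FALSE
  )
}

# Made-up hardness values; the second result has no paired sample
hardness <- c(110, NA, 95)
apply_default(hardness, default = 100)
```

Keeping a `source` column alongside the value would let reviewers see which criteria calculations relied on a default rather than a measured pair.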

hillarymarler commented 4 weeks ago

I've been working on this a bit. In the case of hardness, there are multiple characteristic names that correspond to hardness. In the current draft, users can rank the characteristic names so that if more than one is present as a possible pair for a result, the highest-ranked one will be selected.

I am not sure what the default order of the ranking should be if the user fails to provide a ranking ref.
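
For illustration, the ranked selection could look roughly like the base R sketch below. This is not the draft code: `pick_highest_ranked` and the characteristic names are placeholders, and the sketch assumes the ranking is a character vector ordered from most to least preferred. It does not answer the open question of what the default ranking should be.

```r
# Hypothetical helper: among the candidate characteristic names available
# for a result, return the one that appears earliest in the ranking.
pick_highest_ranked <- function(candidate_chars, ranking) {
  ranks <- match(candidate_chars, ranking) # NA for names not in the ranking
  if (all(is.na(ranks))) {
    return(NA_character_) # no ranked candidate available
  }
  candidate_chars[which.min(ranks)] # which.min() ignores NA ranks
}

# Placeholder names, ordered from most to least preferred
ranking <- c("HARDNESS_A", "HARDNESS_B", "HARDNESS_C")

pick_highest_ranked(c("HARDNESS_C", "HARDNESS_B"), ranking)
pick_highest_ranked(c("UNRANKED_NAME"), ranking)
```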

hillarymarler commented 2 weeks ago

I have a working draft of a pairing function in the demo_impairment_functions branch (https://github.com/USEPA/EPATADA/blob/demo_impairment_functions/R/Module3.R). Relevant code begins at line 282 and includes two functions, TADA_CreatePairRef and TADA_PairForCriteriaCalc.

I have not yet included a way to set a default value for missing values, as I haven't figured out how to incorporate that.

hillarymarler commented 1 week ago

I moved this function to its own branch/PR (it is no longer in the demo_impairment_functions branch).