USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

Efficient file storage and retrieval #367

Open cristinamullin opened 10 months ago

cristinamullin commented 10 months ago

Is your feature request related to a problem? Please describe.

The TADA package is large because of how we currently store some internal files. Let's research and implement a better way to store and retrieve the internal example data and the reference files generated by WQXRefTables.R. However, we may want to leave the reference files in TADARefTables.R (NPsummation_key, HarmonizationTemplate, NP_equations) as .csv files so they remain easy for our team to update (open to other suggestions).

12/14/23 check() note:

    N  checking installed package size (641ms)
         installed size is 26.4Mb
         sub-directories of 1Mb or more:
           data      4.7Mb
           doc       8.4Mb
           extdata  12.8Mb

cristinamullin commented 9 months ago

update: 1/12/24 check note:

    ❯ checking installed package size ... NOTE
        installed size is 28.1Mb
        sub-directories of 1Mb or more:
          data      4.9Mb
          doc       8.2Mb
          extdata  14.6Mb
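To see which sub-directories contribute most to the installed size, something like the following base R sketch could be used. It is only an illustration: `dir_size_mb` is a hypothetical helper, the directory layout is a temporary stand-in rather than the real TADA package, and the figures it prints are not guaranteed to match what `R CMD check` reports.

```r
# Hypothetical helper: total size of all files under a directory, in MiB.
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  sum(file.size(files)) / 1024^2
}

# Build a throwaway directory tree so the sketch runs on its own.
pkg_root <- tempfile("pkg_sketch")
dir.create(file.path(pkg_root, "data"), recursive = TRUE)
dir.create(file.path(pkg_root, "extdata"))
writeBin(raw(2 * 1024^2), file.path(pkg_root, "data", "example.bin"))
writeBin(raw(1 * 1024^2), file.path(pkg_root, "extdata", "example.bin"))

# Size of each top-level sub-directory, named by directory.
sub_dirs <- list.dirs(pkg_root, recursive = FALSE)
sizes <- vapply(sub_dirs, dir_size_mb, numeric(1))
names(sizes) <- basename(sub_dirs)
print(round(sizes, 1))
```

Pointing `pkg_root` at an installed package directory instead of the temporary tree would give a rough per-directory breakdown to compare before and after any storage change.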

JakevanDijk commented 8 months ago

Hi there, I've been working on this recently, and a friend of mine suggested reading the contents of the csv into a data frame and then saving that data frame as an .rda file. Does that sound like it makes sense? I wrote the change below for line 79 to try that:

    TADA_UpdateWQXCharValRef <- function() {
      # Forward slashes avoid unrecognized escapes such as "\G" in R strings
      TADA_GetWQXCharValRef <- read.csv("C:/Github TADA/TADA/inst/extdata/WQXcharValRef.csv")
      save(TADA_GetWQXCharValRef, file = "TADA_UpdatedWQXCharValRef.rda")
      load(file = "TADA_UpdatedWQXCharValRef.rda")
    }

Please let me know if it looks like I'm on the right path with this or totally barking up the wrong tree!

Best, Jake
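For reference, the csv-to-rda idea can be tried out in isolation with base R. The sketch below is a hedged illustration, not the package's code: `ExampleRef` and its contents are made-up stand-ins for a reference table, and `compress = "xz"` is one assumption about a sensible compression setting, not something the thread has settled on.

```r
# Made-up stand-in for a reference table (not the real WQXcharValRef data).
ref <- data.frame(
  Characteristic = rep(c("pH", "Temperature, water", "Dissolved oxygen (DO)"), 2000),
  Unit = rep(c("None", "deg C", "mg/L"), 2000),
  stringsAsFactors = FALSE
)

work_dir <- tempfile("rda_sketch")
dir.create(work_dir)
csv_path <- file.path(work_dir, "ExampleRef.csv")
rda_path <- file.path(work_dir, "ExampleRef.rda")
write.csv(ref, csv_path, row.names = FALSE)

# Read the csv into a data frame, then save it as a compressed .rda file.
ExampleRef <- read.csv(csv_path, stringsAsFactors = FALSE)
save(ExampleRef, file = rda_path, compress = "xz")

# load() restores the object under its saved name; loading into a separate
# environment avoids overwriting anything in the caller's workspace.
env <- new.env()
loaded_names <- load(rda_path, envir = env)
restored <- env[[loaded_names[1]]]

# Compare on-disk sizes of the two formats.
csv_size <- file.size(csv_path)
rda_size <- file.size(rda_path)
print(c(csv_bytes = csv_size, rda_bytes = rda_size))
```

One thing to note about the function in the comment above: `load()` called inside a function restores the object into that function's environment by default, so the caller only sees it if the function returns it or an `envir` argument is passed.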

cristinamullin commented 8 months ago

Hi Jake, Sorry for the delay. I just had a chance to dig into this.

It looks like that will work. I went through and made the changes for the example WQXCharValRef, see: https://github.com/USEPA/TADA/commit/b7ba0e34fb1ce044688790bf937166fcbdc06dbb

You'll see that a lot of other functions rely on the csv files. Once we re-create these as rda files, we will delete the csv files from the package, so any other code that reads them needs to be updated to reference the rda files. I searched the package for every reference to WQXCharValRef.csv and updated that code to read in the new WQXCharValRef.rda file instead.
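The kind of change described here, swapping a `read.csv()` call for a read from an .rda file, might look roughly like the sketch below. `read_rda_object` is a hypothetical helper introduced only for illustration; the linked commit may do this differently, and the demo data is invented.

```r
# Hypothetical helper: return the single object stored in an .rda file.
read_rda_object <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)
  if (length(object_names) != 1) {
    stop("Expected exactly one object in ", path)
  }
  env[[object_names]]
}

# Invented demo table saved to a temporary .rda file.
demo_ref <- data.frame(
  Code = c("A", "B"),
  Status = c("Valid", "Invalid"),
  stringsAsFactors = FALSE
)
demo_path <- tempfile(fileext = ".rda")
save(demo_ref, file = demo_path)

# Before: ref <- read.csv("path/to/ref.csv")
# After:
ref <- read_rda_object(demo_path)
print(ref)
```

Returning the object from a helper, instead of relying on the name it was saved under, keeps calling code independent of how the .rda file was created.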

Let me know what you think. If this looks good, we just have to implement the same change for the other files: WQXActivityTypeRef, WQXCharacteristicRef, WQXDetectionQuantitationLimitTypeRef, WQXMeasureQualifierCodeRef, WQXResultDetectionConditionRef, WQXunitRef, WQXMonitoringLocationTypeNameRef, WQXActivityRelativeDepthRef.
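Repeating the conversion for the remaining files could be done in a loop instead of by hand. The following is a minimal sketch under stated assumptions: `csv_to_rda` is a hypothetical helper, the input and output directories are temporary, and the csv files are placeholders written by the sketch itself so that it runs without the real package data.

```r
ref_names <- c(
  "WQXActivityTypeRef", "WQXCharacteristicRef",
  "WQXDetectionQuantitationLimitTypeRef", "WQXMeasureQualifierCodeRef",
  "WQXResultDetectionConditionRef", "WQXunitRef",
  "WQXMonitoringLocationTypeNameRef", "WQXActivityRelativeDepthRef"
)

in_dir <- tempfile("csv_in")
out_dir <- tempfile("rda_out")
dir.create(in_dir)
dir.create(out_dir)

# Placeholder csv files (invented content) so the sketch is self-contained.
for (nm in ref_names) {
  write.csv(data.frame(Name = nm, Value = 1:3),
            file.path(in_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Hypothetical helper: read <name>.csv and save it as <name>.rda, storing the
# data frame under an object name that matches the file name.
csv_to_rda <- function(name, in_dir, out_dir) {
  env <- new.env()
  assign(name, read.csv(file.path(in_dir, paste0(name, ".csv"))), envir = env)
  out_path <- file.path(out_dir, paste0(name, ".rda"))
  save(list = name, envir = env, file = out_path, compress = "xz")
  out_path
}

rda_paths <- vapply(ref_names, csv_to_rda, character(1),
                    in_dir = in_dir, out_dir = out_dir)
print(basename(rda_paths))
```

Using `save(list = name, envir = env, ...)` is what lets the object name be chosen at run time, so each .rda file restores a data frame named after its reference table.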

We will want to do some testing to make sure everything else still works after this change.
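One simple check that could form part of that testing is confirming that each new .rda file holds the same table as the csv it replaces. The sketch below uses only base R; `rda_matches_csv` is a hypothetical helper and `DemoRef` is invented demo data, so treat it as a starting point for real tests and not as the package's test suite.

```r
# Hypothetical check: does the object in an .rda file match the source csv?
rda_matches_csv <- function(csv_path, rda_path) {
  from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
  env <- new.env()
  object_name <- load(rda_path, envir = env)
  isTRUE(all.equal(from_csv, env[[object_name[1]]]))
}

csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".rda")
original <- data.frame(
  Code = c("U", "J"),
  Description = c("Not detected", "Estimated"),
  stringsAsFactors = FALSE
)
write.csv(original, csv_path, row.names = FALSE)

# Fresh conversion: the .rda content should match the csv.
DemoRef <- read.csv(csv_path, stringsAsFactors = FALSE)
save(DemoRef, file = rda_path)
ok_before_edit <- rda_matches_csv(csv_path, rda_path)

# Simulate the two files drifting apart after a later edit.
DemoRef$Description[2] <- "Changed"
save(DemoRef, file = rda_path)
ok_after_edit <- rda_matches_csv(csv_path, rda_path)

print(c(before = ok_before_edit, after = ok_after_edit))
```

A check like this only covers the stored data; functions that previously read the csv files would still need their own tests after being pointed at the rda files.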