NIEHS / targets_PrestoGP

Handling `NA`s in covariate data #23

Open sigmafelix opened 4 months ago

sigmafelix commented 4 months ago
kyle-messier commented 4 months ago

@sigmafelix If a category exists at some locations but is absent at a given location, isn't that a true zero?

sigmafelix commented 4 months ago

@Spatiotemporal-Exposures-and-Toxicology I think these values are unmeasured (unknown) rather than true zeros. For example, in the soil chemistry data, some locations have measurements for tens of elements while others have only a handful. The latter locations will have NAs in the fields for elements that were measured only at the former.

kyle-messier commented 4 months ago

@sigmafelix can you point me to the code that generates the soil chemistry data?

sigmafelix commented 4 months ago

@Spatiotemporal-Exposures-and-Toxicology https://github.com/Spatiotemporal-Exposures-and-Toxicology/targets_PrestoGP/blob/54a0ae152b404692513eb6ce6ae8e4726e7e9a87/code/02_Geographic_Covariates/Calc_AZO_soilchemistry.r#L48-L65

Pivoting from a long to a wide table fills the element-location combinations that have no measurement with NAs.
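A minimal sketch of why this happens, written in plain Python rather than the project's R code. The site names, element names, values, and the `pivot_long_to_wide` helper are all hypothetical and are not taken from Calc_AZO_soilchemistry.r; the point is only that a combination absent from the long table has no value to carry over, so it comes out as missing rather than as zero.

```python
# Hypothetical long-format records: one row per (site, element) measurement.
long_rows = [
    {"site": "A", "element": "As", "value": 4.2},
    {"site": "A", "element": "Pb", "value": 11.0},
    {"site": "A", "element": "Zn", "value": 0.0},  # measured, and truly zero
    {"site": "B", "element": "As", "value": 3.1},  # Pb and Zn were never measured at B
]


def pivot_long_to_wide(rows, id_col, names_from, values_from, fill=None):
    """Reshape long records into one record per id with one key per name.

    Combinations that never appear in `rows` receive `fill`
    (None here plays the role of NA).
    """
    columns = sorted({row[names_from] for row in rows})
    wide = {}
    for row in rows:
        # Start every new id with all columns set to the fill value.
        record = wide.setdefault(row[id_col], {col: fill for col in columns})
        record[row[names_from]] = row[values_from]
    return wide


wide = pivot_long_to_wide(long_rows, "site", "element", "value")
for site, record in wide.items():
    print(site, record)

# Filling with 0 instead would make "not measured" look the same as "measured zero".
wide_zero_filled = pivot_long_to_wide(long_rows, "site", "element", "value", fill=0.0)
```

With the default fill, site B keeps its measured `As` value and gets a missing marker for `Pb` and `Zn`, while the measured zero for `Zn` at site A stays a zero. Filling with 0 instead would erase that distinction.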

sigmafelix commented 4 months ago

@Spatiotemporal-Exposures-and-Toxicology data_AZO_covariates.qs (./output in the project directory) contains HUC-8, -10, and -12 level terraClimate and PRISM covariates. Point-based fields were removed. For terraClimate variables, we need to consider which fields should be summed and which averaged. Because the qs data file contains all of these fields, we could drop the sum or mean fields for certain variables. My suggestion for selecting variables is that--

kyle-messier commented 4 months ago

@sigmafelix I like your recommendations - I think we should keep it simple and only do sum or mean.

As for the point-based fields: are there no exact pixel extractions anymore? I'm good with that, but just wanted to check.

sigmafelix commented 4 months ago

@Spatiotemporal-Exposures-and-Toxicology For terraClimate and PRISM, there are no pixel extractions. Soil chemistry, aquifer (rock type), geology unit type, and pesticide estimates (county level) are extracted at point locations. NASS variables were converted to proportions. Per our decision, I excluded unnecessary terraClimate variables from the table and cleaned field names to align with the prefix table (data_AZO_covariates_prefixes.csv). The result is saved as data_AZO_covariates_cleaned_03032024.qs (all in ddn).