NIEHS / PrestoGP

Penalized Regression on Spatiotemporal Outcomes using Gaussian Processes a.k.a. PrestoGP
https://niehs.github.io/PrestoGP/

USGS Pesticide Estimates - covariate calculation #9

Open kyle-messier opened 11 months ago

sigmafelix commented 9 months ago

A couple of problems were identified:

[Figure: pesticides_county_nonpresence. Left: counties with partial availability; upper right: 2000 counties, with partially available counties highlighted; lower right: 2015 counties, with partially available counties highlighted.]

kyle-messier commented 9 months ago

@sigmafelix Are these counties truly missing or just "not estimated"? Either way, could we proceed by assuming there is no estimate for pesticide usage in those counties? We could estimate those counties with our Kriging model, same as the plan for 2020-2022. Or shall we consider some other simplification?

sigmafelix commented 9 months ago

@Spatiotemporal-Exposures-and-Toxicology I think these values were not estimated. The initial plan was to reuse the same county polygons for all years, but this should change to using each year's polygons to calculate the weighted sum of estimates. For now, I will assign zeros to the counties without estimates.
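A minimal base-R sketch of the zero-filling step, assuming a per-year county list taken from that year's polygons and a long-format estimate table. The `GEOID` and `EPEST_HIGH` column names and all values are hypothetical, and the weighted-sum step is not shown.

```r
# Hypothetical county list per year (e.g., GEOIDs from that year's polygons).
counties_by_year <- data.frame(
  YEAR  = c(2000, 2000, 2000, 2015, 2015),
  GEOID = c("01001", "01003", "01005", "01001", "01003")
)
# Hypothetical USGS estimates in long format (made-up values).
estimates <- data.frame(
  YEAR       = c(2000, 2000, 2015),
  GEOID      = c("01001", "01005", "01003"),
  EPEST_HIGH = c(12.5, 3.0, 7.1)
)

# Left join: keep every county present in that year's polygons,
# including counties with no USGS estimate (they get NA).
filled <- merge(counties_by_year, estimates,
                by = c("YEAR", "GEOID"), all.x = TRUE)

# Counties without an estimate are assigned zero, as proposed above.
filled$EPEST_HIGH[is.na(filled$EPEST_HIGH)] <- 0
filled
```

Joining on year-specific county lists (rather than one fixed set) is what lets split or renamed counties pick up their own rows in each year.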

sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology

County code changes

Split county

Temporal trend


sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology

A couple of problems were found in joining pesticide data to the EPA pesticide table.

1) Abbreviated compound names need to be expanded to their full names.
2) Even with the appropriate full name, multiple subfamilies of compounds are classified as one group, and the USGS county-level pesticide estimates use that group name.

I wish I had more background in chemistry to deal with this. What would be the best way to handle it?

# Non-joined compound names in USGS data
   COMPOUND.x          
   <chr>               
 1 2,4-D               
 2 2,4-DB              
 3 CARBARYL            
 4 CHLOROPICRIN        
 5 COPPER HYDROXIDE    
 6 COPPER OXYCHLORIDE  
 7 CYHALOTHRIN-LAMBDA  
 8 FLUAZIFOP           
 9 FLUMICLORAC         
10 INDOLYL-BUTYRIC ACID
# ℹ 132 more rows
# ℹ Use `print(n = ...)` to see more rows
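A listing like the one above can be produced with an anti-join. Below is a minimal base-R sketch, assuming two plain character vectors of names (both hypothetical stand-ins). The toy EPA list spells out 2,4-D, which mimics problem 1 above. Names are upper-cased and trimmed first, in case the two sources differ in letter case or spacing.

```r
# Hypothetical stand-ins for the USGS compound names and the EPA chemical list.
usgs_compounds <- c("2,4-D", "ATRAZINE", "CARBARYL", "CHLOROPICRIN", "SIMAZINE")
epa_compounds  <- c("Atrazine", "Simazine", "2,4-Dichlorophenoxyacetic acid")

# Normalize letter case and surrounding whitespace before comparing.
normalize <- function(x) toupper(trimws(x))

# Same idea as dplyr::anti_join(): USGS names with no exact match in the EPA list.
non_joined <- sort(setdiff(normalize(usgs_compounds), normalize(epa_compounds)))
non_joined
```

Exact matching only catches spelling and case differences; abbreviations and group names still need a manual lookup like the ones below.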

## CARBARYL belongs to the CARBAMATE family
> chemlist %>% filter(grepl("CARBARYL", COMPOUND)) %>% .[[2]]
character(0)

## Trying CARBAMATE
> chemlist %>% filter(grepl("CARBAMATE", COMPOUND)) %>% .[[2]]
 [1] "o-(2-Propynyloxy)phenyl methylcarbamate"                                                                          
 [2] "6-(and 2)-Chloro-3,4-xylyl methylcarbamate"                                                                       
 [3] "m-(1-Ethylpropyl)phenyl methylcarbamate"                                                                          
 [4] "Phenylmercuric dimethyldithiocarbamate"                                                                           
 [5] "3-[(Methoxycarbonyl)amino]phenyl (1-chlorobutan-2-yl)carbamate"                                                   
 [6] "5,6,7,8-Tetrahydro-1-naphthyl methylcarbamate"                                                                    
 [7] "Mercuric dimethyl dithiocarbamate"                                                                                
 [8] "Sodium tetrathiocarbamate"                                                                                        
 [9] "o-Hydroxyphenyl methylcarbamate"                                                                                  
[10] "Potassium N-methyldithiocarbamate"                                                                                
[11] "1-Naphthalenol, 1-(N-methylcarbamate)"                                                                            
[12] "Potassium N-hydroxymethyl-N-methyldithiocarbamate"                                                                
[13] "2,3-Dichlorobenzyl methylcarbamate"                                                                               
[14] "Potassium dimethyldithiocarbamate"                                                                                
[15] "3-Iodo-2-propynyl-N-butylcarbamate"                                                                               
[16] "2-Chloro-4,5-dimethylphenyl (hydroxymethyl)carbamate"                                                             
[17] "3,5-Diisopropylphenyl methylcarbamate"                                                                            
[18] "Ammonium carbamate"                                                                                               
[19] "Methyl 5-hydroxy-2-benzimidazolecarbamate"                                                                        
[20] "m-Cumenyl methylcarbamate"                                                                                        
[21] "Ethyl N-cyclohexyl-N-ethylthiolcarbamate"                                                                         
[22] "2,3,5-Trimethylphenyl methylcarbamate"                                                                            
[23] "Calcium ethylenebis(dithiocarbamate)"                                                                             
[24] "1,6-Hexanediol dicarbamate"                                                                                       
[25] "tert-Butylsulfenyl dimethyldithiocarbamate"                                                                       
[26] "1,3-Dimethyl-1,1,3,3-disiloxanetetrol-1,3-bis(dimethylthiocarbamate)"                                             
[27] "3,4,5-Trimethylphenyl methylcarbamate"                                                                            
[28] "2-Cyclopentylphenyl methylcarbamate"                                                                              
[29] "1,1-Dimethylethyl N-(6-((((Z)-((1-methyl-1H-tetrazol-5-yl)phenylmethylene)amino)oxy)methyl)-2-pyridinyl)carbamate"
[30] "Sodium dimethyldithiocarbamate"                                                                                   
[31] "3-Methyl-4-(methylthio)phenyl methylcarbamate"                                                                    
[32] "6-Methyl-2-propyl-4-pyrimidinyl dimethylcarbamate"                                                                
[33] "4-Chlorophenyl methylcarbamate"                                                                                   
[34] "4-(Methylamino)-3,5-xylyl N- methylcarbamate"                                                                     
[35] "2-Chloro-4,5-xylyl N-hydroxy-N-methylcarbamate"                                                                   
[36] "2-Chloro-4,5-xylyl carbamate"
## 36 CARBAMATEs! What is the most general form to use when measuring dissimilarity?

## Tried NITROCHLOROFORM, a synonym of CHLOROPICRIN
> chemlist %>% filter(grepl("NITROCHLOROFORM", COMPOUND)) %>% .[[2]]
character(0)
# No record in the EPA list

## Is this the compound we want to query (CHLOROPICRIN)?
> chemlist %>% filter(grepl("NITROCHLORO", COMPOUND)) %>% .[[2]]
[1] "2,6-Dinitrochlorobenzene"

> chemlist %>% filter(grepl("INDOLYL", COMPOUND)) %>% .[[2]]
character(0)
## INDOLYL turns out to refer to INDOLE-3.

## What if we query BUTYRIC?
> chemlist %>% filter(grepl("BUTYRIC", COMPOUND)) %>% .[[2]]
[1] "2-Ethylbutyric acid"             "Indole-3-butyric acid"          
[3] "4-Hydroxybutyric acid"           "DL-2-Hydroxybutyric acid"       
[5] "D-3-Hydroxybutyric acid"         "2,4-Dichlorophenoxybutyric acid"

## What if we query INDOLE-3?
> chemlist %>% filter(grepl("INDOLE-", COMPOUND)) %>% .[[2]]
[1] "Indole-3-pyruvic acid"                       
[2] "Indole-3-butyric acid"                       
[3] "1-Indole-3-butanethioic acid, S-phenyl ester"
[4] "Indole-3-acetic acid" 
## In this case, we can use Indole-3-butyric acid.
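Once a lookup like this resolves a name, one option is to record it in a hand-curated crosswalk and apply it before joining. A minimal sketch: `crosswalk` and `to_epa_name()` are hypothetical names, and the only entry comes from the INDOLE-3 lookup above; further entries would be added as names are resolved.

```r
# Hypothetical hand-curated crosswalk: USGS name -> EPA list name.
crosswalk <- c(
  "INDOLYL-BUTYRIC ACID" = "Indole-3-butyric acid"
)

# Return the EPA name when a crosswalk entry exists; otherwise keep the input.
to_epa_name <- function(usgs_name) {
  mapped <- unname(crosswalk[usgs_name])
  ifelse(is.na(mapped), usgs_name, mapped)
}

to_epa_name(c("INDOLYL-BUTYRIC ACID", "ATRAZINE"))
```

Keeping the crosswalk as data (rather than ad hoc `grepl()` calls) makes each resolved name reviewable and reusable across years.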
sigmafelix commented 8 months ago

Following the meeting just now, I will look into CompTox/GenRA to further determine dis/similarity. Many compound names need to be corrected to ensure that no records are missed.

kyle-messier commented 8 months ago

@sigmafelix I could do this as well, but if you want to, please go ahead. I was inputting the three main pesticides of interest (Atrazine, Simazine, Propazine), then getting one or two of the different groupings: one purely chemical-based and one toxicity-based. Then I would take the top 10 or 20 among those three that are also included in the USGS pesticide usage data. For that, we may have to look up synonyms in the usage data to see whether those chemicals are in that database.

kyle-messier commented 8 months ago

@sigmafelix I uploaded six CSV files to PrestoGP/input/RA_Chemical_Similiarity, two each for Atrazine, Simazine, and Propazine. One file gives physicochemical similarity and the other gives similarity with respect to multiple toxicity endpoints. The former appears to return more structurally similar chemicals and likely won't yield many other pesticides in common with the USGS data. The latter returns quite a diverse array of chemicals, and I noticed it does overlap with the USGS data. So, at a minimum we want covariates for the three main pesticides, plus at most 10 others based on these datasets. Let me know if you want to discuss further.
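A minimal base-R sketch of the selection described above: keep similar chemicals that also appear in the USGS usage data, then take at most 10 besides the three targets. The column names (`target`, `chemical`, `similarity`), the placeholder `CHEM_*` names, and the similarity values are all made up, and the real CSVs may be laid out differently. Scoring a chemical by its maximum similarity to any target is an assumption; a mean or per-target quota would also be reasonable.

```r
# Made-up similarity tables, one block per target pesticide.
sim <- rbind(
  data.frame(target = "ATRAZINE",  chemical = c("SIMAZINE", "CHEM_A", "CHEM_B"), similarity = c(0.9, 0.5, 0.4)),
  data.frame(target = "SIMAZINE",  chemical = c("ATRAZINE", "CHEM_B", "CHEM_C"), similarity = c(0.9, 0.6, 0.3)),
  data.frame(target = "PROPAZINE", chemical = c("ATRAZINE", "CHEM_D", "CHEM_C"), similarity = c(0.8, 0.7, 0.2))
)
usgs_compounds <- c("ATRAZINE", "SIMAZINE", "PROPAZINE", "CHEM_B", "CHEM_D")
targets    <- c("ATRAZINE", "SIMAZINE", "PROPAZINE")
max_others <- 10

# Keep chemicals present in the USGS data that are not themselves targets
# (this also drops non-pesticide compounds absent from the usage data).
cand <- sim[(sim$chemical %in% usgs_compounds) & !(sim$chemical %in% targets), ]

# Score each chemical by its best similarity to any target, highest first.
best <- aggregate(similarity ~ chemical, data = cand, FUN = max)
best <- best[order(-best$similarity), ]

selected <- c(targets, head(as.character(best$chemical), max_others))
selected
```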

sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology Name matching is in progress. Eight pesticides were not matched to USGS pesticide names. The chemical/toxicity similarity lists include non-pesticide compounds, so we might want to exclude them. Details are in the Teams chat.

sigmafelix commented 8 months ago

Following our quick meeting--

TODO

sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology

10/30/2023 Update

── Variable type: numeric ──────────────────────────────────────────────────────
   skim_variable                 n_missing complete_rate     mean       sd
 1 YEAR                                  0       1        2010.       5.77
 2 EPEST_LOW_KG_2,4-D                  782       0.987    5171.    8740.  
 3 EPEST_LOW_KG_ACEPHATE             32880       0.464    1075.    4098.  
 4 EPEST_LOW_KG_ACIFLUORFEN          49375       0.195     246.     618.  
 5 EPEST_LOW_KG_ATRAZINE              5651       0.908   11698.   21326.  
 6 EPEST_LOW_KG_CLETHODIM            19894       0.676     158.     462.  
 7 EPEST_LOW_KG_CLOMAZONE            28993       0.527     246.    1188.  
 8 EPEST_LOW_KG_FLUOMETURON          53458       0.128     887.    2174.  
 9 EPEST_LOW_KG_LACTOFEN             49231       0.197     156.     340.  
10 EPEST_LOW_KG_SETHOXYDIM           26281       0.571      63.0    277.  
11 EPEST_LOW_KG_SIMAZINE             28777       0.531    1134.    4042.  
12 EPEST_LOW_KG_TEBUFENOZIDE         56321       0.0815     74.2    446.  
13 EPEST_LOW_KG_INDOXACARB           50786       0.172      49.7    349.  
14 EPEST_LOW_KG_PROSULFURON          50862       0.171      13.1     35.8 
15 EPEST_LOW_KG_THIAMETHOXAM         22437       0.634     121.     266.  
16 EPEST_LOW_KG_TRIADIMENOL          59046       0.0371      4.93    14.6 
17 EPEST_LOW_KG_MYCLOBUTANIL         27603       0.550      32.4    263.  
18 EPEST_LOW_KG_DIFENOCONAZOLE       42118       0.313      58.4    219.  
19 EPEST_LOW_KG_DIMETHENAMID-P       36445       0.406    1124.    2572.  
20 EPEST_LOW_KG_CYMOXANIL            45610       0.256      27.3    127.  
21 EPEST_LOW_KG_PYRIMETHANIL         51831       0.155      97.3    404.  
22 EPEST_LOW_KG_ZIRAM                38712       0.369     562.    3752.  
23 EPEST_LOW_KG_FLURIDONE            61188       0.00215    62.6    108.  
24 EPEST_LOW_KG_FLUFENACET           54379       0.113     453.     695.  
25 EPEST_LOW_KG_ISOXAFLUTOLE         46066       0.249     223.     318.  
26 EPEST_LOW_KG_DAZOMET              61168       0.00248   235.     610.  
27 EPEST_LOW_KG_FORMETANATE          59512       0.0295    265.     929.  
28 EPEST_LOW_KG_BROMACIL             60025       0.0211   2186.    5887.  
29 EPEST_LOW_KG_TRIASULFURON         54388       0.113      40.5     82.7 
30 EPEST_LOW_KG_FLUVALINATE-TAU      60864       0.00744    13.9     26.0 
31 EPEST_LOW_KG_CPPU                 61050       0.00440     4.21    12.1 
32 EPEST_HIGH_KG_2,4-D                 291       0.995    5606.    8891.  
33 EPEST_HIGH_KG_ACEPHATE            26363       0.570    1160.    4373.  
34 EPEST_HIGH_KG_ACIFLUORFEN         30165       0.508     166.     441.  
35 EPEST_HIGH_KG_ATRAZINE             3171       0.948   11464.   21015.  
36 EPEST_HIGH_KG_CLETHODIM            8221       0.866     168.     441.  
37 EPEST_HIGH_KG_CLOMAZONE           23553       0.616     273.    1200.  
38 EPEST_HIGH_KG_FLUOMETURON         48433       0.210     690.    1858.  
39 EPEST_HIGH_KG_LACTOFEN            29443       0.520     108.     267.  
40 EPEST_HIGH_KG_SETHOXYDIM          14530       0.763      90.4    273.  
41 EPEST_HIGH_KG_SIMAZINE            14603       0.762    1423.    3608.  
42 EPEST_HIGH_KG_TEBUFENOZIDE        52501       0.144      48.7    342.  
43 EPEST_HIGH_KG_INDOXACARB          40985       0.332      44.0    279.  
44 EPEST_HIGH_KG_PROSULFURON         27534       0.551      11.0     33.3 
45 EPEST_HIGH_KG_THIAMETHOXAM        13655       0.777     112.     252.  
46 EPEST_HIGH_KG_TRIADIMENOL         57924       0.0554      3.76    12.2 
47 EPEST_HIGH_KG_MYCLOBUTANIL        23335       0.619      31.5    248.  
48 EPEST_HIGH_KG_DIFENOCONAZOLE      35486       0.421      59.8    201.  
49 EPEST_HIGH_KG_DIMETHENAMID-P      22439       0.634    1072.    2306.  
50 EPEST_HIGH_KG_CYMOXANIL           39170       0.361      27.6    128.  
51 EPEST_HIGH_KG_PYRIMETHANIL        47998       0.217      86.2    367.  
52 EPEST_HIGH_KG_ZIRAM               33559       0.453     517.    3439.  
53 EPEST_HIGH_KG_FLURIDONE           60770       0.00897   109.     249.  
54 EPEST_HIGH_KG_FLUFENACET          39200       0.361     339.     607.  
55 EPEST_HIGH_KG_ISOXAFLUTOLE        27731       0.548     136.     251.  
56 EPEST_HIGH_KG_DAZOMET             61168       0.00248   235.     610.  
57 EPEST_HIGH_KG_FORMETANATE         58530       0.0455    206.     844.  
58 EPEST_HIGH_KG_BROMACIL            59760       0.0254   1853.    5427.  
59 EPEST_HIGH_KG_TRIASULFURON        41348       0.326      27.6     65.1 
60 EPEST_HIGH_KG_FLUVALINATE-TAU     60864       0.00744    13.9     26.0 
61 EPEST_HIGH_KG_CPPU                60960       0.00587     3.16    10.6 
       p0      p25     p50       p75    p100 hist 
 1 2000   2005.    2010.    2014.      2019  ▇▇▇▇▇
 2    0    498.    2168.    6149.    298230. ▇▁▁▁▁
 3    0      4.6     64      500.    136226. ▇▁▁▁▁
 4    0      9.8     57.3    226.     15113  ▇▁▁▁▁
 5    0    272.    2448.   15204.    768661. ▇▁▁▁▁
 6    0      1.6     19      127.     17726. ▇▁▁▁▁
 7    0      1        6.3     41      22178  ▇▁▁▁▁
 8    0.1   40.5    178.     697.     38672. ▇▁▁▁▁
 9    0      5.4     34.7    145.      7052. ▇▁▁▁▁
10    0      0.5      2.7     21.8     7783  ▇▁▁▁▁
11    0      6.1     63.2    612.    143651. ▇▁▁▁▁
12    0      0.6      2.9     16.1    13340. ▇▁▁▁▁
13    0      0.2      1        8.4    13552. ▇▁▁▁▁
14    0      0.8      3.6     11.8     1050. ▇▁▁▁▁
15    0      1.8     18.4    121.      5919. ▇▁▁▁▁
16    0      0.2      0.6      3.1      290. ▇▁▁▁▁
17    0      0.4      1.5      5.6    21753. ▇▁▁▁▁
18    0      0.6      3.7     22.5     6381. ▇▁▁▁▁
19    0     11.9    172     1098.     57032  ▇▁▁▁▁
20    0      0.2      1.1      7.8     4880. ▇▁▁▁▁
21    0      0.5      3.5     23.7     8133. ▇▁▁▁▁
22    0      7.2     22.8     79.2   125181. ▇▁▁▁▁
23    0.1    3.65    24.1     68        617. ▇▁▁▁▁
24    0     48.3    205.     581.     12431. ▇▁▁▁▁
25    0     13.6     93      308.      3897. ▇▁▁▁▁
26    0.1    4.5     37.1    178.      4916. ▇▁▁▁▁
27    0      0.7      3.4     38.1     8094. ▇▁▁▁▁
28    0      5.1     84.9   1339.     63154. ▇▁▁▁▁
29    0      1.7      9.05    39.6     1206. ▇▁▁▁▁
30    0      0.775    3.75    13.7      174. ▇▁▁▁▁
31    0      0        0.2      1.7      108. ▇▁▁▁▁
32    0    792.    2599     6718.    298231. ▇▁▁▁▁
33    0      7.2     87.7    568.    137713. ▇▁▁▁▁
34    0      8.1     43.5    146.     15119. ▇▁▁▁▁
35    0    358.    2474.   14407.    768661. ▇▁▁▁▁
36    0      5.8     36.6    152.     17777. ▇▁▁▁▁
37    0      1.9     12.7     94.3    23750. ▇▁▁▁▁
38    0     27.6    124.     499.     38672. ▇▁▁▁▁
39    0      2.9     20.1     88.5     7052. ▇▁▁▁▁
40    0      2.1     16.3     76.7     7795  ▇▁▁▁▁
41    0     57.6    333.    1408.    143671. ▇▁▁▁▁
42    0      0.5      2.3     11.3    13340. ▇▁▁▁▁
43    0      0.4      2.3     16      13552. ▇▁▁▁▁
44    0      0.7      3.6     10.6     1510. ▇▁▁▁▁
45    0      2.5     19.9    104.      5969. ▇▁▁▁▁
46    0      0.1      0.4      2.1      290. ▇▁▁▁▁
47    0      0.5      1.9      7.2    21753. ▇▁▁▁▁
48    0      1.1      6.9     35.6     6381. ▇▁▁▁▁
49    0     33.4    276.    1123.     57032  ▇▁▁▁▁
50    0      0.3      1.7      9.9     5238. ▇▁▁▁▁
51    0      0.6      4.1     23.2     8133. ▇▁▁▁▁
52    0      9.4     28.6    101.    125181. ▇▁▁▁▁
53    0      5.95    28.4    102.      2255. ▇▁▁▁▁
54    0     22.2    138.     423.     12431. ▇▁▁▁▁
55    0      4.3     29.9    150.      4830. ▇▁▁▁▁
56    0.1    4.5     37.1    178.      4916. ▇▁▁▁▁
57    0      0.8      4.1     26.4    16439. ▇▁▁▁▁
58    0      6.2     53.9    844.     63154. ▇▁▁▁▁
59    0      1        5.1     23.1     1233. ▇▁▁▁▁
60    0      0.775    3.75    13.7      174. ▇▁▁▁▁
61    0      0        0.1      0.925    108. ▇▁▁▁▁
sigmafelix commented 8 months ago

@Spatiotemporal-Exposures-and-Toxicology Following our discussion, I replaced all NA values with zeros and selected the top 20 pesticides ranked by their all-year sums of the "high estimate". These sums are joined with the existing covariate data frame. I also updated the metadata accordingly, then saved the covariates and the metadata to the ddn location (./output/Covariate_Calculated). As a side note, here are some notable characteristics of each pesticide:

   COMPOUND       p_zero_total  total_low total_high
   <chr>                 <dbl>      <dbl>      <dbl>
 1 ATRAZINE            0.0722  651199182. 666631897.
 2 2,4-D               0.00891 313036679. 342146162.
 3 SIMAZINE            0.355    36903429.  66457748.
 4 DIMETHENAMID-P      0.482    27966762.  41691140.
 5 ACEPHATE            0.484    30569940.  40555491.
 6 ZIRAM               0.590    12715817.  14352861.
 7 CLOMAZONE           0.433     7968003.  10309491.
 8 CLETHODIM           0.238     6529587.   8931624.
 9 FLUOMETURON         0.831     6973212.   8892368.
10 FLUFENACET          0.763     3145957.   7490557.
11 THIAMETHOXAM        0.309     4709930.   5339280.
12 ACIFLUORFEN         0.650     2933474.   5167954.
13 ISOXAFLUTOLE        0.603     3405836.   4564792.
14 SETHOXYDIM          0.343     2209091.   4229426.
15 LACTOFEN            0.645     1882164.   3434971.
16 BROMACIL            0.977     2830814.   2890766.
17 DIFENOCONAZOLE      0.642     1122100.   1544372.
18 MYCLOBUTANIL        0.423     1093562.   1197274.
19 PYRIMETHANIL        0.820      923373.   1148035.
20 INDOXACARB          0.756      523425.    893810.

## High total usage does not necessarily correlate with a low zero rate
## Zero rate is calculated as (|zeros in low|+|zeros in high|)/(|low|+|high|)
## According to the EPA pesticide fact sheets:
## Ziram: fungicide for stone fruits, pome fruits, nut crops, vegetables, and
##    commercially grown ornamentals; rabbit repellents (59.0% zeros, 6th)
## Fluometuron: cotton (83.1% zeros, 9th)
## Flufenacet: corn and soybeans (76.3% zeros, 10th)
## Bromacil: citrus and pineapple (97.7% zeros, 16th) -- Heavily used in Florida 🍊
## Pyrimethanil: almonds, pome fruit, citrus fruit, stone fruit, bananas, 
##   grapes, onions, pistachios, strawberries, tomatoes, 
##    tuberous vegetables (82.0% zeros, 19th)
## Indoxacarb: insecticide for apples, pears, Brassica [cabbages], sweet corn,
##   lettuce, and fruiting vegetables (76.0% zeros, 20th)
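A minimal base-R sketch of the three steps in this comment (NA to zero, ranking by the all-year sum of the high estimate, and the zero rate as defined above). It assumes a wide table with one EPEST_LOW_KG_<COMPOUND> / EPEST_HIGH_KG_<COMPOUND> column pair per compound, as in the 10/30/2023 summary. The values are made up, and this is not the code in the PrestoGP_Pesticides repository.

```r
# Toy wide table in the EPEST_*_KG_<COMPOUND> layout; all values are made up.
pest <- data.frame(
  YEAR = c(2000, 2000, 2001, 2001),
  "EPEST_LOW_KG_2,4-D"   = c(18, 25, NA, 4),
  "EPEST_HIGH_KG_2,4-D"  = c(20, 30, NA, 5),
  EPEST_LOW_KG_ATRAZINE  = c(10, NA, 12, 5),
  EPEST_HIGH_KG_ATRAZINE = c(11, 2, 13, 6),
  EPEST_LOW_KG_BROMACIL  = c(NA, NA, 3, NA),
  EPEST_HIGH_KG_BROMACIL = c(NA, 1, 4, NA),
  check.names = FALSE  # keep names such as "2,4-D" intact
)

# 1. Treat "not estimated" (NA) as zero usage.
epest_cols <- grep("^EPEST_", names(pest), value = TRUE)
for (col in epest_cols) pest[[col]][is.na(pest[[col]])] <- 0

# 2. Rank compounds by the all-year sum of the high estimate; keep the top n.
high_cols  <- grep("^EPEST_HIGH_KG_", names(pest), value = TRUE)
compounds  <- sub("^EPEST_HIGH_KG_", "", high_cols)
total_high <- setNames(colSums(pest[high_cols]), compounds)
n_top <- 20
top_compounds <- head(names(sort(total_high, decreasing = TRUE)), n_top)

# 3. Zero rate: (|zeros in low| + |zeros in high|) / (|low| + |high|).
p_zero_total <- function(low, high) {
  (sum(low == 0) + sum(high == 0)) / (length(low) + length(high))
}
zero_rate <- sapply(compounds, function(cmp) {
  p_zero_total(pest[[paste0("EPEST_LOW_KG_", cmp)]],
               pest[[paste0("EPEST_HIGH_KG_", cmp)]])
})

top_compounds
zero_rate
```

Because the low and high columns have the same number of rows, this zero rate equals the average of the two per-column zero rates. Since NAs were replaced first, it counts "not estimated" and "estimated as zero" together.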
sigmafelix commented 8 months ago

The covariate calculation code is in the PrestoGP_Pesticides repository for now. I will migrate it into directories following the provisional directory structure discussed in #14.