Open kyle-messier opened 11 months ago
@sigmafelix Are these counties truly missing or just "not estimated"? Either way, could we proceed by assuming there is no estimate for pesticide usage in those counties? We could estimate those counties with our Kriging model, same as the plan for 2020-2022. Or shall we consider some other simplification?
@Spatiotemporal-Exposures-and-Toxicology I think these values were not estimated. The initial plan was to reuse one set of county polygons for all years, but it should be changed to use each year's polygons when calculating the weighted sum of estimates. For now, I will assign zeros to the counties without estimates.
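A rough sketch of the per-year weighted sum, in plain Python to stay dependency-free. The function name, county IDs, target IDs, and overlap fractions are all hypothetical; in practice the fractions would come from intersecting each year's county polygons with the target units. Counties without an estimate contribute zero, matching the interim decision above.

```python
from collections import defaultdict


def weighted_sum_by_target(overlaps, estimates):
    """Sum county estimates into target units, weighted by overlap fraction.

    overlaps:  list of (county_id, target_id, fraction of county area in target)
    estimates: dict county_id -> estimate in kg; counties not listed count as 0
    """
    totals = defaultdict(float)
    for county_id, target_id, fraction in overlaps:
        totals[target_id] += fraction * estimates.get(county_id, 0.0)
    return dict(totals)


# Hypothetical overlaps for a single year (made-up IDs and fractions).
overlaps_one_year = [
    ("county_1", "A", 1.0),
    ("county_2", "A", 0.25),
    ("county_2", "B", 0.75),
    ("county_3", "B", 1.0),  # no estimate for this county, so it adds 0
]
estimates_one_year = {"county_1": 100.0, "county_2": 40.0}

result = weighted_sum_by_target(overlaps_one_year, estimates_one_year)
print(result)
```

Running this once per year, with that year's overlaps, avoids assuming the county boundaries are fixed across 2000-2019.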
@Spatiotemporal-Exposures-and-Toxicology
Two problems came up when joining the USGS pesticide data to the EPA pesticide table.
1) Abbreviated compound names need to be expanded to their full names. 2) Even with the appropriate full name, several compound subfamilies are classified together as one group, and that group name is what the USGS county-level pesticide estimates use.
I lack the chemistry background to resolve this on my own. What would be the best way to handle it?
# Non-joined compound names in USGS data
COMPOUND.x
<chr>
1 2,4-D
2 2,4-DB
3 CARBARYL
4 CHLOROPICRIN
5 COPPER HYDROXIDE
6 COPPER OXYCHLORIDE
7 CYHALOTHRIN-LAMBDA
8 FLUAZIFOP
9 FLUMICLORAC
10 INDOLYL-BUTYRIC ACID
# ℹ 132 more rows
# ℹ Use `print(n = ...)` to see more rows
## CARBARYL is a member of the CARBAMATE family
> chemlist %>% filter(grepl("CARBARYL", COMPOUND)) %>% .[[2]]
character(0)
## Trying CARBAMATE
> chemlist %>% filter(grepl("CARBAMATE", COMPOUND)) %>% .[[2]]
[1] "o-(2-Propynyloxy)phenyl methylcarbamate"
[2] "6-(and 2)-Chloro-3,4-xylyl methylcarbamate"
[3] "m-(1-Ethylpropyl)phenyl methylcarbamate"
[4] "Phenylmercuric dimethyldithiocarbamate"
[5] "3-[(Methoxycarbonyl)amino]phenyl (1-chlorobutan-2-yl)carbamate"
[6] "5,6,7,8-Tetrahydro-1-naphthyl methylcarbamate"
[7] "Mercuric dimethyl dithiocarbamate"
[8] "Sodium tetrathiocarbamate"
[9] "o-Hydroxyphenyl methylcarbamate"
[10] "Potassium N-methyldithiocarbamate"
[11] "1-Naphthalenol, 1-(N-methylcarbamate)"
[12] "Potassium N-hydroxymethyl-N-methyldithiocarbamate"
[13] "2,3-Dichlorobenzyl methylcarbamate"
[14] "Potassium dimethyldithiocarbamate"
[15] "3-Iodo-2-propynyl-N-butylcarbamate"
[16] "2-Chloro-4,5-dimethylphenyl (hydroxymethyl)carbamate"
[17] "3,5-Diisopropylphenyl methylcarbamate"
[18] "Ammonium carbamate"
[19] "Methyl 5-hydroxy-2-benzimidazolecarbamate"
[20] "m-Cumenyl methylcarbamate"
[21] "Ethyl N-cyclohexyl-N-ethylthiolcarbamate"
[22] "2,3,5-Trimethylphenyl methylcarbamate"
[23] "Calcium ethylenebis(dithiocarbamate)"
[24] "1,6-Hexanediol dicarbamate"
[25] "tert-Butylsulfenyl dimethyldithiocarbamate"
[26] "1,3-Dimethyl-1,1,3,3-disiloxanetetrol-1,3-bis(dimethylthiocarbamate)"
[27] "3,4,5-Trimethylphenyl methylcarbamate"
[28] "2-Cyclopentylphenyl methylcarbamate"
[29] "1,1-Dimethylethyl N-(6-((((Z)-((1-methyl-1H-tetrazol-5-yl)phenylmethylene)amino)oxy)methyl)-2-pyridinyl)carbamate"
[30] "Sodium dimethyldithiocarbamate"
[31] "3-Methyl-4-(methylthio)phenyl methylcarbamate"
[32] "6-Methyl-2-propyl-4-pyrimidinyl dimethylcarbamate"
[33] "4-Chlorophenyl methylcarbamate"
[34] "4-(Methylamino)-3,5-xylyl N- methylcarbamate"
[35] "2-Chloro-4,5-xylyl N-hydroxy-N-methylcarbamate"
[36] "2-Chloro-4,5-xylyl carbamate"
## 36 CARBAMATEs! Which one is the most general form to use for measuring dissimilarity?
## Tried NITROCHLOROFORM, a synonym of CHLOROPICRIN
> chemlist %>% filter(grepl("NITROCHLOROFORM", COMPOUND)) %>% .[[2]]
character(0)
# No record in the EPA list
## Partial match on NITROCHLORO: is the result the same compound as the one we want, CHLOROPICRIN?
> chemlist %>% filter(grepl("NITROCHLORO", COMPOUND)) %>% .[[2]]
[1] "2,6-Dinitrochlorobenzene"
> chemlist %>% filter(grepl("INDOLYL", COMPOUND)) %>% .[[2]]
character(0)
## INDOLYL turns out to be listed as INDOLE-3 in the EPA list.
## What if we query BUTYRIC?
> chemlist %>% filter(grepl("BUTYRIC", COMPOUND)) %>% .[[2]]
[1] "2-Ethylbutyric acid" "Indole-3-butyric acid"
[3] "4-Hydroxybutyric acid" "DL-2-Hydroxybutyric acid"
[5] "D-3-Hydroxybutyric acid" "2,4-Dichlorophenoxybutyric acid"
## What if we query INDOLE-?
> chemlist %>% filter(grepl("INDOLE-", COMPOUND)) %>% .[[2]]
[1] "Indole-3-pyruvic acid"
[2] "Indole-3-butyric acid"
[3] "1-Indole-3-butanethioic acid, S-phenyl ester"
[4] "Indole-3-acetic acid"
## In this case, we can use Indole-3-butyric acid.
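The manual lookups above could be organized as a case-insensitive substring match backed by a hand-curated synonym table. This is a minimal sketch in plain Python, not the project's code: the function name and the table structure are hypothetical, and the EPA names are a small subset copied from the query results above.

```python
def find_candidates(query, epa_names, synonyms=None):
    """Return EPA names containing the query or one of its synonyms.

    Matching is a case-insensitive substring test; result order follows epa_names.
    """
    synonyms = synonyms or {}
    terms = [query] + synonyms.get(query, [])
    hits = []
    for name in epa_names:
        lowered = name.lower()
        if any(term.lower() in lowered for term in terms) and name not in hits:
            hits.append(name)
    return hits


# A few EPA names copied from the query results above.
epa_names = [
    "Indole-3-pyruvic acid",
    "Indole-3-butyric acid",
    "Indole-3-acetic acid",
    "2,6-Dinitrochlorobenzene",
]

# Hand-curated synonym table (hypothetical structure), keyed by USGS name.
synonyms = {
    "INDOLYL-BUTYRIC ACID": ["Indole-3-butyric acid"],  # match settled on above
    "CHLOROPICRIN": ["Nitrochloroform"],  # synonym tried above, no EPA record
}

for usgs_name in ["INDOLYL-BUTYRIC ACID", "CHLOROPICRIN"]:
    print(usgs_name, "->", find_candidates(usgs_name, epa_names, synonyms))
```

An empty result flags a compound that still needs manual review, which keeps unmatched names visible instead of silently dropping them in the join.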
Following the meeting just now, I will look at CompTox/GenRA as the next step for determining dis/similarity. Many compound names need to be corrected to ensure that no records are missed.
@sigmafelix I could do this as well, but please go ahead if you want to. I was inputting the three main pesticides of interest (Atrazine, Simazine, Propazine) and getting one or two of the different groupings for each: one purely chemistry-based and one toxicity-based. Then we take the top 10 or 20 chemicals across those three that are also included in the USGS pesticide usage data. For that, we may have to look at synonyms in the usage data to see whether those chemicals are in that database.
@sigmafelix I uploaded 6 CSV files to PrestoGP/input/RA_Chemical_Similiarity, 2 each for Atrazine, Simazine, and Propazine. One file is the physicochemical similarity, and the other is the similarity with respect to multiple toxicity endpoints. The former appears to give more structurally similar chemicals and likely won't yield many other pesticides in common with the USGS data. The latter gives quite a diverse array of chemicals and, as far as I noticed, does overlap with the USGS data. So, at a minimum we want covariates for the 3 main pesticides, and I'd say at most 10 other ones based on these datasets. Let me know if you want to discuss further.
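The selection rule described above (the 3 main pesticides plus at most 10 similar compounds that also appear in the USGS data) might be sketched as follows. Everything here is illustrative: the function name, the similarity scores, "Compound X", and the short USGS list are made up, and the real inputs would be the six CSV files and the name-matched USGS compound list.

```python
def select_covariates(similarity_lists, usgs_compounds, targets, max_extra=10):
    """Return the targets plus up to max_extra similar compounds present in USGS data.

    similarity_lists: dict target -> list of (compound, similarity score)
    A compound appearing in several lists keeps its highest score.
    """
    usgs = {name.upper() for name in usgs_compounds}
    target_set = {name.upper() for name in targets}
    best = {}
    for pairs in similarity_lists.values():
        for compound, score in pairs:
            key = compound.upper()
            if key in usgs and key not in target_set:
                best[key] = max(score, best.get(key, 0.0))
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(best, key=lambda k: (-best[k], k))
    return list(targets) + ranked[:max_extra]


targets = ["ATRAZINE", "SIMAZINE", "PROPAZINE"]
# Made-up scores for illustration only.
similarity_lists = {
    "ATRAZINE": [("Simazine", 0.9), ("Acephate", 0.4), ("Compound X", 0.8)],
    "SIMAZINE": [("Atrazine", 0.9), ("Clomazone", 0.5), ("Acephate", 0.6)],
    "PROPAZINE": [("Ziram", 0.3)],
}
usgs_compounds = ["ATRAZINE", "SIMAZINE", "PROPAZINE", "ACEPHATE", "CLOMAZONE", "ZIRAM"]

selected = select_covariates(similarity_lists, usgs_compounds, targets, max_extra=2)
print(selected)
```

Compounds absent from the USGS list ("Compound X" here) drop out automatically, which also covers non-pesticide chemicals in the similarity output.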
@Spatiotemporal-Exposures-and-Toxicology Name matching is in progress. Eight pesticides could not be matched to USGS pesticide names. The chemical and toxicity similarity lists include non-pesticide compounds, so we might want to exclude those. Details are in the Teams chat.
Following our quick meeting--
@Spatiotemporal-Exposures-and-Toxicology
── Variable type: numeric ──────────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd
1 YEAR 0 1 2010. 5.77
2 EPEST_LOW_KG_2,4-D 782 0.987 5171. 8740.
3 EPEST_LOW_KG_ACEPHATE 32880 0.464 1075. 4098.
4 EPEST_LOW_KG_ACIFLUORFEN 49375 0.195 246. 618.
5 EPEST_LOW_KG_ATRAZINE 5651 0.908 11698. 21326.
6 EPEST_LOW_KG_CLETHODIM 19894 0.676 158. 462.
7 EPEST_LOW_KG_CLOMAZONE 28993 0.527 246. 1188.
8 EPEST_LOW_KG_FLUOMETURON 53458 0.128 887. 2174.
9 EPEST_LOW_KG_LACTOFEN 49231 0.197 156. 340.
10 EPEST_LOW_KG_SETHOXYDIM 26281 0.571 63.0 277.
11 EPEST_LOW_KG_SIMAZINE 28777 0.531 1134. 4042.
12 EPEST_LOW_KG_TEBUFENOZIDE 56321 0.0815 74.2 446.
13 EPEST_LOW_KG_INDOXACARB 50786 0.172 49.7 349.
14 EPEST_LOW_KG_PROSULFURON 50862 0.171 13.1 35.8
15 EPEST_LOW_KG_THIAMETHOXAM 22437 0.634 121. 266.
16 EPEST_LOW_KG_TRIADIMENOL 59046 0.0371 4.93 14.6
17 EPEST_LOW_KG_MYCLOBUTANIL 27603 0.550 32.4 263.
18 EPEST_LOW_KG_DIFENOCONAZOLE 42118 0.313 58.4 219.
19 EPEST_LOW_KG_DIMETHENAMID-P 36445 0.406 1124. 2572.
20 EPEST_LOW_KG_CYMOXANIL 45610 0.256 27.3 127.
21 EPEST_LOW_KG_PYRIMETHANIL 51831 0.155 97.3 404.
22 EPEST_LOW_KG_ZIRAM 38712 0.369 562. 3752.
23 EPEST_LOW_KG_FLURIDONE 61188 0.00215 62.6 108.
24 EPEST_LOW_KG_FLUFENACET 54379 0.113 453. 695.
25 EPEST_LOW_KG_ISOXAFLUTOLE 46066 0.249 223. 318.
26 EPEST_LOW_KG_DAZOMET 61168 0.00248 235. 610.
27 EPEST_LOW_KG_FORMETANATE 59512 0.0295 265. 929.
28 EPEST_LOW_KG_BROMACIL 60025 0.0211 2186. 5887.
29 EPEST_LOW_KG_TRIASULFURON 54388 0.113 40.5 82.7
30 EPEST_LOW_KG_FLUVALINATE-TAU 60864 0.00744 13.9 26.0
31 EPEST_LOW_KG_CPPU 61050 0.00440 4.21 12.1
32 EPEST_HIGH_KG_2,4-D 291 0.995 5606. 8891.
33 EPEST_HIGH_KG_ACEPHATE 26363 0.570 1160. 4373.
34 EPEST_HIGH_KG_ACIFLUORFEN 30165 0.508 166. 441.
35 EPEST_HIGH_KG_ATRAZINE 3171 0.948 11464. 21015.
36 EPEST_HIGH_KG_CLETHODIM 8221 0.866 168. 441.
37 EPEST_HIGH_KG_CLOMAZONE 23553 0.616 273. 1200.
38 EPEST_HIGH_KG_FLUOMETURON 48433 0.210 690. 1858.
39 EPEST_HIGH_KG_LACTOFEN 29443 0.520 108. 267.
40 EPEST_HIGH_KG_SETHOXYDIM 14530 0.763 90.4 273.
41 EPEST_HIGH_KG_SIMAZINE 14603 0.762 1423. 3608.
42 EPEST_HIGH_KG_TEBUFENOZIDE 52501 0.144 48.7 342.
43 EPEST_HIGH_KG_INDOXACARB 40985 0.332 44.0 279.
44 EPEST_HIGH_KG_PROSULFURON 27534 0.551 11.0 33.3
45 EPEST_HIGH_KG_THIAMETHOXAM 13655 0.777 112. 252.
46 EPEST_HIGH_KG_TRIADIMENOL 57924 0.0554 3.76 12.2
47 EPEST_HIGH_KG_MYCLOBUTANIL 23335 0.619 31.5 248.
48 EPEST_HIGH_KG_DIFENOCONAZOLE 35486 0.421 59.8 201.
49 EPEST_HIGH_KG_DIMETHENAMID-P 22439 0.634 1072. 2306.
50 EPEST_HIGH_KG_CYMOXANIL 39170 0.361 27.6 128.
51 EPEST_HIGH_KG_PYRIMETHANIL 47998 0.217 86.2 367.
52 EPEST_HIGH_KG_ZIRAM 33559 0.453 517. 3439.
53 EPEST_HIGH_KG_FLURIDONE 60770 0.00897 109. 249.
54 EPEST_HIGH_KG_FLUFENACET 39200 0.361 339. 607.
55 EPEST_HIGH_KG_ISOXAFLUTOLE 27731 0.548 136. 251.
56 EPEST_HIGH_KG_DAZOMET 61168 0.00248 235. 610.
57 EPEST_HIGH_KG_FORMETANATE 58530 0.0455 206. 844.
58 EPEST_HIGH_KG_BROMACIL 59760 0.0254 1853. 5427.
59 EPEST_HIGH_KG_TRIASULFURON 41348 0.326 27.6 65.1
60 EPEST_HIGH_KG_FLUVALINATE-TAU 60864 0.00744 13.9 26.0
61 EPEST_HIGH_KG_CPPU 60960 0.00587 3.16 10.6
p0 p25 p50 p75 p100 hist
1 2000 2005. 2010. 2014. 2019 ▇▇▇▇▇
2 0 498. 2168. 6149. 298230. ▇▁▁▁▁
3 0 4.6 64 500. 136226. ▇▁▁▁▁
4 0 9.8 57.3 226. 15113 ▇▁▁▁▁
5 0 272. 2448. 15204. 768661. ▇▁▁▁▁
6 0 1.6 19 127. 17726. ▇▁▁▁▁
7 0 1 6.3 41 22178 ▇▁▁▁▁
8 0.1 40.5 178. 697. 38672. ▇▁▁▁▁
9 0 5.4 34.7 145. 7052. ▇▁▁▁▁
10 0 0.5 2.7 21.8 7783 ▇▁▁▁▁
11 0 6.1 63.2 612. 143651. ▇▁▁▁▁
12 0 0.6 2.9 16.1 13340. ▇▁▁▁▁
13 0 0.2 1 8.4 13552. ▇▁▁▁▁
14 0 0.8 3.6 11.8 1050. ▇▁▁▁▁
15 0 1.8 18.4 121. 5919. ▇▁▁▁▁
16 0 0.2 0.6 3.1 290. ▇▁▁▁▁
17 0 0.4 1.5 5.6 21753. ▇▁▁▁▁
18 0 0.6 3.7 22.5 6381. ▇▁▁▁▁
19 0 11.9 172 1098. 57032 ▇▁▁▁▁
20 0 0.2 1.1 7.8 4880. ▇▁▁▁▁
21 0 0.5 3.5 23.7 8133. ▇▁▁▁▁
22 0 7.2 22.8 79.2 125181. ▇▁▁▁▁
23 0.1 3.65 24.1 68 617. ▇▁▁▁▁
24 0 48.3 205. 581. 12431. ▇▁▁▁▁
25 0 13.6 93 308. 3897. ▇▁▁▁▁
26 0.1 4.5 37.1 178. 4916. ▇▁▁▁▁
27 0 0.7 3.4 38.1 8094. ▇▁▁▁▁
28 0 5.1 84.9 1339. 63154. ▇▁▁▁▁
29 0 1.7 9.05 39.6 1206. ▇▁▁▁▁
30 0 0.775 3.75 13.7 174. ▇▁▁▁▁
31 0 0 0.2 1.7 108. ▇▁▁▁▁
32 0 792. 2599 6718. 298231. ▇▁▁▁▁
33 0 7.2 87.7 568. 137713. ▇▁▁▁▁
34 0 8.1 43.5 146. 15119. ▇▁▁▁▁
35 0 358. 2474. 14407. 768661. ▇▁▁▁▁
36 0 5.8 36.6 152. 17777. ▇▁▁▁▁
37 0 1.9 12.7 94.3 23750. ▇▁▁▁▁
38 0 27.6 124. 499. 38672. ▇▁▁▁▁
39 0 2.9 20.1 88.5 7052. ▇▁▁▁▁
40 0 2.1 16.3 76.7 7795 ▇▁▁▁▁
41 0 57.6 333. 1408. 143671. ▇▁▁▁▁
42 0 0.5 2.3 11.3 13340. ▇▁▁▁▁
43 0 0.4 2.3 16 13552. ▇▁▁▁▁
44 0 0.7 3.6 10.6 1510. ▇▁▁▁▁
45 0 2.5 19.9 104. 5969. ▇▁▁▁▁
46 0 0.1 0.4 2.1 290. ▇▁▁▁▁
47 0 0.5 1.9 7.2 21753. ▇▁▁▁▁
48 0 1.1 6.9 35.6 6381. ▇▁▁▁▁
49 0 33.4 276. 1123. 57032 ▇▁▁▁▁
50 0 0.3 1.7 9.9 5238. ▇▁▁▁▁
51 0 0.6 4.1 23.2 8133. ▇▁▁▁▁
52 0 9.4 28.6 101. 125181. ▇▁▁▁▁
53 0 5.95 28.4 102. 2255. ▇▁▁▁▁
54 0 22.2 138. 423. 12431. ▇▁▁▁▁
55 0 4.3 29.9 150. 4830. ▇▁▁▁▁
56 0.1 4.5 37.1 178. 4916. ▇▁▁▁▁
57 0 0.8 4.1 26.4 16439. ▇▁▁▁▁
58 0 6.2 53.9 844. 63154. ▇▁▁▁▁
59 0 1 5.1 23.1 1233. ▇▁▁▁▁
60 0 0.775 3.75 13.7 174. ▇▁▁▁▁
61 0 0 0.1 0.925 108. ▇▁▁▁▁
@Spatiotemporal-Exposures-and-Toxicology
Following our discussion, I replaced all NA values with zeros and selected 20 pesticides ranked by their all-year sums of the "high estimate". These sums are joined to the existing covariate data frame. I also updated the metadata to reflect the changes, then saved the covariates and the metadata in the ddn location (./output/Covariate_Calculated). As a side note, some interesting characteristics of each pesticide are listed below:
COMPOUND p_zero_total total_low total_high
<chr> <dbl> <dbl> <dbl>
1 ATRAZINE 0.0722 651199182. 666631897.
2 2,4-D 0.00891 313036679. 342146162.
3 SIMAZINE 0.355 36903429. 66457748.
4 DIMETHENAMID-P 0.482 27966762. 41691140.
5 ACEPHATE 0.484 30569940. 40555491.
6 ZIRAM 0.590 12715817. 14352861.
7 CLOMAZONE 0.433 7968003. 10309491.
8 CLETHODIM 0.238 6529587. 8931624.
9 FLUOMETURON 0.831 6973212. 8892368.
10 FLUFENACET 0.763 3145957. 7490557.
11 THIAMETHOXAM 0.309 4709930. 5339280.
12 ACIFLUORFEN 0.650 2933474. 5167954.
13 ISOXAFLUTOLE 0.603 3405836. 4564792.
14 SETHOXYDIM 0.343 2209091. 4229426.
15 LACTOFEN 0.645 1882164. 3434971.
16 BROMACIL 0.977 2830814. 2890766.
17 DIFENOCONAZOLE 0.642 1122100. 1544372.
18 MYCLOBUTANIL 0.423 1093562. 1197274.
19 PYRIMETHANIL 0.820 923373. 1148035.
20 INDOXACARB 0.756 523425. 893810.
## High total usage does not necessarily correlate with a low zero rate
## Zero rate is calculated as (|zeros in low|+|zeros in high|)/(|low|+|high|)
## According to the EPA pesticide fact sheets:
## Ziram: fungicide for stone fruits, pome fruits, nut crops, vegetables, and
## commercially grown ornamentals; rabbit repellents (59.0% zeros, 6th)
## Fluometuron: cotton (83.1% zeros, 9th)
## Flufenacet: corn and soybeans (76.3% zeros, 10th)
## Bromacil: citrus and pineapple (97.7% zeros, 16th) -- Heavily used in Florida 🍊
## Pyrimethanil: almonds, pome fruit, citrus fruit, stone fruit, bananas,
## grapes, onions, pistachios, strawberries, tomatoes,
## tuberous vegetables (82.0% zeros, 19th)
## Indoxacarb: insecticide for apples, pears, Brassica [cabbages], sweet corn,
## lettuce, and fruiting vegetables (75.6% zeros, 20th)
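For reference, the NA replacement, the zero-rate formula, and the ranking by total high estimate can be sketched as below. This is a plain-Python illustration on made-up records, not the code that produced the table; the function name and the record layout are hypothetical, and it assumes a zero is counted after NA has been replaced by 0.

```python
def summarize(records, top_n=20):
    """Summarize per-compound zero rate and totals, ranked by total high estimate.

    records: list of dicts with keys COMPOUND, low, high (None where no estimate)
    Zero rate = (|zeros in low| + |zeros in high|) / (|low| + |high|)
    """
    stats = {}
    for rec in records:
        s = stats.setdefault(
            rec["COMPOUND"],
            {"zeros": 0, "n": 0, "total_low": 0.0, "total_high": 0.0},
        )
        for key in ("low", "high"):
            value = 0.0 if rec[key] is None else rec[key]  # NA -> 0
            if value == 0:
                s["zeros"] += 1
            s["n"] += 1
            s["total_" + key] += value
    rows = [
        {
            "COMPOUND": compound,
            "p_zero_total": s["zeros"] / s["n"],
            "total_low": s["total_low"],
            "total_high": s["total_high"],
        }
        for compound, s in stats.items()
    ]
    rows.sort(key=lambda r: -r["total_high"])
    return rows[:top_n]


# Made-up county-year records for two hypothetical compounds.
records = [
    {"COMPOUND": "PESTICIDE_A", "low": 10.0, "high": 12.0},
    {"COMPOUND": "PESTICIDE_A", "low": None, "high": 5.0},
    {"COMPOUND": "PESTICIDE_A", "low": 0.0, "high": 0.0},
    {"COMPOUND": "PESTICIDE_B", "low": None, "high": None},
    {"COMPOUND": "PESTICIDE_B", "low": 100.0, "high": 120.0},
]

summary = summarize(records)
for row in summary:
    print(row)
```

In this toy input both compounds have the same zero rate but very different totals, the same kind of decoupling noted in the table above.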
The covariate calculation code is placed in the PrestoGP_Pesticides repository for now. I will work on migrating the code into directories that follow the provisional directory structure mentioned in #14.
A couple of problems were identified:
![pesticides_county_nonpresence](https://github.com/Spatiotemporal-Exposures-and-Toxicology/PrestoGP_Pesticides/assets/25448786/8e6e1608-299c-49cd-98fa-e4743c1ccb14)
(left: counties with partial availability; upper right: 2000 counties highlighted with counties of partial availability; lower right: 2015 counties highlighted with counties of partial availability)