bpbond / srdb

Global soil respiration database
MIT License

Manipulation standardization #96

Closed jinshijian closed 4 years ago

jinshijian commented 4 years ago

Hello @bpbond, I did a bunch of work trying to standardize the Manipulation field. Basically, I used the same term for the same kind of treatment (e.g., "burned" for fire, burning, burn, burnt). There is still some room for improvement; I will take a look at it later. Thanks
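For illustration, a minimal base R sketch of this kind of term standardization. It is not the code used in this pull request: the lookup table `standard_terms` and the helper `standardize_manipulation()` are hypothetical names, and only the "Burned" variants are taken from the comment above.

```r
# Hypothetical lookup table: lower-case variant -> standardized term.
# Only the "Burned" variants come from the comment above.
standard_terms <- c(
  "fire"    = "Burned",
  "burning" = "Burned",
  "burn"    = "Burned",
  "burnt"   = "Burned"
)

# Hypothetical helper: map each string to its standardized term,
# ignoring case and surrounding whitespace.
standardize_manipulation <- function(x) {
  key <- tolower(trimws(x))
  out <- unname(standard_terms[key])
  # Keep the original string when no standardized term is defined
  ifelse(is.na(out), x, out)
}

result <- standardize_manipulation(c("Fire", "burnt", "Burned", "Extra litter"))
print(result)
```

Strings without an entry in the lookup table pass through unchanged, so the table can be extended one treatment at a time.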

bpbond commented 4 years ago

I can't easily see the diff here; there are too many changes. Can you provide a summary or mapping of the changes?

bpbond commented 4 years ago

This reduces the number of unique Manipulation strings from 689 to 293, a big improvement.

There are 11 Manipulation strings that get mapped to more than one new string; we should look carefully at these.

> filter(mapping, different_new > 1)
# A tibble: 11 x 3
   Manipulation           different_new new_strings                                      
   <fct>                          <int> <chr>                                            
 1 Extra litter                       2 Extra litter, Litter manipulation                
 2 Fertilized, irrigation             2 Fertilized, irrigation, Fertilized, irrigated    
 3 Harvest                            2 Litter manipulation, Harvest                     
 4 Herbivore exclusion                2 Herbivore exclusion, None                        
 5 Inter-canopy                       2 None, Burned                                     
 6 Mineral                            2 Contaminant, Fertilized                          
 7 None                               2 None, Weed control                               
 8 sewage sludges                     2 None, Fertilized                                 
 9 Stem wood harvest                  2 Litter manipulation, Harvest                     
10 Thinned, double litter             2 Litter manipulation, Thinned, litter manipulation
11 Under-canopy                       2 None, Burned                    
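
A minimal sketch of how a `mapping` table like the one above could be built with dplyr, assuming the old and new values sit side by side in one data frame. The data frame `joined` and its contents are toy stand-ins; the column names follow the output shown in this thread, but the actual code used to produce `mapping` is not shown here.

```r
library(dplyr)

# Toy stand-in for the joined old/new data; the rows are illustrative only
joined <- data.frame(
  Manipulation     = c("Extra litter", "Extra litter", "Extra litter",
                       "Harvest", "Harvest"),
  new_manipulation = c("Extra litter", "Extra litter", "Litter manipulation",
                       "Harvest", "Harvest"),
  stringsAsFactors = FALSE
)

# For each old string, count the distinct new strings and list them
mapping <- joined %>%
  group_by(Manipulation) %>%
  summarise(
    different_new = n_distinct(new_manipulation),
    new_strings   = paste(unique(new_manipulation), collapse = ", ")
  )

# Old strings that were split across more than one new string
ambiguous <- filter(mapping, different_new > 1)
print(ambiguous)
```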

bpbond commented 4 years ago

So for example

> x %>% left_join(x_branch) %>% filter(Manipulation=="Extra litter")
Joining, by = c("Record_number", "Entry_date", "Study_number")
   Record_number Entry_date Study_number Manipulation Manipulation_level    new_manipulation     new_man_level
1           5962 2017-02-06         9563 Extra litter         All litter        Extra litter        All litter
2           5963 2017-02-06         9563 Extra litter         All litter        Extra litter        All litter
3           5964 2017-02-06         9563 Extra litter         All litter        Extra litter        All litter
4           5965 2017-02-06         9563 Extra litter  S. superba litter        Extra litter S. superba litter
5           5966 2017-02-06         9563 Extra litter  S. superba litter        Extra litter S. superba litter
6           5967 2017-02-06         9563 Extra litter  S. superba litter        Extra litter S. superba litter
7           5968 2017-02-06         9563 Extra litter  O. pinnata litter        Extra litter O. pinnata litter
8           5969 2017-02-06         9563 Extra litter  O. pinnata litter        Extra litter O. pinnata litter
9           5970 2017-02-06         9563 Extra litter  O. pinnata litter        Extra litter O. pinnata litter
10          5973 2017-02-08         8917 Extra litter                           Extra litter                  
11          5976 2017-02-08         8917 Extra litter                           Extra litter                  
12          5979 2017-02-08         8917 Extra litter                           Extra litter                  
13          8025 2020-01-14         8376 Extra litter                    Litter manipulation      Extra litter
14          8026 2020-01-14         8376 Extra litter                    Litter manipulation      Extra litter

Most "Extra litter" records get mapped to "Extra litter", but a couple (records 8025 and 8026, study 8376) go to "Litter manipulation", with "Extra litter" moved into the level column.

bpbond commented 4 years ago

@jinshijian Here's a quick combined file that may make it easy to filter and see what's being mapped to what.

combined.xlsx

jinshijian commented 4 years ago

Cool! Great check. I went back and checked, and made some changes, but for the following items:

No. 4: Herbivore exclusion. One record is the control (its level is none), so this should be changed to None and Herbivore exclusion.
No. 5: Inter-canopy is not a manipulation. It has two levels (None and Burn), so the manipulation should be changed to None and Burned.
No. 7: For record number 9851, the manipulation level is weed-free, so the manipulation should be changed to Weed control rather than None.
No. 8: For record number 9223, the sewage sludges manipulation level is none, so it should be changed to None (it is the control).
No. 11: Under-canopy is not a manipulation, and its levels are None and Burn, so it should be changed to None and Burned.
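
The control-plot reasoning above (a level of "none" means the record is a control, so its manipulation becomes None) can be sketched in base R as follows. This is an assumed rule for illustration, not the code in the pull request; the record numbers and the "Fenced" level are made up, and the column names follow the join output shown earlier in the thread.

```r
# Toy records; record numbers and the "Fenced" level are hypothetical
records <- data.frame(
  Record_number      = c(1, 2, 3),
  Manipulation       = c("Herbivore exclusion", "Herbivore exclusion",
                         "sewage sludges"),
  Manipulation_level = c("Fenced", "None", "none"),
  stringsAsFactors   = FALSE
)

# Assumed rule: a level of "none" (any case) marks the control plot.
# %in% treats a missing level as "not a control" instead of returning NA.
is_control <- tolower(trimws(records$Manipulation_level)) %in% "none"

# Controls get "None"; all other records keep their manipulation
records$new_manipulation <- ifelse(is_control, "None", records$Manipulation)
print(records)
```

A blanket rule like this would still need the case-by-case review described above, since some strings (Inter-canopy, Under-canopy) are not manipulations at all.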

I have updated the pull request, please check. Thanks!

bpbond commented 4 years ago

Looks a lot better. Thanks for your work on this! Down to 290 Manipulation categories. As you note above:

> filter(mapping, different_new > 1)
# A tibble: 6 x 3
  Manipulation        different_new new_strings              
  <fct>                       <int> <chr>                    
1 Herbivore exclusion             2 Herbivore exclusion, None
2 Inter-canopy                    2 None, Burned             
3 Mineral                         2 Harvest, Fertilized      
4 None                            2 None, Weed control       
5 sewage sludges                  2 None, Fertilized         
6 Under-canopy                    2 None, Burned             

Is there anything else we want to do before merging this?

jinshijian commented 4 years ago

Hello Ben, I double-checked the Mineral one: in study 6534 they are testing the effect of harvest, and in study 10795 they are testing fertilization. So I think it is right that the old "Mineral" goes to two different Manipulation values, and nothing more needs to be done. Thanks