forc-db / GROA

This repository houses data and code for the Global Reforestation Opportunity Assessment (GROA) led by Susan Cook-Patton of the Nature Conservancy.
Creative Commons Attribution 4.0 International

non consistent history for same plot.id #18

Closed ValentineHerr closed 4 years ago

ValentineHerr commented 5 years ago

@teixeirak and @CookPatton,

Here are a few problems I found while trying to build the ForC HISTORY table for GROA data. Let me know if you can think of ways to fix or deal with that.

1- There are 5 plots that have more than one value of prior.duration (shown separated by commas in the values column of the table below). NB: date and stand.age are the same for all records involved.

| site.id | plot.id | measurement.id | variable | values |
|---|---|---|---|---|
| 2391 | 3705 | 5672, 5673, 5674 | prior.duration | settled 1690, dairy/cattle 1850-1890,settled 1690, dairy/cattle 1850-1891,settled 1690, dairy/cattle 1850-1892 |
| 2391 | 3706 | 5677, 5678, 5679 | prior.duration | settled 1690, dairy/cattle 1850-1895,settled 1690, dairy/cattle 1850-1896,settled 1690, dairy/cattle 1850-1897 |
| 2391 | 3709 | 5688, 5689, 5690 | prior.duration | settled 1690, dairy/cattle 1850-1906,settled 1690, dairy/cattle 1850-1907,settled 1690, dairy/cattle 1850-1908 |
| 2391 | 3710 | 5693, 5694, 5695 | prior.duration | settled 1690, dairy/cattle 1850-1911,settled 1690, dairy/cattle 1850-1912,settled 1690, dairy/cattle 1850-1913 |
| 277 | 3269 | 4392, 4391, 4390, 13931 | prior.duration | moderate use,light use |

2- There are 6 plots that sometimes have "F" in prior and sometimes nothing (shown separated by commas in the values column of the table below). NB: date and stand.age are the same for all records involved.

| site.id | plot.id | measurement.id | variable | values |
|---|---|---|---|---|
| 5572 | 7144 | 12930, 12532, 12531, 12577, 14053 | prior | ,F |
| 5572 | 7145 | 12931, 12535, 12534, 12579, 14054 | prior | ,F |
| 5572 | 7146 | 12932, 12537, 12581, 14055 | prior | ,F |
| 5573 | 7148 | 12933, 12541, 12540, 12583, 14056 | prior | ,F |
| 5574 | 7147 | 12934, 12543, 12542, 12585, 14057 | prior | ,F |
| 5574 | 7149 | 12935, 12587, 14058 | prior | ,F |

3- There are 33 plots that have different plot.area values recorded for different measurements (shown separated by commas in the values column of the table below). plot.area is calculated as n x plot.size, so either n or plot.size may be inconsistent across measurements of one plot. For some of those, the stand age also changes, so I guess the whole plot is not always measured from one year to the next? I would need to investigate more to identify the different cases. But @teixeirak, regardless of the reason, we need to find a solution for how to treat those plots...

| site.id | plot.id | measurement.id | variable | values |
|---|---|---|---|---|
| 274 | 1188 | 4371, 4372, 4373, 4374, 4375, 4376, 4377, 4378, 4379, 4380 | plot.area | 0.00405,0.09 |
| 60 | 2884 | 3913, 3912, 3914 | plot.area | 0.090000004,0.09 |
| 60 | 2885 | 3916, 3915, 3917 | plot.area | 0.090000004,0.09 |
| 60 | 2886 | 3919, 3918, 3920 | plot.area | 0.090000004,0.09 |
| 60 | 2887 | 3922, 3921, 3923 | plot.area | 0.090000004,0.09 |
| 60 | 2889 | 3928, 3927, 3929 | plot.area | 0.090000004,0.09 |
| 60 | 2890 | 3931, 3930, 3932 | plot.area | 0.090000004,0.09 |
| 60 | 2891 | 3934, 3933, 3935 | plot.area | 0.090000004,0.09 |
| 5575 | 9397 | 12590, 12589, 12591 | plot.area | 0.35,3.5 |
| 5575 | 9398 | 12593, 12592, 12594 | plot.area | 0.35,3.5 |
| 5575 | 9399 | 12596, 12595, 12597 | plot.area | 0.35,3.5 |
| 5575 | 9400 | 12602, 12601, 12603 | plot.area | 0.35,3.5 |
| 5575 | 9401 | 12605, 12604, 12606 | plot.area | 0.35,3.5 |
| 5575 | 9402 | 12608, 12607, 12609 | plot.area | 0.35,3.5 |
| 5575 | 9403 | 12611, 12610, 12612 | plot.area | 0.35,3.5 |
| 5575 | 9404 | 12614, 12613, 12615 | plot.area | 0.35,3.5 |
| 5575 | 9405 | 12617, 12616, 12618 | plot.area | 0.35,3.5 |
| 2286 | 1772 | 1065, 1067 | plot.area | 0.09,0.0828 |
| 2287 | 1773 | 1068, 1069 | plot.area | 0.0375,0.15 |
| 2229 | 1915 | 1421, 1424, 1422, 1423 | plot.area | 0.1,0.3,0.4 |
| 3606 | 2816 | 3779, 3780, 3781, 3782, 3783, 3784, 3785, 3786, 3787, 3788, 3789, 3790 | plot.area | 0.0675,0.042375 |
| 5572 | 7144 | 12930, 12532, 12531, 12577, 14053 | plot.area | 0.25,0.27 |
| 5572 | 7145 | 12931, 12535, 12534, 12579, 14054 | plot.area | 0.25,0.27 |
| 5572 | 7146 | 12932, 12537, 12581, 14055 | plot.area | 0.0225,0.27 |
| 5573 | 7148 | 12933, 12541, 12540, 12583, 14056 | plot.area | 0.36,0.27 |
| 5574 | 7147 | 12934, 12543, 12542, 12585, 14057 | plot.area | 0.0625,0.27 |
| 319 | 1565 | 153, 155, 157, 160, 164, 168, 172, 161, 165, 169, 173, 13523, 13524, 13525, 152, 154, 156, 159, 163, 167, 171 | plot.area | 0.01,0.12 |
| 60 | 2888 | 3924, 3926 | plot.area | 0.090000004,0.09 |
| 60 | 2892 | 3936, 3938 | plot.area | 0.090000004,0.09 |
| 2046 | 2203 | 2283, 2284 | plot.area | 0.3,0.6 |
| 13968 | 2204 | 2286, 2287 | plot.area | 0.3,0.6 |
| 5575 | 9406 | 12619, 12621 | plot.area | 0.35,3.5 |
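
For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python of how plots with more than one value for a given variable could be listed. It is not the code used to build the tables above: the function name `find_inconsistent`, the `ndigits` option, and the toy records are assumptions for illustration only (the records reuse values from the table, but do not reflect which measurement carries which value). Rounding is optional and is only there because a pair such as 0.090000004 vs 0.09 looks like it could be a floating-point representation artifact rather than a real difference.

```python
from collections import defaultdict


def find_inconsistent(records, variable, key="plot.id", ndigits=None):
    """Return {plot id: sorted distinct values} for plots with more than one value.

    records  -- iterable of dicts, one per measurement (hypothetical layout)
    variable -- name of the column to check, e.g. "plot.area" or "prior"
    ndigits  -- if given, floats are rounded before comparison
    """
    seen = defaultdict(set)
    for rec in records:
        value = rec[variable]
        if ndigits is not None and isinstance(value, float):
            value = round(value, ndigits)
        seen[rec[key]].add(value)
    # Keep only the plots where the variable is not constant.
    return {plot: sorted(values) for plot, values in seen.items() if len(values) > 1}


# Toy records (illustrative only, one dict per measurement).
records = [
    {"plot.id": 2884, "plot.area": 0.090000004},
    {"plot.id": 2884, "plot.area": 0.09},
    {"plot.id": 9397, "plot.area": 0.35},
    {"plot.id": 9397, "plot.area": 3.5},
]

exact = find_inconsistent(records, "plot.area")
rounded = find_inconsistent(records, "plot.area", ndigits=6)
print(exact)
print(rounded)
```

With exact comparison both toy plots are flagged; with `ndigits=6` only plot 9397 remains, which separates likely representation noise from real differences in plot.area.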

CookPatton commented 5 years ago

@ValentineHerr I am working through fixing errors today so I'll push the latest data when I'm done.

(1) For your first two sets of issues, you caught errors. I fixed those on my end.

(2) For the third set of issues:

- site.id 274 was measured repeatedly with different plot sizes in different years. Data are correct.
- site.id 60 should have plot.size = 0.09. I fixed the issue.
- site.id 5575 should have N = 1 for litter measures. I fixed the issue.
- site.id 2286/2287 were measured repeatedly with different plot sizes in different years. Data are correct.
- site.id 2229 should have N = 1 for all measures (3/4 referred to sub plot measurements). I fixed the issue.
- site.id 3606 was measured repeatedly with different plot sizes in different years. Data are correct.
- site.id 5572/5573/5574 should have N = 1 for all measures. I fixed the issue.
- site.id 319 was measured repeatedly with different plot sizes in different years. Data are correct.
- site.id 2046 was measured repeatedly with different plot sizes in different years. Data are correct.
- site.id 13968 was measured repeatedly with different plot sizes in different years. Data are correct.

CookPatton commented 5 years ago

@ValentineHerr I pushed new data...did you get the new sitesf and nonsoil csvs?

ValentineHerr commented 5 years ago

Yes, both files are updated on my side. I'll try to run the code ASAP and let you know if things go more smoothly now.

ValentineHerr commented 4 years ago

Coming back to this. @teixeirak, how do you want to deal with plots that were repeatedly measured with different plot.area? Currently, @CookPatton gives them the same plot.id, but I don't think that complies with ForC... Should I give them a different plot name? For example: plot.name = plot.name + [plot.area]

Note: [ForC's plot.area = GROA's plot.size * n] so sometimes the difference in plot.area is due to differences in n rather than plot.size.
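
To make the note concrete, here is a rough Python sketch of the relation plot.area = plot.size * n, of how one could tell which factor differs between measurements of one plot, and of the naming idea floated above. All function names and the example numbers are hypothetical, and the exact name format (a space plus the area in square brackets) is only one possible reading of "plot.name + [plot.area]".

```python
def plot_area(plot_size, n):
    """ForC-style plot.area from GROA-style plot.size and n (as in the note above)."""
    return plot_size * n


def differing_factors(measurements):
    """List which of plot.size and n varies across measurements of one plot."""
    sizes = {m["plot.size"] for m in measurements}
    counts = {m["n"] for m in measurements}
    causes = []
    if len(sizes) > 1:
        causes.append("plot.size")
    if len(counts) > 1:
        causes.append("n")
    return causes


def area_plot_name(plot_name, plot_size, n, ndigits=6):
    """Build a plot name that carries its area (hypothetical naming format)."""
    area = round(plot_area(plot_size, n), ndigits)
    return f"{plot_name} [{area:g}]"


# Hypothetical measurements of one plot: same plot.size, different n.
measurements = [
    {"plot.size": 0.35, "n": 1},
    {"plot.size": 0.35, "n": 10},
]

print(differing_factors(measurements))
for m in measurements:
    print(area_plot_name("P1", m["plot.size"], m["n"]))
```

Rounding the area before putting it in the name keeps floating-point noise from producing two names for what is really the same area.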

ValentineHerr commented 4 years ago

After an in-person discussion with @teixeirak, we decided to: