teixeirak closed this issue 5 years ago.
@teixeirak, are we specifically concerned about instances where the same variable is recorded twice, with the same stand age/date for both records? I've found potential site duplicates that have different stand ages or recorded years (so they may come from different years, or from different plots within the same site?). In these cases they probably aren't duplicate measurements, so the question is how to deal with measurements across different ages/years.
So far I've found very few duplicate sites coming up in my analysis, so I don't think this is much of a concern for my analysis specifically.
@teixeirak, I think the potential_duplicate_group field is populated by a script; do you mean that I should flag sites in the potential_duplicate_manual field?
Great news that there seem to be few duplicates! We want to look at potential_duplicate_group and also confirmed.unique. If confirmed.unique = 1 for both of the potential duplicates, we know they are independent sites. We are only concerned about records that would be duplicates if they were at the same site, i.e., same variable, same year. We will need to flag and resolve those duplicates.
We're also somewhat (but less) concerned about instances where the same variable is measured in different years at potential duplicate sites. In this case, the random effects would correctly represent geographic.area but not plot. This is less critical.
I don't really remember the logic behind creating the potential_duplicate_manual field, but we're not really using it.
@beckybanbury, let's resolve this before we get too deep into interpretation. I doubt the results will change much, but we don't want to have to redo the work of reviewing and interpreting results.
The plots that I identified as potential duplicates are Pasoh, Teshio, Bonanza/BNZ, Wayqecha, and Nouragues (plots that have entries from both SRDB and the original ForC database). I'm still unclear about how to deal with these sites, though.
Pasoh is fixed.
Teshio is fixed.
Wayqecha was previously fixed. (None of these will be fully fixed until all the scripts are re-run to update PLOTS, ForC_simplified, etc.)
Nouragues is fixed.
Bonanza is a big job! I merged Bonanza/BNZ sites 5A, 5C, and 5D, and also discovered a number of incorrect values among these records.
As far as I can tell, Bonanza is now reconciled, which means that all of these sites should be fixed. We need to re-run the script to make plots and ForC_simplified.
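Merging site entries of this kind amounts to mapping each alias onto one canonical site name and then checking which records collapse onto the same key, since those are the ones whose values need to be compared by hand. A minimal sketch, assuming a hand-written alias table and records keyed by hypothetical "site", "variable", "year", and "mean" fields; the site names below are placeholders, not the actual Bonanza/BNZ entries.

```python
def merge_sites(records, aliases):
    """Map site aliases onto canonical names and report key collisions.

    records: list of dicts with hypothetical keys "site", "variable", "year", "mean".
    aliases: dict mapping an alias site name -> canonical site name.
    Returns (merged, collisions): merged is a new list of records with canonical
    site names; collisions lists (earlier_record, later_record) pairs that now
    share the same (site, variable, year) key and need to be compared by hand.
    """
    merged, collisions = [], []
    first_seen = {}
    for rec in records:
        new = dict(rec)  # copy so the input records are left unchanged
        new["site"] = aliases.get(rec["site"], rec["site"])
        key = (new["site"], new["variable"], new["year"])
        if key in first_seen:
            collisions.append((first_seen[key], new))
        else:
            first_seen[key] = new
        merged.append(new)
    return merged, collisions


# Placeholder names and values for illustration only.
aliases = {"Site X (SRDB)": "Site X", "Site X 5A": "Site X"}
records = [
    {"site": "Site X (SRDB)", "variable": "Rsoil", "year": 2004, "mean": 700},
    {"site": "Site X 5A", "variable": "Rsoil", "year": 2004, "mean": 710},
    {"site": "Site X 5A", "variable": "Rsoil", "year": 2005, "mean": 690},
]
merged, collisions = merge_sites(records, aliases)
for earlier, later in collisions:
    print(earlier["mean"], later["mean"])  # prints: 700 710
```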
I've now updated plots and ForC_simplified from the script.
@teixeirak, in ForC there are 5 Bonanza measurements that don't have a mean value. Do you know why that happened?
I've also noticed that when I run the ForC_simplified script, it adds a lot of NAs to the mean column. I'm not sure why (I'm hoping it is related to the ForC measurements that don't have a value).
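One way to narrow down where the NAs come from is to count missing means per site in the input and in the output of the simplification step and compare the two. A minimal sketch, assuming the tables have been exported to CSV with column names sites.sitename and mean (an assumption about the export), and that missing values are written as empty strings, "NA", or "NaN".

```python
import csv
import io
from collections import Counter

# Strings treated as a missing mean; adjust to however the export writes NA.
MISSING = {"", "NA", "NaN"}


def missing_mean_by_site(csv_text, site_col="sites.sitename", mean_col="mean"):
    """Count rows whose mean is missing, grouped by site name."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Short rows get None for absent fields, which is treated as missing.
        value = (row.get(mean_col) or "").strip()
        if value in MISSING:
            counts[row[site_col]] += 1
    return counts


# Made-up rows standing in for an export of MEASUREMENTS or ForC_simplified.
example = """sites.sitename,variable.name,mean
Site X,GPP,1200
Site X,GPP,NA
Site Y,NPP,
Site Y,NPP,450
"""
print(missing_mean_by_site(example))
```

If the counts from ForC_simplified match those from MEASUREMENTS, the NAs are simply carried through from records that already lacked a mean; extra missing values would point to the simplification step itself.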
I'll check as soon as I get a chance.
That should be fixed. You will need to re-run the scripts, including plots (there was an extra one in there by mistake).
Also, if you didn't already do this, we need to re-run the script that identifies and deals with duplicates within Measurements.
Done.
@beckybanbury, as discussed in person...
The field potential_duplicate_group in the SITES table identifies potential site duplicates. Let's flag sites with potential duplicates that appear in your analysis. Specifically, we're concerned about instances where the same variable is recorded twice for separate sites that are flagged as potential duplicates.
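Flagging the sites in an analysis that have a potential duplicate also present in that analysis can be sketched by grouping the analysis sites on potential_duplicate_group. A minimal sketch, assuming a dict from site name to group value, with None assumed to mean the site has no potential duplicate; the site names are placeholders.

```python
from collections import defaultdict


def flag_duplicate_sites(analysis_sites, group_of):
    """Return {group: sorted site names} for groups with 2+ sites in the analysis.

    analysis_sites: site names used in the analysis.
    group_of: dict mapping site name -> potential_duplicate_group value
              (None is assumed to mean the site has no potential duplicate).
    """
    members = defaultdict(set)
    for site in analysis_sites:
        group = group_of.get(site)
        if group is not None:
            members[group].add(site)
    # A group only matters if more than one of its sites made it into the analysis.
    return {g: sorted(s) for g, s in members.items() if len(s) > 1}


# Made-up example: Site E shares group 2 with Site C but is not in the analysis.
group_of = {"Site A": 1, "Site B": 1, "Site C": 2, "Site D": None, "Site E": 2}
analysis_sites = ["Site A", "Site B", "Site C", "Site D"]
print(flag_duplicate_sites(analysis_sites, group_of))  # prints: {1: ['Site A', 'Site B']}
```

Only the records at sites in the flagged groups then need the closer same-variable check.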