ValentineHerr closed 6 years ago
Thanks, and ouch! This is way too many to go through manually. I'm going to start listing some rules to apply in order to cut this down:
@ValentineHerr, please implement the above. That should leave only a small (reasonable) number of records requiring manual review.
@teixeirak, to clarify: when you say "This is after resolving the issue above" or "After resolving all of the above", you mean the rule applies if the previous issues led to multiple "1"s in D.precedence OR if none of the above issues applied, right?
Also, just to make sure: when you say "differ only in...", is it really exclusive? For example, I have 2 duplicates from 2 different studies, with different units. If I follow your "differ only in units" statement strictly, I don't pick OM vs C but move on to giving the later study the precedence.
1st question: yes. 2nd question: actually, let's give the OM precedence in the example above.
@teixeirak, could you review the order below? The rule would be to keep going down the list if D.precedence is still NA or if multiple "1" were assigned.
Should remain: records that only differ in method.ID and/or notes. I'll double check if dup.num is given to those and if yes, I'll assign precedence. I don't want to do it first to make sure most of the records are treated the same way.
Let me know if you approve.
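For illustration only, the "keep going down the list" logic discussed above can be sketched in Python as below. This is not the project's actual code (which is not shown in this thread). The field names `units` and `study_year`, and both rule functions, are hypothetical stand-ins for the OM-vs-C and later-study rules mentioned above.

```python
def assign_precedence(group, rules):
    """Assign D.precedence within one group of duplicate records.

    group: list of dicts, one per record.
    rules: ordered list of functions; each scores a record (higher wins).
    Returns one value per record: 1 (kept), 0 (superseded),
    or None when no rule breaks the tie (manual review needed).
    """
    candidates = list(range(len(group)))  # indices of records still tied
    for rule in rules:
        scores = {i: rule(group[i]) for i in candidates}
        best = max(scores.values())
        candidates = [i for i in candidates if scores[i] == best]
        if len(candidates) == 1:
            winner = candidates[0]
            return [1 if i == winner else 0 for i in range(len(group))]
    return [None] * len(group)  # still tied after every rule


# Hypothetical rules, in order of importance.
def prefer_om(record):
    return 1 if record["units"] == "OM" else 0


def prefer_later_study(record):
    return record["study_year"]


rules = [prefer_om, prefer_later_study]

# Different studies and different units: the OM record wins on the first rule.
print(assign_precedence(
    [{"units": "C", "study_year": 2005}, {"units": "OM", "study_year": 1998}],
    rules,
))  # [0, 1]

# Same units: falls through to the later-study rule.
print(assign_precedence(
    [{"units": "C", "study_year": 2005}, {"units": "C", "study_year": 1998}],
    rules,
))  # [1, 0]
```

A group that is still tied after the last rule comes back as all `None`, which corresponds to the records left for manual review.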
I have mixed feelings about giving this automated process precedence over the rankings that were determined manually, as that ranking would often incorporate specific knowledge about the records. Let's try this and compare before finalizing that decision.
@teixeirak, double checking a couple things:
For the C or OM unit rule, I understand:
same as above but for duration of the record
min.dbh:
same as above for depth
I am thinking about coding on the notes field, looking for "only" and "+" or "all", and giving 0 or 1 for D.precedence when there is a clear distinction about how inclusive the records are. If I manage to do that, where should this go in the list? In other words, how important is it compared to min.dbh, depth, etc.? The higher in the list, the more important.
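As a rough sketch of the notes-based idea above (an illustration, not the code actually used), the keyword search could look like this in Python. The labels `"partial"` and `"inclusive"` and the example notes are hypothetical. How the labels would map to 0 or 1 in D.precedence is left open, since that is the question being asked.

```python
import re


def inclusiveness(notes):
    """Classify a notes string by how inclusive the record appears to be.

    Returns "partial" if the notes contain the word "only",
    "inclusive" if they contain "+" or the word "all",
    and None when neither or both signals are present (no clear distinction).
    """
    text = (notes or "").lower()
    has_only = re.search(r"\bonly\b", text) is not None
    has_all = "+" in text or re.search(r"\ball\b", text) is not None
    if has_only and not has_all:
        return "partial"
    if has_all and not has_only:
        return "inclusive"
    return None


print(inclusiveness("stems only"))        # partial
print(inclusiveness("stems + branches"))  # inclusive
print(inclusiveness("overall estimate"))  # None ("overall" is not the word "all")
```

Matching on word boundaries avoids false hits such as "overall", and returning `None` for ambiguous notes keeps those records for the later rules or for manual review.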
@teixeirak, I just pushed the measurements with updated D.precedence. There are 100 records that need to be done manually (they have NAC in the D.precedence column and "D.precedence given manually." in notes). FYI, 252 records were given D.precedence based on dup.num. This is specified in the notes too.
Thank you. Could you please put the notes on this in conflict.notes instead of notes?
Oh yes sorry I forgot about that column. Ok done.
There are some records with NA in the conflicts field. Could you please fix?
Done. Sorry for that. I didn't see that some records had made it through the mesh! I double checked and they should all be Independent.
Fix Faber-Langendoen_1992_ecor sites:
It looks like you've accidentally printed out several extra columns at the end of MEASUREMENTS.
Shoot, sorry for that... I fixed it.
For 1042-1045 I think they are getting S because they have no dates at all and no stand.age. So technically we don't know if they are Replicates or not. It would take a bit more coding to handle this special case. I am happy to do it if you think it is necessary.
999 means the stand is intact/undisturbed/old growth, not that the stand age is unknown. Unknown stand ages get missing value codes. So please change the code so that it will treat ‘999’ as such. Let’s say ‘999’ conflicts with stand.age>100.
Alternatively, if coding this is complicated, it’s fine to fix by hand.
No, it is okay, it should be fine. I forgot about this code. I think I was thinking of climate data, where it is interpreted as "missing".
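A minimal Python sketch of the 999 convention described above, offered as an illustration rather than the actual implementation. The set of missing-value codes and the assumption that two ordinary ages conflict only when they are equal are mine, not stated in the thread.

```python
OLD_GROWTH = 999               # intact / undisturbed / old growth, not "unknown"
MISSING_CODES = {"NA", "NAC"}  # assumed missing-value codes for stand.age


def stand_ages_conflict(age_a, age_b):
    """Return True/False if two stand ages could describe the same stand,
    or None when either age is missing and the question cannot be answered.
    """
    if age_a is None or age_b is None:
        return None
    if age_a in MISSING_CODES or age_b in MISSING_CODES:
        return None
    a, b = float(age_a), float(age_b)
    if a == OLD_GROWTH:
        return b > 100  # 999 conflicts with stand.age > 100 (and with 999)
    if b == OLD_GROWTH:
        return a > 100
    return a == b       # assumption: ordinary ages conflict only if equal


print(stand_ages_conflict(999, 150))   # True
print(stand_ages_conflict(999, 90))    # False
print(stand_ages_conflict(999, "NA"))  # None
```

Returning `None` for missing ages keeps "unknown" distinct from "old growth", which is the distinction at issue here.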
Have you looked at everything? Did you edit the D.precedence? Let me know when I can run the code again.
Please run it now. I'll edit D.precedence once those are done. I've scanned the other records and haven't noticed other problems, but it is possible I'll find more as I look them over carefully. This is tricky in that D.precedence can be edited by hand, but I don't want to just edit the other columns; those need to be fixed in the code (unless we give up on the idea of having the code get them all right).
@ValentineHerr, I've finished assigning D.precedence. I edited some records by hand (and added conflict.notes). There were a couple instances where I changed fields other than D.precedence. I also deleted some records.
Sorry, I had to run an errand Friday afternoon and it took longer than expected... I understand that I don't need to re-run anything, right?
I am working on resolving conflicts now. I found one record (ID 15293) that has "1" for D.precedence but a capital S in conflicts. It was given manually. Was that intentional?
That is a tricky one, but I think there should only be zeroes for the precedence in D.group 720, and the 4th record should have received a 1 for the precedence in D.group 721. So in the end we would only keep records 4 and 5.
Do you agree?
ID | measurement.ID | sites.sitename | plot.name | stand.age | variable.name | date | start.date | end.date | conflicts | S.group | D.group | D.precedence | conflict.type | conflicts.notes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 15290 | Tumbarumba flux station | mature managed forest | 90 | NEE_C | NA | 2002.084932 | 2002.832877 | D | NA | 720,721 | 0 | M | T | D.precedence given manually. |
2 | 15291 | Tumbarumba flux station | mature managed forest | 90 | NEE_C | NA | 2002 | 2003 | D,S | 176 | 720 | 0 | M | T | NA |
3 | 15293 | Tumbarumba flux station | mature managed forest | 90 | NEE_C | NA | 2001.084932 | 2004.163934 | D,S | 177 | 720 | 1 | M | T | D.precedence given manually. |
4 | 15294 | Tumbarumba flux station | mature managed forest | 90 | NEE_C | 2002 | NA | NA | D,s | 176,177 | 721 | 0 | M | T | NA |
5 | 15295 | Tumbarumba flux station | mature managed forest | 90 | NEE_C | 2003 | NA | NA | s | 176,177 | NA | NA | M | T | NA |
[ ] 10546 and 10552 (in ORNL-FACE, elevated CO2) have NA for conflicts. I am changing that to "independent".
[ ] 10487 (in ORNL-FACE, ambient ORNL-FACE) has "I" for conflicts but belongs to a D.group (387) with zero for D.precedence. @teixeirak, I think you edited this by hand. Should I replace "I" with "D" in conflicts?
Regarding Tumbarumba, that is tricky. I agree with your assessment.
Regarding ORNL-FACE, yes, please fix as you suggest.
I fixed a few problems that I found. Not sure why my code didn't catch them but there were not many of them.
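The kinds of problems that came up in this thread (NA in conflicts, a record marked "I" that still belongs to a D.group, several "1"s in one D.group) could be caught with a small consistency check. The following Python sketch is only an illustration under assumptions: records are plain dicts keyed by the column names above, D.group is a comma-separated string, and the example IDs and values are made up.

```python
from collections import defaultdict


def find_inconsistencies(records):
    """Return a list of (ID or D.group, message) pairs describing problems."""
    problems = []
    ones_per_group = defaultdict(list)  # D.group -> IDs with D.precedence 1
    for rec in records:
        rid = rec["measurement.ID"]
        d_group = rec["D.group"]
        groups = [] if d_group in (None, "NA") else str(d_group).split(",")
        if rec["conflicts"] in (None, "NA"):
            problems.append((rid, "conflicts is NA"))
        elif rec["conflicts"] == "I" and groups:
            problems.append((rid, "independent but in a D.group"))
        if rec["D.precedence"] == 1:
            for g in groups:
                ones_per_group[g].append(rid)
    for g, ids in sorted(ones_per_group.items()):
        if len(ids) > 1:
            problems.append((g, "multiple records with D.precedence 1"))
    return problems


# Made-up records for illustration only.
records = [
    {"measurement.ID": 1, "conflicts": "NA", "D.group": "NA", "D.precedence": "NA"},
    {"measurement.ID": 2, "conflicts": "I", "D.group": "10", "D.precedence": 0},
    {"measurement.ID": 3, "conflicts": "D", "D.group": "10,11", "D.precedence": 1},
    {"measurement.ID": 4, "conflicts": "D", "D.group": "11", "D.precedence": 1},
]
for problem in find_inconsistencies(records):
    print(problem)
# (1, 'conflicts is NA')
# (2, 'independent but in a D.group')
# ('11', 'multiple records with D.precedence 1')
```

Groups with only zeroes are deliberately not flagged, since the Tumbarumba case above shows that such a group can be legitimate.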
@teixeirak,
I pushed the new system for ID-ing the duplicates. Now you can give the precedence to the records in the D.group.
Please let me know if you find any problems.