forc-db / ForC

Global Forest Carbon Database
https://forc-db.github.io/
Creative Commons Attribution 4.0 International

independent records not making it through to ForC simplified because they are considered potential duplicates #248

Closed: teixeirak closed this issue 2 years ago

teixeirak commented 3 years ago

I'm opening an issue to document an offline conversation.

We're seeing cases where independent records are not making it through to ForC_simplified because they are considered potential duplicates. Examples include McGarvey_2015_csio (70 of 96 records), Tepley_2017 (5 of 57), and Fleming_1998 (24 of 115). I'm certain that there are many more.

@ValentineHerr has done some "dirty coding" to flag the Tepley and Fleming studies as independent, but we need:

(1) in the shorter term, a clean way to flag confirmed independent studies as such (@ValentineHerr, does flagging "confirmed.unique" = 1 in the sites table do this?); and
(2) ultimately, a solution to the duplicates issue.

ValentineHerr commented 3 years ago

> (1) in the shorter term, a clean way to flag confirmed independent studies as such (@ValentineHerr, does flagging "confirmed.unique"= 1 in sites table do this?)

If I overwrite ForC_simplified's suspected.duplicate with 0 for records from any site that has either a "0" in potential_duplicate_group or a "1" in confirmed.unique, none of the records from the 3 citations you mentioned are flagged.

Do you validate this approach?
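The override rule described in the comment above could look roughly like the following minimal Python sketch. It is not the code that was actually pushed to ForC. The column names suspected.duplicate, potential_duplicate_group and confirmed.unique come from the thread; the join key sites.sitename, the citation.ID column, the helper names override_suspected_duplicates and flagged_counts, and the representation of tables as lists of dicts are all assumptions made for illustration.

```python
# Hypothetical sketch: tables are lists of dicts (e.g. rows from csv.DictReader).
# Assumed columns: "sites.sitename" links records to sites, "citation.ID" names the study.

def override_suspected_duplicates(records, sites):
    """Return copies of `records` with suspected.duplicate set to 0 for records
    from sites marked independent (potential_duplicate_group == "0" or
    confirmed.unique == "1"). Input rows are not modified."""
    independent_sites = {
        s["sites.sitename"]
        for s in sites
        if str(s.get("potential_duplicate_group")) == "0"
        or str(s.get("confirmed.unique")) == "1"
    }
    updated = []
    for r in records:
        row = dict(r)  # shallow copy so the caller's data is untouched
        if row["sites.sitename"] in independent_sites:
            row["suspected.duplicate"] = 0
        updated.append(row)
    return updated


def flagged_counts(records):
    """Map each citation.ID to (records still flagged, total records)."""
    counts = {}
    for r in records:
        flagged, total = counts.get(r["citation.ID"], (0, 0))
        is_flagged = str(r["suspected.duplicate"]) == "1"
        counts[r["citation.ID"]] = (flagged + int(is_flagged), total + 1)
    return counts


if __name__ == "__main__":
    # Made-up example data, not real ForC sites or studies.
    sites = [
        {"sites.sitename": "SiteA", "potential_duplicate_group": "0", "confirmed.unique": None},
        {"sites.sitename": "SiteB", "potential_duplicate_group": "12", "confirmed.unique": "1"},
        {"sites.sitename": "SiteC", "potential_duplicate_group": "12", "confirmed.unique": None},
    ]
    records = [
        {"citation.ID": "Study_X", "sites.sitename": "SiteA", "suspected.duplicate": 1},
        {"citation.ID": "Study_X", "sites.sitename": "SiteC", "suspected.duplicate": 1},
        {"citation.ID": "Study_Y", "sites.sitename": "SiteB", "suspected.duplicate": 1},
    ]
    print(flagged_counts(override_suspected_duplicates(records, sites)))
```

With the made-up data above, the printed counts are {'Study_X': (1, 2), 'Study_Y': (0, 1)}: the SiteA and SiteB records are released, and the SiteC record stays flagged because its site is neither in group "0" nor confirmed unique. Per-study counts of this kind are one way to check figures such as "70 of 96" before and after the override.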

ValentineHerr commented 3 years ago

I assumed it is okay, at least temporarily, so I pushed a new ForC_simplified. I'm now updating the files to review in IPCC-EFDB-integration.

teixeirak commented 3 years ago

Yes, that sounds reasonable. Note that a number of additional records came through; I assume these were mistakenly excluded previously. I'll try to understand that now.

teixeirak commented 2 years ago

I think we can close this.