forc-db / IPCC-EFDB-integration

Creative Commons Attribution 4.0 International

generate list of studies for review #16

Closed: teixeirak closed this issue 2 years ago

teixeirak commented 3 years ago

@ValentineHerr ,

Let's create a systematic method for identifying studies (citationID) to be reviewed for contribution to IPCC.

It would be helpful to generate a spreadsheet with the following fields: citationID, n potential records, variables represented, sites represented, review priority score, and ready to rerun and send (to be filled manually). We'll probably want to tweak this list.
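A minimal sketch of how the per-study rows could be assembled, assuming each MEASUREMENTS record is available as a dict with the hypothetical keys `citationID`, `site` and `variable` (not necessarily ForC's actual column names). The review priority score is left empty for a separate scoring step, and "ready to rerun and send" stays blank for manual entry.

```python
import csv
import io
from collections import defaultdict

# Spreadsheet columns from the list above (snake_case names are assumptions).
FIELDS = ["citationID", "n_potential_records", "variables", "sites",
          "review_priority_score", "ready_to_rerun_and_send"]

def summarize_by_citation(records):
    """Group records by citationID and return one summary row per study."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["citationID"]].append(rec)

    rows = []
    for citation_id, recs in sorted(groups.items()):
        rows.append({
            "citationID": citation_id,
            "n_potential_records": len(recs),
            "variables": sorted({r["variable"] for r in recs}),
            "sites": sorted({r["site"] for r in recs}),
            "review_priority_score": None,   # filled by a scoring step
            "ready_to_rerun_and_send": "",   # filled manually
        })
    return rows

def to_csv_text(rows):
    """Render summary rows as CSV, joining list fields with '; '."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)
        out["variables"] = "; ".join(row["variables"])
        out["sites"] = "; ".join(row["sites"])
        writer.writerow(out)  # None is written as an empty cell
    return buf.getvalue()
```

Keeping the variables and sites as sorted lists until export makes it easy to adjust the columns later, which fits the expectation that this list will be tweaked.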

Requirements (all sites listed in the spreadsheet should meet these)

Prioritize based on: (assign points, put in review priority score)

This is just a rough start. Please modify as you see fit, and leave flexibility to change as we go.

ValentineHerr commented 3 years ago

OK, which do you think is best: working from MEASUREMENTS or from ForC_simplified (and its suspected.duplicate column)?

If we use the MEASUREMENTS table and there are potential duplicates that don't get resolved during review, the corresponding records won't make it through (the same applies to ForC_simplified and suspected.duplicate).

ValentineHerr commented 3 years ago

(I'll do MEASUREMENTS for now)

teixeirak commented 3 years ago

Should be fine, although we may need to add a simple mechanism to bypass the duplicate flagging for studies that we've carefully reviewed.
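One way the bypass could work, sketched under assumptions: each record carries a boolean `suspected_duplicate` flag and a `citationID` (hypothetical field names), and a hand-maintained set of carefully reviewed citationIDs overrides the flag. Unreviewed flagged records are dropped, matching the concern that unresolved duplicates won't make it through.

```python
def passes_duplicate_check(record, reviewed_citations=frozenset()):
    """Return True if a record can go forward for IPCC submission.

    Hypothetical rule: records from studies we've carefully reviewed
    bypass the duplicate flag; all others must not be flagged.
    """
    if record["citationID"] in reviewed_citations:
        return True
    return not record["suspected_duplicate"]

def eligible_records(records, reviewed_citations=frozenset()):
    """Keep only the records that pass the duplicate check."""
    return [r for r in records
            if passes_duplicate_check(r, reviewed_citations)]
```

Keeping the override as an explicit set of citationIDs makes the bypass auditable: anyone can see which studies were exempted from duplicate flagging.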

ValentineHerr commented 3 years ago

OK, I pushed this file. Let me know if that works and if it is a good place for it.

I only penalized potential site duplicates, not potential measurement duplicates. But I manually checked the top 10 citations, and none have conflicts except Powers_2012_csaa, which has 2 records that are replicates, and Wang_2013_eofa, for which all records are replicates. I think replicates are not as bad as duplicates, and they are "resolved" by taking the average... which, now that I say it, is probably not good for IPCC's review process.
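A minimal sketch of a score that penalizes only potential site duplicates, plus a separate count of replicated records so they can be surfaced to the reviewer instead of being averaged away. The one-point-per-record base, the penalty weight, and the (site, variable, date) replicate key are all hypothetical choices, not the rule used in the pushed file.

```python
from collections import Counter

# Hypothetical weight: points lost for each record at a flagged site.
SITE_DUPLICATE_PENALTY = 10

def priority_score(records, flagged_sites, penalty=SITE_DUPLICATE_PENALTY):
    """One point per record, minus a penalty for each record at a site
    flagged as a potential duplicate."""
    score = 0
    for rec in records:
        score += 1
        if rec["site"] in flagged_sites:
            score -= penalty
    return score

def count_replicated_records(records):
    """Count records that share (site, variable, date) with another record.

    These are reported for review rather than silently averaged.
    """
    keys = Counter((r["site"], r["variable"], r["date"]) for r in records)
    return sum(n for n in keys.values() if n > 1)
```

With a per-record penalty like this, a study whose records mostly sit at flagged sites quickly ends up with a large negative score.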

teixeirak commented 3 years ago

Thanks! That's a good start.

A score of -3549... YIKES! (That one may be fully replicated between original ForC and SRDB.)

One small modification: could you please list the variables, rather than just giving the count? Some will get higher priority than others.

ValentineHerr commented 3 years ago

Done (ignoring whether variables are in C or OM units).
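Listing variables while ignoring C versus OM units can be sketched as stripping a unit suffix before de-duplicating. This assumes variable names end in `_C` or `_OM` (an assumed naming convention; check it against the actual variable list).

```python
# Assumed unit suffixes on variable names.
UNIT_SUFFIXES = ("_C", "_OM")

def base_variable(name):
    """Drop a trailing unit suffix, if present."""
    for suffix in UNIT_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name

def variables_ignoring_units(names):
    """Sorted, de-duplicated variable names with units removed."""
    return sorted({base_variable(n) for n in names})
```

Usage: `variables_ignoring_units(["biomass_ag_C", "biomass_ag_OM"])` yields a single `biomass_ag` entry, so a study reporting both units is listed once.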

teixeirak commented 3 years ago

Thanks! Let's leave it at this for now, but I'll keep this issue open because there's a good chance we'll want to adjust it later.

teixeirak commented 2 years ago

We want to refine this system. Here are my notes on what I'd like to weight towards:

- ForestGEO
- Studies included in GROA
- By region: tropics
- By variable: major stocks and increments
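The study-level criteria above could be turned into points roughly as follows. This is a sketch: the point values, the `StudySummary` fields, and defining "tropics" by latitude between the Tropics of Cancer and Capricorn (about ±23.44°) are all assumptions, since the thread leaves the details open.

```python
from dataclasses import dataclass

# Hypothetical point values per criterion.
POINTS = {"forestgeo": 5, "in_groa": 3, "tropical": 3}

TROPIC_LATITUDE = 23.44  # approximate latitude of the Tropics, in degrees

def is_tropical(latitude):
    """True if a site's latitude falls between the Tropics."""
    return -TROPIC_LATITUDE <= latitude <= TROPIC_LATITUDE

@dataclass
class StudySummary:
    citation_id: str
    forestgeo: bool = False  # at least one site is a ForestGEO plot
    in_groa: bool = False    # study is also included in GROA
    tropical: bool = False   # at least one site is tropical

def criteria_points(study, points=POINTS):
    """Sum the points for every criterion the study meets."""
    return sum(pts for name, pts in points.items() if getattr(study, name))
```

Keeping the weights in a single dictionary leaves room to re-balance the criteria as they get refined.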

I need to come back and provide details.

teixeirak commented 2 years ago

@ValentineHerr , Here's the list above, edited (new criteria in bold):

Requirements (all sites listed in the spreadsheet should meet these)

Prioritize based on: (assign points, put in review priority score)

teixeirak commented 2 years ago

@ValentineHerr , just a reminder to update prioritization based on this when you have a chance. Not urgent.

ValentineHerr commented 2 years ago

@teixeirak, FYI, I don't see any delta.biomass_root, delta.deadwood, or delta.O.horizon in our data.

teixeirak commented 2 years ago

Correct, but we want them prioritized when available.
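Prioritizing variables "when available" can be expressed as a bonus that counts only for variables a study actually reports, so variables absent from the data today cost nothing but start scoring once records appear. The bonus values below are hypothetical; only the three variable names come from this thread.

```python
# Hypothetical bonus points for priority variables (names from the thread).
PRIORITY_VARIABLES = {
    "delta.biomass_root": 3,
    "delta.deadwood": 3,
    "delta.O.horizon": 3,
}

def variable_bonus(study_variables, priority=PRIORITY_VARIABLES):
    """Sum bonuses for the priority variables a study reports.

    Each variable counts once, however many records it has.
    """
    return sum(priority.get(v, 0) for v in set(study_variables))
```

Counting each variable once avoids rewarding a study simply for having many records of the same variable.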