SnapshotSerengetiScienceTeam / DataManagement

Scripts and issues to manage the SnapshotSerengeti images and metadata.
GNU General Public License v2.0

Create expert classification table in database #8

Open mkosmala opened 9 years ago

mkosmala commented 9 years ago

Will need to add more species and species groups (and "impossible") to the species table to do this. I'm thinking the expert classifications table will have:

species
count (can be null)
the 5 behaviors (can be null)
babies (can be null)
expert_name
comments (can be null)

That way we can say who -- or what group -- classified each "expert" capture. For example, we could say "gold standard", "insectivore team", or an actual name such as "Ali Swanson".
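A minimal sketch of the table described above, assuming SQLite accessed through Python's built-in `sqlite3` module (the thread does not say which database engine the project uses). The column names, including `capture_event_id` and the five behavior columns `standing`, `resting`, `moving`, `eating`, and `interacting`, are hypothetical placeholders. Nullable columns follow the "(can be null)" notes in the list, and `expert_name` holds either a person or a group.

```python
import sqlite3

# Hypothetical schema sketch: table and column names are assumptions,
# not the project's actual database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS species (
    name TEXT PRIMARY KEY              -- e.g. 'wildebeest', 'impossible'
);

CREATE TABLE IF NOT EXISTS expert_classifications (
    id               INTEGER PRIMARY KEY,
    capture_event_id TEXT NOT NULL,    -- hypothetical link to the capture event
    species          TEXT NOT NULL REFERENCES species(name),
    count            INTEGER,          -- may be NULL
    standing         INTEGER,          -- five behaviors stored as 0 or 1, may be NULL
    resting          INTEGER,
    moving           INTEGER,
    eating           INTEGER,
    interacting      INTEGER,
    babies           INTEGER,          -- may be NULL
    expert_name      TEXT NOT NULL,    -- a person or a group such as 'gold standard'
    comments         TEXT              -- may be NULL
);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the species and expert_classifications tables if they are missing."""
    conn.executescript(SCHEMA)
```

Enabling `PRAGMA foreign_keys = ON` on the connection makes SQLite reject classifications whose species is not yet in the species table.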

mkosmala commented 9 years ago

Progress. Tables are made. Now need to import data into them.

mkosmala commented 9 years ago

I've compiled insectivore expert classifications and the 4,149 captures from the validation gold-standard dataset. @aliburchard, do you have additional expert-verified captures from your lion-cheetah analysis or anything else?

mkosmala commented 9 years ago

Note: I'm adding these species options to the database:

bat
cattle
duiker
insectSpider
jackalBlackBacked
jackalGolden
jackalSideStriped
mongooseBanded
mongooseWhiteTailed
springhare
steenbok
vulture
impossible
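
One way to add these options without duplicating existing rows, again assuming a SQLite `species` table with a single hypothetical `name` column. `INSERT OR IGNORE` skips names that already exist, so the import can be rerun safely.

```python
import sqlite3
from typing import Iterable

# Species options from the list above (spellings as in the comment).
NEW_SPECIES = [
    "bat", "cattle", "duiker", "insectSpider", "jackalBlackBacked",
    "jackalGolden", "jackalSideStriped", "mongooseBanded",
    "mongooseWhiteTailed", "springhare", "steenbok", "vulture", "impossible",
]


def add_species(conn: sqlite3.Connection, names: Iterable[str]) -> int:
    """Insert species names, skipping ones already present; return how many were new."""
    # Hypothetical table layout: one text column holding the species name.
    conn.execute("CREATE TABLE IF NOT EXISTS species (name TEXT PRIMARY KEY)")
    (before,) = conn.execute("SELECT COUNT(*) FROM species").fetchone()
    conn.executemany(
        "INSERT OR IGNORE INTO species (name) VALUES (?)",
        ((name,) for name in names),
    )
    conn.commit()
    (after,) = conn.execute("SELECT COUNT(*) FROM species").fetchone()
    return after - before
```
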

aliburchard commented 9 years ago

@mkosmala I'm starting to think about this as I begin work on the Conservation Biology special-call paper. Two things: 1) I will probably be creating additional expert classifications. 2) How do we want to deal with disagreement among experts in these gold-standard tables? I think that's a valuable measure. (I learned that NOAA has only about 60% agreement among experts on counts.)

aliburchard commented 9 years ago

@mkosmala This is relevant to issues #11, #15, and #23, and I think it is a slight tweak on what you proposed in the first comment.

We should have two tables: one holds raw(ish) expert classifications, and the other holds the final gold-standard data compiled from those raw(ish) expert classifications.

As you proposed, the expert classifications table should include:

capture event
number of species
species
count (can be null)
the 5 behaviors (can be null)
babies (can be null)
expert_name
comments (can be null)

I think the gold-standard dataset would be improved by using the following format:

capture event
number of expert identifiers
number of species
species
min count
mean count
max count
behaviours (aggregated in some way)
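A sketch of how the gold-standard rows could be compiled from the raw(ish) expert classifications, written in plain Python. The field names are hypothetical, "number of species" is taken here to mean the number of distinct species named by any expert for that capture (an assumption), and behaviours are left out because how to aggregate them is still open.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ExpertRow:
    """One raw(ish) expert classification (hypothetical field names)."""
    capture_event: str
    species: str
    count: Optional[int]
    expert_name: str


@dataclass
class GoldRow:
    """One compiled gold-standard row per capture event and species."""
    capture_event: str
    n_experts: int               # distinct experts who classified the capture
    n_species: int               # distinct species named by any expert (assumption)
    species: str
    min_count: Optional[int]
    mean_count: Optional[float]
    max_count: Optional[int]


def aggregate(rows: List[ExpertRow]) -> List[GoldRow]:
    experts: Dict[str, Set[str]] = defaultdict(set)
    species: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for r in rows:
        experts[r.capture_event].add(r.expert_name)
        species[r.capture_event].add(r.species)
        bucket = counts[(r.capture_event, r.species)]  # creates the key even if count is NULL
        if r.count is not None:
            bucket.append(r.count)

    gold = []
    for (capture, sp), cs in sorted(counts.items()):
        gold.append(GoldRow(
            capture_event=capture,
            n_experts=len(experts[capture]),
            n_species=len(species[capture]),
            species=sp,
            min_count=min(cs) if cs else None,
            mean_count=sum(cs) / len(cs) if cs else None,
            max_count=max(cs) if cs else None,
        ))
    return gold
```

Keeping min, mean, and max side by side preserves the expert disagreement instead of collapsing it into one number.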

I think this structure makes it easier to measure percent agreement among experts and to check whether consensus counts fall within the range of expert counts.

I also think we will want to keep expanding the gold-standard dataset (by aggregating expert classifications), because it clearly needs more examples of certain species (100% accuracy is not typical for cheetahs or dik-diks), and consensus accuracy should be validated against the gold-standard data rather than against individual expert classifications. (Does that make sense?)

mkosmala commented 9 years ago

Justification for storing the spread of expert answers for abundance measures. From @aliburchard:

Right, it's mostly counts that are an issue. For example, experts agree on how many animals are in a given image only ~74% of the time. When experts disagreed, their estimates spread across an average of 2.5 count bins. But when we compared consensus answers to these expert answers, the consensus answers fell within the spread of expert answers 88% of the time.

I think, ultimately, coming up with a single gold-standard count for some images is a bit contrived. I'm often like "mehhhh there's... 8? 12? wildebeest in here... I don't really know and can't resolve it, so I'll say 10." That is, I have a range of counts that I would consider accurate, and I'm not sure my single answer is any better than any other answer within that range.
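The kinds of statistics quoted above (agreement on counts, bin spread when experts disagree, and how often consensus falls within the expert range) could be computed roughly as follows. This sketch assumes count bins of 1 through 10 individually, then 11-50 and 51+, and defines spread as the maximum minus the minimum bin index. Both are assumptions, and the original analysis may have defined them differently. Captures with fewer than two expert answers are skipped, and every remaining capture is assumed to have a consensus answer.

```python
from typing import Dict, List, Sequence

# Assumed count bins: 1..10 individually, then "11-50" and "51+".
BIN_LABELS = [str(i) for i in range(1, 11)] + ["11-50", "51+"]


def bin_index(label: str) -> int:
    """Position of a count-bin label in the ordered bin list."""
    return BIN_LABELS.index(label)


def summarize_counts(
    expert_bins: Dict[str, Sequence[str]],
    consensus: Dict[str, str],
) -> Dict[str, float]:
    """Agreement and spread metrics over captures with at least two expert answers.

    expert_bins: capture id -> count-bin labels given by each expert.
    consensus:   capture id -> consensus count-bin label.
    """
    n = agree = within = 0
    spreads: List[int] = []  # max - min bin index where experts disagree
    for capture, labels in expert_bins.items():
        if len(labels) < 2:
            continue  # agreement is undefined with a single expert
        idx = [bin_index(label) for label in labels]
        lo, hi = min(idx), max(idx)
        n += 1
        if lo == hi:
            agree += 1
        else:
            spreads.append(hi - lo)
        if lo <= bin_index(consensus[capture]) <= hi:
            within += 1
    if n == 0:
        raise ValueError("no captures with at least two expert answers")
    return {
        "agreement": agree / n,
        "mean_spread_when_disagreeing": sum(spreads) / len(spreads) if spreads else 0.0,
        "consensus_within_spread": within / n,
    }
```

Storing the expert range per capture, rather than a single "gold" count, is what makes the `consensus_within_spread` check possible.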