Open mkosmala opened 9 years ago
Progress. Tables are made. Now need to import data into them.
I've compiled insectivore expert classifications and the 4,149 captures from the validation gold-standard dataset. @aliburchard, do you have additional expert-verified captures from your lion-cheetah analysis or anything else?
Note, I'm adding these species options to the database:
bat
cattle
duiker
insectSpider
jackalBlackBacked
jackalGolden
jackalSideStriped
mongooseBanded
mongooseWhiteTailed
springhare
steenbok
vulture
impossible
@mkosmala starting to think about this as I begin work on the cons bio special call paper. Two things: 1) I will probably be creating additional expert classifications. 2) How do we want to deal with disagreement among experts in these gold-standard tables? I think that's a valuable measure. (I learned that NOAA has only about 60% agreement among experts on counts.)
@mkosmala This is relevant to issues #11, #15, and #23 -- and I think it is a slight tweak on what you proposed in the first comment.
We should have two tables. One is raw(ish) expert classifications, the other is final gold-standard data compiled from these raw(ish) expert classifications.
As you proposed, the expert classifications table should include:
capture event
number species
species
count (can be null)
the 5 behaviors (can be null)
babies (can be null)
expert_name
comments (can be null)
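A minimal sketch of how the expert classifications table above could look, using Python's built-in `sqlite3`. The table name, column names, and types are assumptions for illustration only; the thread lists the fields but not a schema, and it does not name the five behaviors, so `behavior_1` through `behavior_5` are placeholders.

```python
import sqlite3

# Hypothetical schema: field list comes from the comment above,
# but every name and type here is an assumption.
SCHEMA = """
CREATE TABLE expert_classifications (
    capture_event TEXT    NOT NULL,
    num_species   INTEGER NOT NULL,
    species       TEXT    NOT NULL,
    animal_count  INTEGER,           -- can be null
    behavior_1    INTEGER,           -- the 5 behaviors, all nullable;
    behavior_2    INTEGER,           -- placeholder names, since the
    behavior_3    INTEGER,           -- thread does not name them
    behavior_4    INTEGER,
    behavior_5    INTEGER,
    babies        INTEGER,           -- can be null
    expert_name   TEXT    NOT NULL,
    comments      TEXT               -- can be null
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert one made-up row, leaving the nullable fields unset.
conn.execute(
    "INSERT INTO expert_classifications "
    "(capture_event, num_species, species, expert_name) "
    "VALUES (?, ?, ?, ?)",
    ("capture-001", 1, "steenbok", "insectivore team"),
)

row = conn.execute(
    "SELECT species, animal_count, expert_name FROM expert_classifications"
).fetchone()
print(row)
```

Storing one row per expert per species per capture keeps the raw(ish) answers intact, so disagreement can still be measured later.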
I think the gold-standard dataset would be better in the following format:
capture event
number expert identifiers
number species
species
min count
mean count
max count
behaviours (aggregated in some way)
I think this structure better lets us measure % agreement among experts and evaluate consensus counts as being within the range of expert counts...
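A rough sketch of compiling gold-standard rows from raw expert rows, one row per capture event and species. The function name, dict keys, and sample data are all made up for illustration. Two assumptions to flag: "number species" is taken as the number of distinct species any expert reported for the capture, and behaviours are left out because the aggregation rule for them is still undecided.

```python
from collections import defaultdict
from statistics import mean


def compile_gold_standard(expert_rows):
    """Aggregate expert rows (dicts) into gold-standard rows.

    Each input dict needs: capture_event, species, count (may be None),
    expert_name. All key names are hypothetical.
    """
    experts = defaultdict(set)   # capture -> expert names
    species = defaultdict(set)   # capture -> species reported
    groups = defaultdict(list)   # (capture, species) -> counts
    for r in expert_rows:
        cap = r["capture_event"]
        experts[cap].add(r["expert_name"])
        species[cap].add(r["species"])
        if r["count"] is not None:
            groups[(cap, r["species"])].append(r["count"])
        else:
            groups[(cap, r["species"])]  # make sure the group exists

    gold = []
    for (cap, sp) in sorted(groups):
        counts = groups[(cap, sp)]
        gold.append({
            "capture_event": cap,
            "num_experts": len(experts[cap]),
            "num_species": len(species[cap]),
            "species": sp,
            "min_count": min(counts) if counts else None,
            "mean_count": mean(counts) if counts else None,
            "max_count": max(counts) if counts else None,
        })
    return gold


# Made-up example: three experts count wildebeest in one capture.
rows = [
    {"capture_event": "c1", "species": "wildebeest", "count": 8, "expert_name": "A"},
    {"capture_event": "c1", "species": "wildebeest", "count": 12, "expert_name": "B"},
    {"capture_event": "c1", "species": "wildebeest", "count": 10, "expert_name": "C"},
]
gold = compile_gold_standard(rows)
print(gold[0])
```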
I also think we will want to continue expanding the gold-standard dataset (by aggregating expert classifications), because it clearly needs expanding for certain species (100% accuracy is not typical of cheetahs or dik-diks), and validating consensus accuracy should be done against the gold-standard data and not just individual expert classifications. (Does that make sense?)
Justification for using the spread of expert answers for abundance measures, from @aliburchard:
Right, it's mostly with counts that there is an issue -- for example, there is only ~74% agreement among experts on how many animals are in a given image. When experts disagreed, the estimates spread across 2.5 bins on average. But, comparing consensus answers to these expert answers, consensus answers were within the spread of expert answers 88% of the time.
I think, ultimately, coming up with a single gold-standard count for some images is a bit contrived -- I'm often like "mehhhh there's....8? 12? wildebeest in there...I don't really know and can't resolve it, so I'll say 10." As in I have a range of counts that I would consider accurate, and I'm not sure my single answer is any better than any other answer within that range.
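The two measures discussed here (how often experts agree on a count, and how often the consensus count falls within the range of expert counts) could be computed roughly as below. This is only a sketch: the function names are hypothetical, counts are assumed to be ordered values (integers or bin indices), and the toy data is made up, not project results.

```python
def experts_agree(expert_counts):
    """True when every expert gave the same count."""
    return len(set(expert_counts)) == 1


def within_spread(consensus_count, expert_counts):
    """True when the consensus count lies inside the expert min..max range."""
    return min(expert_counts) <= consensus_count <= max(expert_counts)


def summarize(captures):
    """captures: list of (expert_counts, consensus_count) pairs.

    Returns (fraction of captures where experts agree,
             fraction where consensus is within the expert spread).
    """
    n = len(captures)
    agree = sum(experts_agree(counts) for counts, _ in captures)
    inside = sum(within_spread(c, counts) for counts, c in captures)
    return agree / n, inside / n


# Toy data, invented for illustration only.
toy = [
    ([1, 1], 1),        # experts agree, consensus matches
    ([8, 12, 10], 10),  # experts disagree, consensus inside the spread
    ([2, 3], 5),        # experts disagree, consensus outside the spread
    ([4, 4], 4),        # experts agree, consensus matches
]
agreement, inside = summarize(toy)
print(agreement, inside)
```

Scoring against the min-to-max range, rather than a single resolved count, matches the idea that any answer within an expert's plausible range is as good as another.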
Will need to add more species and species groups (and "impossible") to the species table to do this. I'm thinking the expert classifications table will have:
That way we can say who -- or what group -- classified each "expert" capture. So, e.g. we could say "gold standard" or "insectivore team" or actual names: "Ali Swanson", etc.