SnapshotSerengetiScienceTeam / DataManagement

Scripts and issues to manage the SnapshotSerengeti images and metadata.
GNU General Public License v2.0

Invalid gold standard data; not in consensus table #44

Open aliburchard opened 9 years ago

aliburchard commented 9 years ago

There are 43 subjects in the gold-standard dataset that have an invalid code != 0. I discovered this by querying for subjects in the gold-standard data that were not in the consensus data. I think the consensus data (in the SQL db) should include invalid codes 2 and 3, but maybe the consensus data published on Dryad should not, as the timestamps are definitely bad.

The 43 images are below:

ASG000byl0
ASG000bymn
ASG000bymt
ASG000bymu
ASG000bymw
ASG000byn0
ASG000byno
ASG000bynu
ASG000bynv
ASG000bynx
ASG000byol
ASG000byom
ASG000byop
ASG000byp9
ASG000bypi
ASG000byql
ASG000byqp
ASG000bys8
ASG000bysy
ASG000byu3
ASG000byuy
ASG000byvu
ASG000bywu
ASG000byx4
ASG000byy7
ASG000byz7
ASG000byz8
ASG000bz2u
ASG000bz2y
ASG000bz39
ASG000bz39
ASG000bz3f
ASG000bz4v
ASG000bz5x
ASG000bz6m
ASG000bz6u
ASG000bz76
ASG000cs4i
ASG000euz9
ASG000euz9
ASG000ev2e
ASG000ev3i
ASG000ev4u
ASG000bz7l
ASG000oaw1
ASG000r71o
ASG000yvy0
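The check described above (gold-standard subjects that have no row in the consensus table) can be sketched as a SQL anti-join. This is a minimal sketch against an in-memory SQLite database; the table names `gold_standard`, `consensus`, and `subjects` and the `invalid` column are hypothetical stand-ins, not the actual schema of the SnapshotSerengeti SQL db, and the toy IDs are made up. `SELECT DISTINCT` collapses subjects that appear more than once in the gold-standard table.

```python
import sqlite3

# Hypothetical schema for illustration only; the real SnapshotSerengeti
# tables and column names may differ.
SCHEMA = """
CREATE TABLE subjects (subject_id TEXT PRIMARY KEY, invalid INTEGER NOT NULL);
CREATE TABLE gold_standard (subject_id TEXT NOT NULL);
CREATE TABLE consensus (subject_id TEXT NOT NULL);
"""

# Anti-join: keep gold-standard subjects that have no matching consensus row,
# and report each one's invalid code (0 = valid) from the subjects table.
MISSING_FROM_CONSENSUS = """
SELECT DISTINCT g.subject_id, s.invalid
FROM gold_standard AS g
LEFT JOIN consensus AS c ON c.subject_id = g.subject_id
LEFT JOIN subjects AS s ON s.subject_id = g.subject_id
WHERE c.subject_id IS NULL
ORDER BY g.subject_id
"""


def missing_from_consensus(conn: sqlite3.Connection) -> list[tuple]:
    """Return (subject_id, invalid) for gold-standard subjects absent from consensus."""
    return conn.execute(MISSING_FROM_CONSENSUS).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Build a toy in-memory database; the IDs and codes are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subjects VALUES (?, ?)",
                     [("S1", 0), ("S2", 2), ("S3", 3)])
    # S2 is listed twice to show that DISTINCT collapses repeats.
    conn.executemany("INSERT INTO gold_standard VALUES (?)",
                     [("S1",), ("S2",), ("S2",), ("S3",)])
    # Only the valid subject made it into the consensus table.
    conn.executemany("INSERT INTO consensus VALUES (?)", [("S1",)])
    return conn


print(missing_from_consensus(build_demo_db()))  # [('S2', 2), ('S3', 3)]
```

Using a `LEFT JOIN ... WHERE c.subject_id IS NULL` keeps the query in one pass over the gold-standard table and returns the invalid code alongside each missing subject, which is what surfaced the != 0 pattern.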
mkosmala commented 9 years ago

Ah right. Okay, not as bad as I was thinking. So as I understand it:

  1. The invalid images have valid classifications, but bad time stamps.
  2. When we created the formal dataset, we left out invalid images because of their bad time stamps.
  3. But time stamps didn't matter when comparing against the gold standard, so we hadn't paid attention to that until now.

So... hmmm. I agree that it's probably best to omit these 43 images from our published gold-standard data. We'll have to rerun various stats and update the data descriptor text. Blarg. But good catch.
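Omitting these subjects from the published gold standard amounts to applying the same invalid-code filter used for the formal dataset, then confirming that every remaining gold-standard subject has a consensus record. A minimal Python sketch, assuming the gold standard is a list of dicts with a hypothetical `subject_id` field and that invalid codes are available as a separate `{subject_id: code}` mapping; the IDs below are made up.

```python
from typing import Iterable


def filter_gold_standard(gold_rows: Iterable[dict], invalid_code: dict[str, int]) -> list[dict]:
    """Keep only gold-standard rows whose subject has invalid code 0.

    Subjects missing from ``invalid_code`` are dropped as well, so only
    subjects known to be valid survive.
    """
    return [row for row in gold_rows if invalid_code.get(row["subject_id"]) == 0]


def ids_missing_from_consensus(gold_rows: Iterable[dict], consensus_ids: set[str]) -> set[str]:
    """Return gold-standard subject IDs that still have no consensus record."""
    return {row["subject_id"] for row in gold_rows} - set(consensus_ids)


# Toy data with made-up IDs; "species" is just an example payload field.
gold = [
    {"subject_id": "S1", "species": "zebra"},
    {"subject_id": "S2", "species": "wildebeest"},
    {"subject_id": "S3", "species": "lion"},
]
codes = {"S1": 0, "S2": 2}  # S3 has no known code

published = filter_gold_standard(gold, codes)
print([row["subject_id"] for row in published])       # ['S1']
print(ids_missing_from_consensus(published, {"S1"}))  # set()
```

Dropping subjects with no known code is a conservative choice for a published dataset; it is easy to relax if the subject metadata is known to be complete.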