pipeline clarity and reproducibility

alecristia commented 4 years ago

[x] the current version of the README is missing information about the files created at each step
[x] it would be better to separate data that come from elsewhere from data created in this process. I think the raw data are "../zooniverse_classifications/zooniverse_data_all_final.csv" and "../metadata/metadata_all_PU.csv", but perhaps "../zooniverse_classifications/zooniverse_data_all_final.csv" is actually created in step 1?
[x] it's unclear why a Python notebook is needed for step 1 (rather than running a processing routine directly)
[x] clean up the use of paths: it doesn't make sense to ask people who are simply reproducing this to go in and change things
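One possible way to address the path and notebook points is a plain script that resolves every input and output from a single project root, so nobody has to edit files to reproduce the pipeline. This is a minimal sketch, not the repository's code: the `derived/` folder, the `--root` option and the function names are hypothetical, and it assumes both CSV files named above really are raw inputs (still an open question for the classifications file).

```python
import argparse
from pathlib import Path


def build_paths(root):
    """Resolve every input and output location from one project root."""
    root = Path(root)
    return {
        # Inputs that come from elsewhere (treated as read-only).
        "classifications": root / "zooniverse_classifications" / "zooniverse_data_all_final.csv",
        "metadata": root / "metadata" / "metadata_all_PU.csv",
        # Hypothetical folder for everything this pipeline creates,
        # so derived files are never mixed with raw inputs.
        "derived_dir": root / "derived",
    }


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Step 1 as a plain script (hypothetical sketch)."
    )
    # Default ".." mirrors the "../" prefix used by the current paths.
    parser.add_argument("--root", default="..", help="project root directory")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    paths = build_paths(args.root)
    for name, path in paths.items():
        print(f"{name}: {path}")
    # The actual step 1 processing would read paths["classifications"] and
    # paths["metadata"] here and write its outputs under paths["derived_dir"].


if __name__ == "__main__":
    main()
```

Someone reproducing the analysis could then pass `--root /path/to/checkout` on the command line instead of editing any file.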

NOTE: Chiara said this takes overnight to run. That seems very slow; could someone look into it? Because it was so slow, I haven't fully tried steps 2-3 (I quit in the middle of step 2). I'm also not sure I succeeded in reproducing step 1.
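Before optimizing anything, it may help to find out where the overnight run actually spends its time. A minimal sketch using Python's standard `cProfile` and `pstats` modules; `profile_step` is a hypothetical helper and `slow_step` is a stand-in for one real pipeline step.

```python
import cProfile
import io
import pstats


def profile_step(step, *args, top=10, **kwargs):
    """Run one pipeline step under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(step, *args, **kwargs)
    buffer = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_step(n):
    """Hypothetical stand-in for a real step: a Python-level loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    result, report = profile_step(slow_step, 100_000)
    print(result)
    print(report)
```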

alecristia commented 4 years ago

other small things:

the number of Zooniverse users who agreed is not logged in zooniverse_data_all_final.csv
segments without majority agreement are not available for later processing/stats
change "Non-Canonical" to "Non-canonical" so that it matches the lab labels?
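These three points could be handled in one aggregation pass that records, for every chunk, how many labels agree and whether there is a majority, instead of dropping chunks that have none. A minimal sketch in plain Python, assuming five labels per chunk and a majority threshold of 3 (as in the paper paragraph quoted further down); the function names and the `LABEL_FIXES` mapping are hypothetical.

```python
from collections import Counter

# Hypothetical mapping that aligns Zooniverse label spellings with lab labels.
LABEL_FIXES = {"Non-Canonical": "Non-canonical"}

# At least 3 of the 5 Zooniverse labels must agree for a majority.
MAJORITY = 3


def summarize_chunk(labels):
    """Summarize the Zooniverse labels given to one chunk."""
    if not labels:
        raise ValueError("a chunk needs at least one label")
    labels = [LABEL_FIXES.get(label, label) for label in labels]
    counts = Counter(labels)
    # Most frequent label; on ties, Counter keeps the label seen first.
    top_label, n_agree = counts.most_common(1)[0]
    has_majority = n_agree >= MAJORITY
    return {
        "n_labels": len(labels),
        "n_agree": n_agree,  # logged for every chunk, majority or not
        "majority_label": top_label if has_majority else None,
        "has_majority": has_majority,
        "n_junk": counts["Junk"],
    }


def summarize_chunks(labels_by_chunk):
    """Summarize every chunk, flagging (not dropping) those without a majority."""
    return {chunk: summarize_chunk(labels) for chunk, labels in labels_by_chunk.items()}
```

Because `n_agree` and `has_majority` are stored for every chunk, chunks without a majority stay available for later stats and can still be filtered out on `has_majority` when needed.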

alecristia commented 4 years ago

this paragraph in the paper cannot be reproduced:

An impressive total of 4,825 individual Zooniverse users provided labels for the Maturity of Baby Sounds project, of which the present data set is one part. For this project, we collected a total of 169,767 judgments provided for 33,731 500-ms chunks, corresponding to 11,980 LENA™ segments. Nearly a fifth of chunks did not have at least 3 labels in agreement out of the 5 Zooniverse labels (N = 6,585, 19% of all chunks). Of the chunks without a majority agreement, 4,341 (66%) contained one or two Junk judgements (out of 5), 6,523 (99.9%) had at least two matching judgements (the threshold used for lab-annotated segments), and only 61 (0.01%) had 5 different judgements. Future work may explore different ways of setting the minimal requirement for convergence, but for further analyses here, we focused on the 81% of chunks that did have at least 3 labels in agreement; this represented 135,725 labels for 27,145 chunks, corresponding to 11,593 LENA™ segments. As the segments average 1.12 seconds in length, this means about 3.8 hours of audio data were annotated by 8 different annotators (3 in the laboratory, 5 on Zooniverse).
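A quick arithmetic check on the counts in this paragraph may help narrow down which numbers need to be recomputed. The sketch only uses values copied from the paragraph (the 0.5-s chunk duration comes from "500-ms chunks"); the function and key names are illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


def check_quoted_counts():
    # Counts copied verbatim from the paragraph.
    total_chunks = 33_731
    no_majority = 6_585
    with_junk = 4_341
    two_matching = 6_523
    all_different = 61
    majority_chunks = 27_145
    majority_labels = 135_725
    kept_segments = 11_593
    mean_segment_s = 1.12
    chunk_s = 0.5  # "500-ms chunks"

    return {
        "no_majority_pct": pct(no_majority, total_chunks),
        "junk_pct": pct(with_junk, no_majority),
        "two_matching_pct": pct(two_matching, no_majority),
        "all_different_pct": pct(all_different, no_majority),
        "majority_pct": pct(majority_chunks, total_chunks),
        # With 5 labels and no 3 in agreement, a chunk has either at least
        # two matching labels or 5 different ones: should equal no_majority.
        "no_majority_subsets_sum": two_matching + all_different,
        # Should equal total_chunks.
        "chunk_total_sum": majority_chunks + no_majority,
        "labels_per_majority_chunk": majority_labels / majority_chunks,
        "hours_from_segments": kept_segments * mean_segment_s / 3600,
        "hours_from_chunks": majority_chunks * chunk_s / 3600,
    }


if __name__ == "__main__":
    for name, value in check_quoted_counts().items():
        print(f"{name}: {value:.2f}")
```

On these counts, 6,523/6,585 is about 99.1% and 61/6,585 about 0.9%, so neither matches the percentage printed next to it. The two no-majority subsets sum to 6,584 rather than 6,585, and 27,145 + 6,585 gives 33,730 rather than 33,731. Finally, 11,593 segments × 1.12 s is about 3.6 hours, whereas 27,145 chunks × 0.5 s is about 3.8 hours, so the 3.8-hour figure seems to follow from the chunks rather than from the segment durations.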

LAAC-LSCP / zoo-babble-validation

pipeline clarity and reproducibility #1