LAAC-LSCP / zoo-babble-validation

Apache License 2.0

Describing vocalizations in young children: A big data approach through citizen science annotation

This repo contains the code needed to reproduce a conference paper (202007_SLT) and a journal paper (202010_jslhr). Because the journal paper expands the data and analyses of the conference paper, we only provide instructions for reproducing the journal paper.

To reproduce the manuscript, knit 202010_jslhr/paper.Rmd. Note: if you are not on the LAAC team, this is probably the only step you can reproduce. If you want to reproduce the other steps, write to
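
Knitting can also be done from an R console rather than the RStudio button. The following is a minimal sketch, assuming the working directory is the repository root and the rmarkdown package is installed; the variable name `paper` is only illustrative.

```r
# Assumption: the working directory is the repository root and the
# rmarkdown package is installed. `paper` is an illustrative name.
paper <- file.path("202010_jslhr", "paper.Rmd")

if (file.exists(paper)) {
  # render() knits the document; the output format is taken from
  # the YAML header of the Rmd file.
  rmarkdown::render(paper)
} else {
  message("Could not find ", paper, "; run this from the repository root.")
}
```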

To rerun the whole pipeline:

  1. Put the following two files inside files_from_zooniverse/

These are very large, so they are not synced in the repo.

  2. Run data_analyses/code/generate_jslhr_data.R. This subsets the subject and classification data to the filenames in the PU dataset. In addition, this step generates key_info.csv, which is used in paper.Rmd, as well as the following files, which are used in preprocess.R:
  3. Run 202010_jslhr/postprocess.R, which has three steps: cleaning errors in the classifications, generating chunk-level data, and generating segment-level data. Note that this script generates data_analyses/output/clean_classifications.csv, another large file that is not synced but is necessary for this process. It also generates the two key files that are used in paper.Rmd:
  4. Knit 202010_jslhr/paper.Rmd.
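
The steps above could be chained in a small driver script. This is only a sketch under stated assumptions: the function `run_pipeline` and the variables `scripts`, `paper`, and `input_dir` are hypothetical names that are not part of the repo, the working directory is assumed to be the repository root, and the scripts are assumed to resolve their paths relative to that root (worth checking before relying on it).

```r
# Hypothetical driver; none of these names exist in the repo itself.
scripts <- c(
  "data_analyses/code/generate_jslhr_data.R",
  "202010_jslhr/postprocess.R"
)
paper <- "202010_jslhr/paper.Rmd"
input_dir <- "files_from_zooniverse"

run_pipeline <- function(scripts, paper, input_dir) {
  # The large Zooniverse exports are not synced, so fail early if the
  # folder that should hold them is absent.
  if (!dir.exists(input_dir)) {
    stop("Missing input directory: ", input_dir)
  }
  needed <- c(scripts, paper)
  absent <- needed[!file.exists(needed)]
  if (length(absent) > 0) {
    stop("Missing files: ", paste(absent, collapse = ", "))
  }
  for (s in scripts) {
    message("Running ", s)
    # Assumption: each script is self-contained, so it is sourced in a
    # fresh environment to keep its objects out of the global workspace.
    source(s, local = new.env())
  }
  # Final step: knit the manuscript.
  rmarkdown::render(paper)
}
```

Calling `run_pipeline(scripts, paper, input_dir)` from the repository root would then run the steps in order and stop with an error message if an expected input is missing.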