Closed lucasgautheron closed 3 years ago
Project | Dataset | Sampling algorithm | Classification task(s) | Launch Date | URL |
---|---|---|---|---|---|
Marvin's pilot | LENA/BabyLogger data | i) for each recorded child, extract 10 sections of 30 seconds that are NOT silence (according to a simple loudness detector); ii) cut these sections up into 500 ms clips. NOTE: the loudness detector score should be averaged across LENA and BabyLogger | child voc, female adult speech, male adult speech, junk | ASAP | https://www.zooniverse.org/lab/14957 (if you cannot access it, click "my projects" in the top-left menu of Zooniverse) |
ERC WP1e/g | "all" | extract 350 vocs from CHI & FEM (vtc) for each recorded child, cut into 500 ms chunks (note: it would be good if the "skip/exclude" procedure were in place) | 1) CHI/FEM/Junk; 2) if CHI or FEM: crying, laughing, canonical, non-canonical | no rush | doesn't exist yet |
zoo-phon (gold) | 1 pilot recording | if there are enough, pull out 250 randomly & 250 from high-child-volubility 1-minute regions (the latter should be "consecutive vocs", i.e. take the top minute and pull out all the vocs from there, then move to the next minute, etc.) — IS NOT chunkified! | NONE! will be annotated in the lab | late Feb? | (none) |
zoo-phon (pilot) | same pilot rec as above | same segments as above, but they are processed in different ways: 1. the usual 500ms chunks; 2. cut at provided list of boundaries (coded by human); 3. cut at provided list of boundaries (coded by machine) [NOTE: both lists are provided by collaborators] | will be set up by collaborator, involves transcribing using the International Phonetic Alphabet | probably in April | (tbd by collaborator) |
zoo-phon | 10 randomly selected children from each of 5 corpora tbd | depends on results of the pilot, but will probably be based on a user-provided list of segments & boundaries | same as above | probably in September | (tbd by collaborator) |
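The loudness-based sampling in the first row could be sketched roughly as follows. This is a minimal sketch in plain NumPy, not the actual pipeline: the mean-squared-amplitude score, the "loudest sections = not silence" shortcut, and the function name are all assumptions, and averaging the score across the LENA and BabyLogger tracks is left out.

```python
import numpy as np

def sample_chunks(signal, sr, n_sections=10, section_dur=30.0, chunk_dur=0.5):
    """Pick the n_sections loudest fixed-length sections of a recording
    (a crude stand-in for 'NOT silence'), then cut each section into
    chunk_dur-second clips. Returns (start, end) sample indices per clip."""
    sec_len = int(section_dur * sr)
    chunk_len = int(chunk_dur * sr)
    n_full = len(signal) // sec_len
    # simple loudness score: mean squared amplitude per section
    scores = np.array([np.mean(signal[i * sec_len:(i + 1) * sec_len] ** 2)
                       for i in range(n_full)])
    # keep the loudest sections, in chronological order
    keep = sorted(np.argsort(scores)[::-1][:n_sections])
    chunks = []
    for i in keep:
        start = i * sec_len
        chunks.extend((start + c * chunk_len, start + (c + 1) * chunk_len)
                      for c in range(sec_len // chunk_len))
    return chunks
```

With the defaults this yields 10 × 60 = 600 clips of 500 ms per child, matching the table.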
Updated with my info :)
This is the roadmap I suggest: implement `child-project zooniverse extract-chunks`. Thus, users can do their own magic for the sampling and use our tool for the extraction and upload of chunks anyway.

Does it sound good to you?
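If the sampling lives outside the package, the handoff could be a plain segments file that the extraction step then consumes. A minimal sketch of what a user's sampling script might emit; the file name, column names, and millisecond units here are illustrative assumptions, not the tool's actual interface:

```python
import csv

# Hypothetical output of a user's own sampling script:
# one row per vocalization to chunkify (onsets/offsets in ms).
segments = [
    {"recording_filename": "rec1.wav", "segment_onset": 1000, "segment_offset": 1500},
    {"recording_filename": "rec1.wav", "segment_onset": 2000, "segment_offset": 2500},
]

with open("segments.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["recording_filename", "segment_onset", "segment_offset"])
    writer.writeheader()
    writer.writerows(segments)
```

The package would then be responsible only for cutting those segments into chunks and uploading them, via something like `child-project zooniverse extract-chunks` pointed at that file (hypothetical invocation).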
NB: I cannot run the scripts until the data is properly packaged, which might be difficult without the cluster.
to clarify:
for now, we implement a minimum within the package, and leave scripts for sampling outside of the package. Then, as these scripts get reused (or not), we make decisions about which ones to work into the package. Did I get that right?
If so, that sounds like an ideal approach -- instead of making decisions now about what is likely or not, and then signing up to maintain code for those decisions, we get a period of observation to see which decisions turn out to be the most common.
Yep, exactly!