Extract chunks from longform recordings based on automatic annotations, and upload them to Zooniverse for crowd-sourced classification.

Zooniverse2

Splits audio recordings into chunks and uploads them to Zooniverse. The audio chunks and their metadata are saved at a location specified by the user, so that they can later be matched with the classification results retrieved from Zooniverse.

Output metadata is stored as a dataframe, with the same format as this example file.
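As a minimal sketch of how the metadata could be inspected after extraction, the snippet below reads `chunks.csv` from the destination directory with Python's standard `csv` module and returns its column names and rows. The function name `read_chunk_metadata` is hypothetical, and no particular column names are assumed; the example file remains the reference for the actual format.

```python
import csv
from pathlib import Path


def read_chunk_metadata(destination):
    """Return (column names, rows) read from DESTINATION/chunks.csv.

    Hypothetical helper: it makes no assumption about which columns exist.
    """
    metadata_path = Path(destination) / "chunks.csv"
    with metadata_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by column name
        columns = reader.fieldnames or []  # empty list if the file has no header
    return columns, rows
```

The same file can be loaded as a dataframe with `pandas.read_csv` if pandas is available.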

Installation

git clone https://github.com/LAAC-LSCP/Zooniverse2.git
cd Zooniverse2
pip install -r requirements.txt

Usage

Chunk extraction

python zooniverse.py extract-chunks [-h] --destination DESTINATION
                                    --sample-size SAMPLE_SIZE
                                    [--annotation-set ANNOTATION_SET]
                                    [--target-speaker-type {CHI,OCH,FEM,MAL}]
                                    [--batch-size BATCH_SIZE]
                                    [--threads THREADS]
                                    path

DESTINATION is created if it does not exist. Audio chunks are saved in both WAV and MP3 formats in DESTINATION/chunks. Metadata is stored in DESTINATION/chunks.csv.
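The output layout described above can be checked with a short script. This is only a sketch: the helper name `summarize_destination` is hypothetical, and it assumes the chunks sit directly inside `DESTINATION/chunks` with `.wav` and `.mp3` file extensions, which the text implies but does not state.

```python
from pathlib import Path


def summarize_destination(destination):
    """Count audio chunks and check for the metadata file under `destination`."""
    root = Path(destination)
    chunks_dir = root / "chunks"
    # Assumption: chunks are stored flat in DESTINATION/chunks as *.wav / *.mp3.
    wav_files = list(chunks_dir.glob("*.wav")) if chunks_dir.is_dir() else []
    mp3_files = list(chunks_dir.glob("*.mp3")) if chunks_dir.is_dir() else []
    return {
        "wav": len(wav_files),
        "mp3": len(mp3_files),
        "has_metadata": (root / "chunks.csv").is_file(),
    }
```

Since every chunk is written in both formats, differing `wav` and `mp3` counts would suggest an interrupted extraction.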

| argument | description | default value |
|---|---|---|
| path | path to the dataset | |
| destination | where to write the output metadata and files. Metadata will be saved to $destination/chunks.csv and audio chunks to $destination/chunks. | |
| sample-size | how many vocalization events per recording | |
| batch-size | how many chunks per batch | 1000 |
| annotation-set | which annotation set to use for sampling | vtc |
| target-speaker-type | speaker type to get chunks from | CHI |
| threads | how many threads to perform the conversion on; uses all CPUs if <= 0 | 0 |
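For scripted use, the command line above can be assembled programmatically. The sketch below builds the argument list from the documented flags and defaults only. The helper name `build_extract_command` and the example paths are hypothetical, and it assumes `zooniverse.py` is in the current working directory.

```python
SPEAKER_TYPES = ("CHI", "OCH", "FEM", "MAL")  # choices listed in the usage string


def build_extract_command(path, destination, sample_size, annotation_set="vtc",
                          target_speaker_type="CHI", batch_size=1000, threads=0):
    """Build the argument list for `python zooniverse.py extract-chunks`."""
    if target_speaker_type not in SPEAKER_TYPES:
        raise ValueError(f"target_speaker_type must be one of {SPEAKER_TYPES}")
    return [
        "python", "zooniverse.py", "extract-chunks",
        "--destination", str(destination),
        "--sample-size", str(sample_size),
        "--annotation-set", annotation_set,
        "--target-speaker-type", target_speaker_type,
        "--batch-size", str(batch_size),
        "--threads", str(threads),
        str(path),  # positional argument: path to the dataset
    ]


if __name__ == "__main__":
    # Hypothetical dataset and output locations, for illustration only.
    print(" ".join(build_extract_command("my_dataset", "output", 50)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the extraction.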

Chunk upload

python zooniverse.py upload-chunks [-h] --destination DESTINATION
                                   --zooniverse-login ZOONIVERSE_LOGIN
                                   --zooniverse-pwd ZOONIVERSE_PWD
                                   --project-slug PROJECT_SLUG
                                   --subject-set SUBJECT_SET
                                   [--batches BATCHES]

Uploads as many batches of audio chunks as specified to Zooniverse, and updates chunks.csv accordingly.

| argument | description | default value |
|---|---|---|
| destination | where to find the output metadata and files | |
| project-slug | Zooniverse project slug (e.g. lucasgautheron/my-new-project) | |
| subject-set | prefix for the subject set | |
| zooniverse-login | Zooniverse login | |
| zooniverse-pwd | Zooniverse password | |
| batches | how many batches to upload. It is recommended to upload fewer than 10,000 chunks per day, i.e. 10 batches at the default batch size of 1000. All batches are uploaded if set to 0. | 0 |
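The daily upload recommendation can be turned into a simple schedule. The sketch below is not part of the tool: `plan_uploads` is a hypothetical helper that computes how many batches to upload each day, treating the 10,000-chunk recommendation as an inclusive daily cap (so 10 batches of 1000 are allowed, as in the description above).

```python
def plan_uploads(total_batches, batch_size=1000, daily_chunk_limit=10_000):
    """Return the number of batches to upload on each successive day."""
    if batch_size <= 0 or batch_size > daily_chunk_limit:
        raise ValueError("batch_size must be between 1 and daily_chunk_limit")
    # Assumption: the daily recommendation is treated as an inclusive cap.
    batches_per_day = daily_chunk_limit // batch_size
    plan = []
    remaining = total_batches
    while remaining > 0:
        today = min(batches_per_day, remaining)
        plan.append(today)
        remaining -= today
    return plan


if __name__ == "__main__":
    print(plan_uploads(25))  # 25 batches of 1000 chunks -> [10, 10, 5]
```

Each entry could then be passed as `--batches` on the corresponding day, assuming that successive runs skip batches already recorded as uploaded in chunks.csv, which the description above suggests but does not state explicitly.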