aclew / EAF_builder_scripts

Repository for scripts that make eaf files for the Random Sampling and High-Volubility ACLEW datasets
3 stars 1 forks source link

EAF_BUILDER_SCRIPTS

This repo contains the script for creating periodicly or randomly selected ELAN templates.

-Utils file contains onset fix tools for random and periodic onset-offset method, as well as an eaf creator function and csv creation function.

-Create_all_type_eaf contains a generic eaf generator code for periodic and random methods.

PACKAGE REQUIREMENTS:

To use this code

Don't forget to install the required packages mentioned above!

$ git clone https://github.com/aclew/EAF_builder_scripts.git

A note on the templates

You currently have three choices:

1) ACLEW-basic-template_all-tiers -- as the name indicates 2) ACLEW-LAAC-native -- a variant on ACLEW, the only change being that VCM is also coded in non-CHI tiers 3) ACLEW-LAAC-non-native -- for use when your annotators don't speak the language in the files; it does not include the xds, lex, mwu tiers

For launching the code:

python3 main1.py 'path_to_your_wav_folder' 'path_to_new_eaf_store_file' 'periodic or random(choose one)' --t 'int' --n/--skip 'int' (skip if periodic, n if random) --c_on 'int' --c_off 'int' --temp 'basic or native or non-native' --its 'y/n' (if yes) --n_its  'int' --its_types 'list' --overlap 'y/n'

SOME EXAMPLES:

python3 main1.py ../wavs/ ../eafs/ periodic --t 2 --skip 59 --c_on 2 --c_off 3 --temp non-native --its n

output: a periodic extraction on the non-native template every 59 minutes of 2-minutes-long chunks. The context tiers begin 2 minutes before code tier onset and end 3 minutes after the code tier offset without its file processing.

python3 main1.py ../wavs/ ../eafs/ random --t 2 --n 20 --c_on 1 --c_off 0,5 --temp native --its n

output: a random extraction on the native template of 20 chunks of 2-minutes-long. The context tiers begin 1 minutes before code tier onset and end half a minute after the code tier offset without its file processing.

python3 main1.py ../wavs/ ../eafs/ periodic --t 1 --skip 30 --c_on 2 --c_off 2 --temp basic --its y --n_its 15 --its_types AWC CVC --overlap n

output: a periodic extraction on the basic template every 30 minutes of 1-minutes-long chunks.The context tiers begin 2 minutes before code tier onset and end 2 minutes after the code tier offset. AWC and CVC its information types are extracted with 15 chunks. Overlap between periodic time segments and its information(AWC, CVC) time segments isn't permitted.

python3 main1.py ../wavs/ ../eafs/ random --t 3 --n 15 --c_on 0 --c_off 1 --temp non-native --its y --n_its 10 --its_types CTC --overlap y

output: a random extraction on the non-native template of 15 chunks of 3-minutes-long.The context tiers begin at the same time with the code tier onset and end 1 minutes after the code tier offset. CTC its information type is extracted with 10 chunks. Overlap between periodic time segments and its information(CTC) time segments is permitted.