Are a text transcript, a defined `region`, and a defined `edited_region` all required for inference and training on automatic stutter removal? Is there any way to provide only the raw audio and destutter it? If so, would this be done by running `spec_denoiser.py` or another script?