archiki / ASR-Accent-Analysis

Analysis and investigation of the confounding effect of accents in end-to-end Automatic Speech Recognition models.
MIT License

AccentProbe - CSV file #1

Open natalietoulova opened 3 years ago

natalietoulova commented 3 years ago

Hi, I am new to programming and a little confused about which CSV file is used in AccentProbe, and also about which hyperparameters are used for training. Did I miss it somewhere?

archiki commented 3 years ago

Hey @natalietoulova! I don't have an example .csv file that was used for the accent probes, but if you look at AccentProbe/data_loader.py, the CSV file needs to contain file_name, accent_label, duration. My folder organization is such that I store the representations (after different layers of the network) for each audio file, and these representation files are stored in a folder indicating the type of representation (e.g. probe_data/lstm_0/[file name].npy). Here, data_path = probe_data, rep_type = lstm_0, and the file name comes from the CSV file.

The other thing you will need is a meta file, used in several of the experiments, that aligns phones in the speech to time durations in the audio files. This corresponds to end_times in data_loader.py. If you are only running this experiment, the full alignment is not needed; you can simply run voice activity detection to mark the times at which the speech starts and ends. The code needs end_times to trim out the silence, since silence may unnecessarily increase the data size loaded into the accent classifiers. All that being said, you can always write your own custom data_loader.py that works for your set-up.
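To make the layout above concrete, here is a minimal sketch of a custom loader matching that description. The column names (file_name, accent_label, duration) and the probe_data/lstm_0/[file name].npy path scheme come from the comment above; the function name and the use of a simple dict for end_times are my own assumptions, not the repo's actual code.

```python
# Hypothetical sketch of a custom data loader for the accent probe;
# only the CSV columns and folder layout are taken from the thread above.
import csv
import os

import numpy as np


def load_probe_examples(csv_path, data_path, rep_type, end_times=None):
    """Load (representation, accent_label) pairs for the accent probe.

    csv_path  -- CSV with columns: file_name, accent_label, duration
    data_path -- root folder holding one sub-folder per representation type
    rep_type  -- e.g. 'lstm_0'; representations are expected at
                 data_path/rep_type/<file_name>.npy
    end_times -- optional mapping file_name -> last speech frame (stand-in
                 for the alignment/VAD meta file), used to trim trailing
                 silence so it does not inflate the loaded data size
    """
    examples = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            name = row["file_name"]
            rep = np.load(os.path.join(data_path, rep_type, name + ".npy"))
            if end_times is not None and name in end_times:
                rep = rep[: end_times[name]]  # drop frames after speech ends
            examples.append((rep, row["accent_label"]))
    return examples
```

The same idea extends naturally to a PyTorch Dataset if you want batching, but a plain function like this is enough to check that your CSV and folder structure line up.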

Regarding the hyperparameters, I will have to refer you to the paper for details. We did not find the accent-classification trends to be very hyperparameter-sensitive, so we picked learning_rate = 1e-03 and batch_size = 16/32 depending on available GPU memory. Hope that answers your questions.
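For illustration, here is how those quoted hyperparameters (learning_rate = 1e-03, batch_size = 32) might plug into a probe-style classifier. This is a generic linear-softmax probe in plain NumPy, written as a stand-in; it is not the paper's actual model or training code.

```python
# Hypothetical linear softmax probe trained by mini-batch SGD with the
# hyperparameters quoted above; the model itself is an assumption.
import numpy as np


def train_linear_probe(X, y, n_classes, lr=1e-3, batch_size=32, epochs=20, seed=0):
    """Train a linear softmax classifier on pooled representations X of shape (N, D)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            logits = X[idx] @ W + b
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(idx)), y[idx]] -= 1.0    # softmax cross-entropy gradient
            W -= lr * (X[idx].T @ probs) / len(idx)
            b -= lr * probs.mean(axis=0)
    return W, b
```

Swapping batch_size between 16 and 32 here only changes how many examples each gradient step averages over, which is why it can be chosen to fit the available GPU memory without much effect on the results.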

Best, Archiki