andybi7676 / reborn-uasr

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
https://arxiv.org/abs/2402.03988
MIT License

REBORN training discussions #1

Open JeromeNi opened 1 month ago

JeromeNi commented 1 month ago

Hello,

First, thanks for sharing your recent work on the REBORN system! I know that this repo is still in its early stages, but we would like to start testing its performance on some of our unsupervised ASR tasks.

After running train_rl_cnnagent.py, which seems to complete "3.1 Stage 1: Training the Segmentation Model" in your paper, how can we use the existing scripts in this repo to collect the outputs of the RL-tuned segmenter, post-process the boundaries and start "3.2 Stage 2: Training the Phoneme Prediction Model"?

It seems that boundary post-processing involves first running rl/utils/generate_w2vu_segmental_results.py, followed by postprocess_boundaries.py, and then running rl/utils/bds_to_ids.py to obtain a format similar to the {set}.src file originally prepared by the wav2vec-U scripts. Are these the correct scripts to use, and are there other steps necessary?

Thanks again!

Best, Junrui Ni Research Assistant, University of Illinois at Urbana-Champaign

andybi7676 commented 1 month ago

Hi, JeromeNi. I'm sorry for not getting back to you sooner, and thank you for understanding that our repository is still under development. Currently, we have released our phoneme predictor along with its tailored segmenter for reproducing the phoneme recognition results end-to-end. As for training, we are still organizing our code, and the training pipeline will be released and documented more comprehensively. In the meantime, given the urgency of your need, we'd like to answer your questions directly. We appreciate your understanding!

> It seems that boundary post-processing involves around first running rl/utils/generate_w2vu_segmental_results.py, followed by postprocess_boundaries.py and then running rl/utils/bds_to_ids.py to obtain a format that is similar to {set}.src that is originally prepared by the wav2vec-U scripts. Are these the correct scripts to use, and are there other steps necessary?

You are correct. To conduct the stage-2 training, you'll need to collect the raw boundaries by running rl/utils/generate_w2vu_segmental_results.py, merge the boundaries whose segments yield consecutive identical phoneme predictions, and map the boundaries to ids for the next-iteration GAN training. You can find an example of these steps in one of our testing shell scripts, rl/utils/ls_new_generate_and_eval.sh: L45-L53 gather the raw outputs; L54-L59 merge the boundaries; and L77 gives an example of processing the merged boundaries for next-iteration GAN training.
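To illustrate the merging step described above, here is a minimal sketch (not the repo's actual code; the function name and data layout are assumptions) of collapsing runs of consecutive identical phoneme predictions and keeping only the boundary that starts each merged segment:

```python
def merge_consecutive(phonemes, boundaries):
    """Collapse runs of identical consecutive phoneme predictions.

    phonemes:   one predicted phoneme per segment, e.g. ["a", "a", "b"]
    boundaries: the frame index at which each segment starts, e.g. [0, 4, 9]
    Returns the deduplicated phoneme sequence and the boundaries that
    start each merged segment.
    """
    merged_phns, merged_bds = [], []
    for phn, bd in zip(phonemes, boundaries):
        if merged_phns and merged_phns[-1] == phn:
            continue  # same phoneme as previous segment: extend it, drop this boundary
        merged_phns.append(phn)
        merged_bds.append(bd)
    return merged_phns, merged_bds


# Example: two repeated segments get absorbed into their predecessors.
phns, bds = merge_consecutive(["a", "a", "b", "b", "a"], [0, 4, 9, 12, 20])
# phns == ["a", "b", "a"], bds == [0, 9, 20]
```

The actual merging in the repo is handled by the scripts referenced above; this sketch only shows the core idea of the post-processing.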

Best regards, Liang-Hsuan Tseng, SPML lab, National Taiwan University.