This repository provides the hypotheses-to-transcription (H2T) training code for the NeurIPS 2023 and IEEE ASRU 2023 papers cited below.
🤗 HP-v0 Dataset
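If you prefer pulling HP-v0 directly from the Hugging Face Hub, a minimal sketch with the `datasets` library is shown below; the dataset identifier is a placeholder, so substitute the ID from the HP-v0 dataset card.

```python
# Minimal sketch for loading HP-v0 with Hugging Face `datasets`.
# "ORG_NAME/HyPoradise-v0" is a placeholder -- use the ID from the dataset card.
from datasets import load_dataset

hp_v0 = load_dataset("ORG_NAME/HyPoradise-v0", split="train")
print(hp_v0[0])  # each example pairs N-best ASR hypotheses with a ground-truth transcription
```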
```bash
git clone https://github.com/Hypotheses-Paradise/Hypo2Trans.git
cd Hypo2Trans/H2T-LoRA
```
```bash
python finetune.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path './data/train_wsj.json' \
    --output_dir './wsj' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --learning_rate 2e-4 \
    --micro_batch_size=64 \
    --batch_size=256 \
    --lora_r=16
```
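For readers unfamiliar with LoRA, the sketch below shows roughly what the flags above correspond to when building an adapter with Hugging Face `peft`; the `lora_alpha` and `lora_dropout` values are assumptions (they are not set on the command line), and this is an illustration rather than the repository's `finetune.py`.

```python
# Rough sketch of the LoRA setup implied by the flags above (not finetune.py itself).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")

lora_cfg = LoraConfig(
    r=16,                                                      # --lora_r=16
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # --lora_target_modules
    lora_alpha=16,                                             # assumed; not set on the command line
    lora_dropout=0.05,                                         # assumed; not set on the command line
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```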
```bash
python inference.py \
    --ckpt_path './wsj' \
    --test_data_path './data/test_wsj.json'
```
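As a rough illustration of what inference with the trained adapter could look like (not the repository's `inference.py`; the prompt template and example hypotheses are placeholders):

```python
# Sketch: load the LoRA adapter trained above and correct one utterance.
# The prompt format here is a placeholder, not the exact one used by inference.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
model = PeftModel.from_pretrained(base, "./wsj")  # --ckpt_path
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")

hypotheses = ["the quick brown fox", "a quick brown fox", "the quick brown box"]
prompt = ("Generate the true transcription from these ASR hypotheses:\n"
          + "\n".join(hypotheses) + "\nTranscription:")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```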
The table below reports the WER (%) of H2T-ft and H2T-LoRA in the fine-tuning setting, where $o_{nb}$ and $o_{cp}$ denote the n-best oracle and the compositional oracle, respectively. Numbers in parentheses are the relative WER change versus the baseline.

| Test Set | Baseline | LM$_{rank}$ | T5-ft | LLaMA-ft | T5-LoRA | LLaMA-LoRA | $o_{nb}$ | $o_{cp}$ |
|---|---|---|---|---|---|---|---|---|
| WSJ | 4.5 | 4.3 (-4.4%) | 4.0 (-11.1%) | 3.8 (-15.6%) | 2.7 (-40.0%) | 2.2 (-51.1%) | 4.1 | 1.2 |
| ATIS | 8.3 | 6.9 (-16.9%) | 2.7 (-67.5%) | 3.4 (-59.0%) | 1.7 (-79.5%) | 1.9 (-77.1%) | 5.2 | 1.1 |
| CHiME-4 | 11.1 | 11.0 (-0.9%) | 7.9 (-28.8%) | 8.2 (-26.1%) | 7.0 (-36.9%) | 6.6 (-40.5%) | 9.1 | 2.8 |
| Tedlium-3 | 8.5 | 8.0 (-5.8%) | 6.6 (-22.4%) | 5.2 (-38.8%) | 7.4 (-12.9%) | 4.6 (-45.9%) | 3.0 | 0.7 |
| CV-accent | 14.8 | 16.0 (+8.1%) | 12.9 (-12.8%) | 15.5 (+4.7%) | 11.0 (-25.7%) | 11.0 (-25.7%) | 11.4 | 7.9 |
| SwitchBoard | 15.7 | 15.4 (-1.9%) | 15.9 (+1.3%) | 18.4 (+17.1%) | 14.9 (-5.1%) | 14.1 (-10.2%) | 12.6 | 4.2 |
| LRS2 | 10.1 | 9.6 (-5.0%) | 9.5 (-5.9%) | 10.2 (+1.0%) | 6.6 (-34.7%) | 8.8 (-12.9%) | 6.9 | 2.6 |
| CORAAL | 21.4 | 21.4 (-0.0%) | 23.1 (+7.9%) | 22.9 (+7.0%) | 20.9 (-2.3%) | 19.2 (-10.3%) | 21.8 | 10.7 |
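To make the oracle columns concrete, $o_{nb}$ picks, for each utterance, the hypothesis in the n-best list with the lowest WER against the reference, while $o_{cp}$ further allows recombining tokens across hypotheses. The sketch below, which assumes `jiwer` is installed and uses toy data with a hypothetical layout, shows how the 1-best baseline and $o_{nb}$ could be computed; it averages per-utterance WER rather than pooling edit counts corpus-wide.

```python
# Sketch: compute 1-best (baseline) WER and the n-best oracle WER (o_nb).
# The data layout below is a toy assumption: each item holds an n-best list and a reference.
from jiwer import wer

data = [
    {"hypotheses": ["the cat sat", "a cat sat", "the cat sad"], "reference": "the cat sat"},
    {"hypotheses": ["he reads book", "he read books", "she reads books"], "reference": "he reads books"},
]

# Baseline: always take the top (first) hypothesis.
baseline = sum(wer(d["reference"], d["hypotheses"][0]) for d in data) / len(data)
# n-best oracle: take the best hypothesis in each list.
oracle_nb = sum(min(wer(d["reference"], h) for h in d["hypotheses"]) for d in data) / len(data)
print(f"1-best WER: {baseline:.3f}, n-best oracle WER: {oracle_nb:.3f}")
```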
```bib
@inproceedings{yang2023generative,
  title={Generative speech recognition error correction with large language models and task-activating prompting},
  author={Yang, Chao-Han Huck and Gu, Yile and Liu, Yi-Chieh and Ghosh, Shalini and Bulyko, Ivan and Stolcke, Andreas},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

@inproceedings{chen2023hyporadise,
  title={HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models},
  author={Chen, Chen and Hu, Yuchen and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}
```