Hi, I tried to reproduce the ASR_LibriSpeech example by running bash finetune_wavlm_large_linear_vicuna_7b, but the accuracy never exceeded 0.5.
Also, could you explain more about the LibriSpeech data processing? The transcripts in LibriSpeech are in all capital letters and lack commas and periods. How did you handle this in the example README?
Here’s an example of what I’m referring to:
{"key": "1001-134707-0000_ASR", "source": "/data/open_data/librispeech_audio/audio/librispeech_1001-134707-0000.wav", "target": "1 little recks the laborer. How near his work is holding him to God, The loving laborer through space and time, after all, not to create, only or found only."}
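To make the question concrete, here is a minimal sketch of the kind of normalization I have in mind. I don't know whether the example actually does anything like this. The helper name `normalize_transcript` is hypothetical and not taken from the repo, and lowercasing plus punctuation stripping is only my assumption about what "matching" the two transcript styles would mean.

```python
import re


def normalize_transcript(text: str) -> str:
    """Hypothetical helper (not from the repo): map a transcript to a
    case-insensitive, punctuation-free form so that raw LibriSpeech text
    and punctuated text can be compared on equal footing."""
    text = text.lower()
    # Keep letters, digits, apostrophes, and spaces; replace everything else with a space.
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())


# Fragment of the "target" field from the entry above.
target = "How near his work is holding him to God, The loving laborer"
print(normalize_transcript(target))
```

Applying the same function to the all-caps LibriSpeech transcripts would put both sides in the same form before training or scoring.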
Is this preprocessing step important?
Thanks!