The `<unk>`s in TARGET indicate that the target phonemes you provided are not always covered by the ones in the dictionary you used to train the model, i.e. the dict.phn.txt in your model or data dir. You have to make sure the target phonemes are properly mapped to the ones you used to train your model.
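A quick way to verify that mapping is to diff the two phoneme inventories before running inference. A minimal sketch (my own addition, not from the thread; the label file name is illustrative, and dict.phn.txt has one "<symbol> <count>" pair per line):

```python
# Sketch: find target phonemes not covered by the training dictionary.
# "dict.phn.txt" is the fairseq dictionary used for fine-tuning;
# "dev_other.phn" is a hypothetical target label file (space-separated phonemes).
with open("dict.phn.txt") as f:
    train_phones = {line.split()[0] for line in f if line.strip()}

with open("dev_other.phn") as f:
    target_phones = {p for line in f for p in line.split()}

missing = target_phones - train_phones  # these will all decode to <unk>
print(f"{len(missing)} uncovered phonemes:", sorted(missing))
```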
@xuqiantong thanks so much, I had made such a stupid mistake. With your pointer I corrected it and succeeded. Furthermore, I found examples/speech_recognition/infer.py quite complex; from searching around, it seems the transformers library could simplify this procedure, but transferring a fairseq model to transformers looks difficult. Do you have any suggestions about that? If I MUST use transformers, I found that config.json is missing. How can I get this file?
Thanks, and looking forward to your reply.
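On the conversion question: the Hugging Face transformers repo ships a conversion script for fairseq wav2vec2 checkpoints (convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py, under the wav2vec2 model directory), which writes the converted weights along with a config.json. Alternatively, a config.json can be generated directly. A hedged sketch, not from the thread; Wav2Vec2Config and save_pretrained are real transformers APIs, but the values below are placeholders that must match the fine-tuned model:

```python
# Hedged sketch: create the missing config.json with the transformers library.
# Every value below is a placeholder; set them to match your fine-tuned model.
from transformers import Wav2Vec2Config

config = Wav2Vec2Config(
    vocab_size=400,  # placeholder: number of entries in dict.phn.txt plus special tokens
)
config.save_pretrained("./wav2vec2-mandarin-phn")  # writes ./wav2vec2-mandarin-phn/config.json
```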
❓ Questions and Help
Hi @xuqiantong, @alexeib, @michaelauli, I took inspiration from your recent paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition, and "transferred" it to my data to recognize Mandarin phonemes with tone. But the result is quite bad: the WER (actually PER) is always over 60, and a lot of [unk]s appear. A piece of the result was attached below (image not reproduced here).
Whether I use a Wav2Vec 2.0 Large or a wav2vec 2.0 (XLSR) model as the pre-trained model, the result is similar.
My question is: is there something wrong with my procedure, or is Chinese tone classification in fact this hard, whichever wav2vec2 pre-trained model is used?
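For context on that number: a PER is just edit distance over phoneme sequences, so a single wrong tone counts as one substitution. A minimal sketch of the computation, assuming the third-party editdistance package (my own illustration, not fairseq code):

```python
# Minimal PER (phoneme error rate) computation; requires `pip install editdistance`.
import editdistance

def per(refs, hyps):
    """refs/hyps: parallel lists of phoneme sequences (lists of strings)."""
    errors = sum(editdistance.eval(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r) for r in refs)

# One tone error in four phonemes -> PER of 25.0
print(per([["n", "i3", "h", "ao3"]], [["n", "i2", "h", "ao3"]]))
```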
What have you tried?
I'll try to be pretty detailed about my setup:
I created the .tsv, .phn, and dict.phn.txt files following the steps @alexeib provided in issue https://github.com/pytorch/fairseq/issues/2922#issuecomment-731308318, using the wav2vec_manifest.py file. The picture below shows a piece of dict.phn.txt.
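(The attached picture is not reproduced here. For reference, a fairseq dictionary file is plain text with one "<symbol> <count>" pair per line, so a tonal dict.phn.txt would look roughly like this, with illustrative symbols and counts:)

```
a1 10234
i1 9871
zh 8765
ao3 7421
```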
I was able to fine-tune the model using this training command:

```
python3 $FAIRSEQ_PATH/fairseq_cli/hydra_train.py \
    distributed_training.distributed_port=0 \
    task.labels=phn \
    task.data=$DATASET \
    dataset.valid_subset=$valid_subset \
    distributed_training.distributed_world_size=1 \
    model.w2v_path=$model_path \
    hydra.run.dir=/content/drive/MyDrive/outputs \
    +restore_file=/content/drive/MyDrive/outputs/checkpoints/checkpoint_last.pt \
    --config-dir $config_dir \
    --config-name $config_name
```
The config file is based on base_10h.yaml, with some simple modifications like max_tokens: 1000000 and max_update: 50000. I have only a single A100 GPU with 40 GB of memory, so there is no distributed training.
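Expressed against the key layout of fairseq's stock base_10h.yaml, the overrides described above would look roughly like this (a sketch; key paths are worth double-checking against your fairseq version):

```yaml
# Overrides described above, in base_10h.yaml's structure.
dataset:
  max_tokens: 1000000

optimization:
  max_update: 50000

distributed_training:
  distributed_world_size: 1   # single 40 GB A100, no distributed training
```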
Here is my command for running evaluation:
```
python3 $FAIRSEQ_PATH/examples/speech_recognition/infer.py $DATASET \
    --task audio_finetuning \
    --nbest 1 \
    --path /content/drive/MyDrive/outputs/checkpoints/checkpoint_best.pt \
    --gen-subset dev_other \
    --results-path $DATASET \
    --w2l-decoder viterbi \
    --criterion ctc \
    --labels phn \
    --max-tokens 1800000
```
The above is my general procedure. I need your help! Thanks and looking forward to your reply! Elison
How you installed fairseq (pip, source): pip install fairseq