YuanGongND / gopt

Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
BSD 3-Clause "New" or "Revised" License

Running run.sh in gop_speechocean762 fails in visualize_feats.py with AttributeError: 'tuple' object has no attribute 'shape' #20

Open amandeepbaberwal opened 1 year ago

amandeepbaberwal commented 1 year ago

```
(env) amandeep@vitubuntu:~/Desktop/kaldi-master/egs/gop_speechocean762/s5$ ./run.sh
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
local/data_prep.sh: successfully prepared data in data/train
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
local/data_prep.sh: successfully prepared data in data/test
steps/make_mfcc.sh --nj 1 --mfcc-config conf/mfcc_hires.conf --cmd run.pl data/train
steps/make_mfcc.sh: moving data/train/feats.scp to data/train/.backup
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for train
steps/compute_cmvn_stats.sh data/train
Succeeded creating CMVN stats for train
fix_data_dir.sh: kept all 1 utterances.
fix_data_dir.sh: old files are kept in data/train/.backup
steps/make_mfcc.sh --nj 1 --mfcc-config conf/mfcc_hires.conf --cmd run.pl data/test
steps/make_mfcc.sh: moving data/test/feats.scp to data/test/.backup
utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea. Search for the word 'bold' in http://kaldi-asr.org/doc/data_prep.html for more information.
utils/validate_data_dir.sh: Successfully validated data-directory data/test
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test
steps/compute_cmvn_stats.sh data/test
Succeeded creating CMVN stats for test
fix_data_dir.sh: kept all 1 utterances.
fix_data_dir.sh: old files are kept in data/test/.backup
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 1 data/train ../../librispeech/s5/exp/nnet3_cleaned/extractor data/train/ivectors
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to data/train/ivectors using the extractor in ../../librispeech/s5/exp/nnet3_cleaned/extractor.
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 1 data/test ../../librispeech/s5/exp/nnet3_cleaned/extractor data/test/ivectors
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to data/test/ivectors using the extractor in ../../librispeech/s5/exp/nnet3_cleaned/extractor.
steps/nnet3/compute_output.sh --cmd run.pl --nj 1 --online-ivector-dir data/train/ivectors data/train ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/probs_train
steps/nnet3/compute_output.sh: WARNING: no such file ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.raw. Trying ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.mdl instead.
steps/nnet3/compute_output.sh --cmd run.pl --nj 1 --online-ivector-dir data/test/ivectors data/test ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/probs_test
steps/nnet3/compute_output.sh: WARNING: no such file ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.raw. Trying ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp/final.mdl instead.
Preparing phone lists
2 silence phones saved to: data/local/dict_nosp/silence_phones.txt
1 optional silence saved to: data/local/dict_nosp/optional_silence.txt
39 non-silence phones saved to: data/local/dict_nosp/nonsilence_phones.txt
5 extra triphone clustering-related questions saved to: data/local/dict_nosp/extra_questions.txt
Lexicon text file saved as: data/local/dict_nosp/lexicon.txt
utils/prepare_lang.sh --phone-symbol-table ../../librispeech/s5/data/lang_test_tgsmall/phones.txt data/local/dict_nosp data/local/lang_tmp_nosp data/lang_nosp
Checking data/local/dict_nosp/silence_phones.txt ...
--> reading data/local/dict_nosp/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/local/dict_nosp/silence_phones.txt is OK

Checking data/local/dict_nosp/optional_silence.txt ... --> reading data/local/dict_nosp/optional_silence.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict_nosp/optional_silence.txt is OK

Checking data/local/dict_nosp/nonsilence_phones.txt ... --> reading data/local/dict_nosp/nonsilence_phones.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict_nosp/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt --> disjoint property is OK.

Checking data/local/dict_nosp/lexicon.txt --> reading data/local/dict_nosp/lexicon.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict_nosp/lexicon.txt is OK

Checking data/local/dict_nosp/lexiconp.txt --> reading data/local/dict_nosp/lexiconp.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict_nosp/lexiconp.txt is OK

Checking lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt --> lexicon pair data/local/dict_nosp/lexicon.txt and data/local/dict_nosp/lexiconp.txt match

Checking data/local/dict_nosp/extra_questions.txt ... --> reading data/local/dict_nosp/extra_questions.txt --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/local/dict_nosp/extra_questions.txt is OK --> SUCCESS [validating dictionary directory data/local/dict_nosp]

fstaddselfloops data/lang_nosp/phones/wdisambig_phones.int data/lang_nosp/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl data/lang_nosp
Checking existence of separator file
separator file data/lang_nosp/subword_separator.txt is empty or does not exist, deal in word case.
Checking data/lang_nosp/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> data/lang_nosp/phones.txt is OK

Checking words.txt: #0 ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> data/lang_nosp/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ... --> silence.txt and nonsilence.txt are disjoint --> silence.txt and disambig.txt are disjoint --> disambig.txt and nonsilence.txt are disjoint --> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ... --> found no unexplainable phones in phones.txt

Checking data/lang_nosp/phones/context_indep.{txt, int, csl} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 10 entry/entries in data/lang_nosp/phones/context_indep.txt --> data/lang_nosp/phones/context_indep.int corresponds to data/lang_nosp/phones/context_indep.txt --> data/lang_nosp/phones/context_indep.csl corresponds to data/lang_nosp/phones/context_indep.txt --> data/lang_nosp/phones/context_indep.{txt, int, csl} are OK

Checking data/lang_nosp/phones/nonsilence.{txt, int, csl} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 320 entry/entries in data/lang_nosp/phones/nonsilence.txt --> data/lang_nosp/phones/nonsilence.int corresponds to data/lang_nosp/phones/nonsilence.txt --> data/lang_nosp/phones/nonsilence.csl corresponds to data/lang_nosp/phones/nonsilence.txt --> data/lang_nosp/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/silence.{txt, int, csl} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 10 entry/entries in data/lang_nosp/phones/silence.txt --> data/lang_nosp/phones/silence.int corresponds to data/lang_nosp/phones/silence.txt --> data/lang_nosp/phones/silence.csl corresponds to data/lang_nosp/phones/silence.txt --> data/lang_nosp/phones/silence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/optional_silence.{txt, int, csl} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 1 entry/entries in data/lang_nosp/phones/optional_silence.txt --> data/lang_nosp/phones/optional_silence.int corresponds to data/lang_nosp/phones/optional_silence.txt --> data/lang_nosp/phones/optional_silence.csl corresponds to data/lang_nosp/phones/optional_silence.txt --> data/lang_nosp/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang_nosp/phones/disambig.{txt, int, csl} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 6 entry/entries in data/lang_nosp/phones/disambig.txt --> data/lang_nosp/phones/disambig.int corresponds to data/lang_nosp/phones/disambig.txt --> data/lang_nosp/phones/disambig.csl corresponds to data/lang_nosp/phones/disambig.txt --> data/lang_nosp/phones/disambig.{txt, int, csl} are OK

Checking data/lang_nosp/phones/roots.{txt, int} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 41 entry/entries in data/lang_nosp/phones/roots.txt --> data/lang_nosp/phones/roots.int corresponds to data/lang_nosp/phones/roots.txt --> data/lang_nosp/phones/roots.{txt, int} are OK

Checking data/lang_nosp/phones/sets.{txt, int} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 41 entry/entries in data/lang_nosp/phones/sets.txt --> data/lang_nosp/phones/sets.int corresponds to data/lang_nosp/phones/sets.txt --> data/lang_nosp/phones/sets.{txt, int} are OK

Checking data/lang_nosp/phones/extra_questions.{txt, int} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 14 entry/entries in data/lang_nosp/phones/extra_questions.txt --> data/lang_nosp/phones/extra_questions.int corresponds to data/lang_nosp/phones/extra_questions.txt --> data/lang_nosp/phones/extra_questions.{txt, int} are OK

Checking data/lang_nosp/phones/word_boundary.{txt, int} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 330 entry/entries in data/lang_nosp/phones/word_boundary.txt --> data/lang_nosp/phones/word_boundary.int corresponds to data/lang_nosp/phones/word_boundary.txt --> data/lang_nosp/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ... --> reading data/lang_nosp/phones/optional_silence.txt --> data/lang_nosp/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1 --> data/lang_nosp/phones/disambig.txt has "#0" and "#1" --> data/lang_nosp/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ... --> data/lang_nosp/phones/word_boundary.txt doesn't include disambiguation symbols --> data/lang_nosp/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt --> data/lang_nosp/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols... --> data/lang_nosp/phones/wdisambig.txt exists (newer prepare_lang.sh) Checking word_boundary.int and disambig.int --> generating a 20 word/subword sequence --> resulting phone sequence from L.fst corresponds to the word sequence --> L.fst is OK --> generating a 10 word/subword sequence --> resulting phone sequence from L_disambig.fst corresponds to the word sequence --> L_disambig.fst is OK

Checking data/lang_nosp/oov.{txt, int} ... --> text seems to be UTF-8 or ASCII, checking whitespaces --> text contains only allowed whitespaces --> 1 entry/entries in data/lang_nosp/oov.txt --> data/lang_nosp/oov.int corresponds to data/lang_nosp/oov.txt --> data/lang_nosp/oov.{txt, int} are OK

--> data/lang_nosp/L.fst is olabel sorted
--> data/lang_nosp/L_disambig.fst is olabel sorted
--> SUCCESS [validating lang directory data/lang_nosp]
steps/align_mapped.sh --cmd run.pl --nj 1 --graphs exp/ali_train data/train exp/probs_train ../../librispeech/s5/data/lang_test_tgsmall ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/ali_train
steps/align_mapped.sh: aligning data in data/train using model from ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp, putting alignments in exp/ali_train
steps/diagnostic/analyze_alignments.sh --cmd run.pl ../../librispeech/s5/data/lang_test_tgsmall exp/ali_train
steps/diagnostic/analyze_alignments.sh: see stats in exp/ali_train/log/analyze_alignments.log
steps/align_mapped.sh: done aligning data.
steps/align_mapped.sh --cmd run.pl --nj 1 --graphs exp/ali_test data/test exp/probs_test ../../librispeech/s5/data/lang_test_tgsmall ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp exp/ali_test
steps/align_mapped.sh: aligning data in data/test using model from ../../librispeech/s5/exp/chain_cleaned/tdnn_1d_sp, putting alignments in exp/ali_test
steps/diagnostic/analyze_alignments.sh --cmd run.pl ../../librispeech/s5/data/lang_test_tgsmall exp/ali_test
steps/diagnostic/analyze_alignments.sh: see stats in exp/ali_test/log/analyze_alignments.log
steps/align_mapped.sh: done aligning data.
local/visualize_feats.py --phone-symbol-table data/lang_nosp/phones-pure.txt exp/gop_train/feat.scp data/local/scores.json exp/gop_train/feats.png
Traceback (most recent call last):
  File "local/visualize_feats.py", line 75, in <module>
    main()
  File "local/visualize_feats.py", line 68, in main
    features = TSNE(n_components=2).fit_transform(features)
  File "/home/amandeep/Desktop/kaldi-master/egs/gop_speechocean762/s5/env/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py", line 1118, in fit_transform
    self._check_params_vs_input(X)
  File "/home/amandeep/Desktop/kaldi-master/egs/gop_speechocean762/s5/env/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py", line 828, in _check_params_vs_input
    if self.perplexity >= X.shape[0]:
AttributeError: 'tuple' object has no attribute 'shape'
```
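For what it's worth, the traceback shows scikit-learn's `_check_params_vs_input()` reading `X.shape` before the input is converted to an array, so passing a plain Python list/tuple of feature vectors fails on newer scikit-learn releases, whereas older releases appear to have converted the input first. A minimal sketch of the failure and the workaround, with made-up feature data standing in for what visualize_feats.py collects:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the per-phone GOP feature vectors that
# local/visualize_feats.py collects from exp/gop_train/feat.scp.
features = [tuple(np.random.rand(8)) for _ in range(20)]

# Passing the raw sequence reproduces the error on newer scikit-learn, because
# _check_params_vs_input() reads X.shape before the input is converted:
#   TSNE(n_components=2, perplexity=5).fit_transform(tuple(features))
#   -> AttributeError: 'tuple' object has no attribute 'shape'

# Converting to an ndarray first works:
embedded = TSNE(n_components=2, perplexity=5).fit_transform(np.array(features))
print(embedded.shape)  # (20, 2)
```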

YuanGongND commented 1 year ago

If it occurs in `gop_speechocean762/s5$ ./run.sh`, then it should be a question for the Kaldi maintainers. Are you testing with your own data?

amandeepbaberwal commented 1 year ago

No, I am testing with speechocean762, using 1 training and 1 test utterance.


YuanGongND commented 1 year ago

See the warning on the first line, `utils/validate_data_dir.sh: WARNING: you have only one speaker. This probably a bad idea.` — it is likely because you are only using part of the data. Using the entire dataset might be a better idea since it is not large.

But again, this should be a question for https://github.com/kaldi-asr/kaldi/issues?q=is%3Aissue+gop; they might have updated their code, and I am not one of the recipe developers.

infinite-darkness108 commented 5 months ago

Solved! In local/visualize_feats.py, convert the collected features to a NumPy array before the t-SNE call, i.e. add `features = np.array(features)`. You should now be able to see results in:

1) exp/gop_train/feats.png — a t-SNE plot for two phones, where the higher-quality pronunciations (as scored by the 5 human experts) are well separated and the low-scored ones fall between the two clusters.

2) exp/gop_test/result_gop.txt — lines of the form

   .0 2.0 1.0
   .1 2.0 2.0
   ...
   .n 1.0 0.0

   meaning the utterance's zeroth phone was scored 2.0 by the human experts (average of 5) while the GoP-based prediction is 1.0, and the same utterance's next canonical phone was given a score of 2.0 by both the human experts and the model. (A small sketch for reading this file follows below.)
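A minimal sketch for sanity-checking result_gop.txt, assuming only what the comment above describes: each line ends with a human score followed by the GoP-based predicted score, with a leading phone identifier. The path and exact field layout may differ in your checkout, so treat this as an illustration rather than the recipe's own tooling.

```python
# Sketch only: assumes each line of result_gop.txt ends with
# "<human-score> <predicted-score>", per the format described above.
human, pred = [], []
with open("exp/gop_test/result_gop.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines, if any
        human.append(float(parts[-2]))
        pred.append(float(parts[-1]))

# Crude agreement check: mean absolute difference between expert and predicted scores.
mad = sum(abs(h - p) for h, p in zip(human, pred)) / max(len(human), 1)
print(f"{len(human)} phones, mean absolute difference = {mad:.2f}")
```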