Voice-Privacy-Challenge / Voice-Privacy-Challenge-2022

Baseline Recipe for VoicePrivacy Challenge 2022: anonymization systems and evaluation software

not combining utt2uniq as it does not exist #14

Open 980202006 opened 2 years ago

980202006 commented 2 years ago

Is there a problem with the output shown in this image?

Natalia-T commented 2 years ago

This is not a bug or a warning, only an info message. It is the expected behavior of the Kaldi script utils/combine_data.sh for the given data.
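To illustrate why the message is harmless, here is a minimal Python sketch of how a combine step can treat optional files. It is a simplified model, not Kaldi's actual shell code: the function name `combine_optional_file` and the directory names in `demo` are hypothetical, and the plain sort stands in for whatever ordering the real script applies.

```python
import tempfile
from pathlib import Path


def combine_optional_file(name, in_dirs, out_dir):
    """Concatenate `name` from all input dirs, or skip it with an info message."""
    present = [(Path(d) / name).is_file() for d in in_dirs]
    if all(present):
        lines = []
        for d in in_dirs:
            lines.extend((Path(d) / name).read_text().splitlines())
        # Simplification: sort the merged lines as plain strings.
        (Path(out_dir) / name).write_text("\n".join(sorted(lines)) + "\n")
        return f"combined {name}"
    if not any(present):
        # The file is optional and absent in every input: nothing to do.
        return f"[info]: not combining {name} as it does not exist"
    return f"[info]: not combining {name} as it does not exist everywhere"


def demo():
    """Combine two toy data dirs that have utt2spk but no utt2uniq."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("trials_f", "trials_m", "trials_all"):
            (root / sub).mkdir()
        (root / "trials_f" / "utt2spk").write_text("utt2 spkA\n")
        (root / "trials_m" / "utt2spk").write_text("utt1 spkB\n")
        in_dirs = [root / "trials_f", root / "trials_m"]
        return [
            combine_optional_file(n, in_dirs, root / "trials_all")
            for n in ("utt2uniq", "utt2spk")
        ]


if __name__ == "__main__":
    for message in demo():
        print(message)
```

In this model, a file that is missing from every input directory is simply skipped, so the info line only reports that there was nothing to merge.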

980202006 commented 2 years ago

Thank you! I got this error. Are there any files missing from the directory?

Traceback (most recent call last):
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/venv/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/venv/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xingyum/.vscode-server/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/xingyum/.vscode-server/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/xingyum/.vscode-server/extensions/ms-python.python-2022.4.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/venv/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/venv/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/venv/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xingyum/models/Voice-Privacy-Challenge-2022/baseline/local/anon/gen_pseudo_xvecs.py", line 107, in <module>
    if pool_spk2gender[pool_spk] == gender:
KeyError: '392'

Natalia-T commented 2 years ago

Could you please attach the spk2gender file from the speaker pool that causes the problem at https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/master/baseline/local/anon/gen_pseudo_xvecs.py#L106?

If you did not change the speaker pool in the original recipe, the path is baseline\data\libritts_train_other_500\spk2gender.

980202006 commented 2 years ago

1006 f 102 f 1049 m 104 f 1051 f 1065 m 107 m 1084 f 1085 m 1092 f 1094 m 1096 m 1097 m 1107 m 110 f 1110 m 111 f 1124 f 1132 m 1152 f 1154 f 1161 m 1166 f 1168 f 1171 f 1179 m 1184 m 1187 m 1200 m 1225 m 1230 m 1239 m 123 f 1250 f 1252 f 1258 m 1260 m 1261 m 1266 f 1274 m 127 m 1280 m 128 m 1291 f 1298 f 1331 m 133 m 1341 m 1342 f 1347 m 1353 m 1367 m 1370 f 1373 f 1374 m 1384 m 1403 m 1414 f 1421 f 1430 m 1444 m 1469 m 1474 f 147 m 1485 m 1492 m 1494 m 1495 m 1505 m 151 f 152 m 153 m 1544 f 1545 f 1559 m 1563 f 1564 f 1566 f 1569 f 1572 m 1579 f 1593 f 1595 m 1601 m 1614 m 1618 m 161 m 1621 m 1633 f 1636 f 1643 m 1646 m 1647 m 1648 f 1653 f 1664 f 1665 f 1674 f 1679 f 167 m 1680 f 1681 f 1685 m 168 m 1690 f 1691 f 1693 f 1695 f 1696 f 1699 m 1704 f 1708 m 1710 f 1714 f 1717 f 1721 f 1726 f 1733 f 1736 f 173 f 1746 m 1750 f 1756 f 1757 f 1760 f 1765 f 1767 f 1772 m 1773 f 177 f 1780 m 1784 f 1795 m 1804 f 1809 f 1813 f 1815 m 1819 f 1828 m 1844 m 1846 m 1863 f 1868 m 1870 m 1878 m 1901 f 1920 f 1924 m 1931 f 1938 m 1968 f 1977 f 1985 m 1989 m 199 f 2001 m 2003 m 2013 m 2021 m 2026 f 202 f 2042 f 2046 m 2050 f 2051 m 2062 f 2063 m 2067 m 2068 f 2089 f 2090 f 2096 m 20 f 2100 f 2104 m 2122 m 2133 m 2140 m 2143 f 2148 f 2152 f 215 f 2185 m 218 m 2195 m 2198 m 2208 m 2234 m 2237 f 2246 m 2262 m 2270 f 2273 m 2275 f 2276 f 2279 m 2284 m 2288 f 228 f 2292 m 2297 f 2301 f 2309 m 2312 f 2339 f 2341 m 2346 f 2351 f 2356 m 2361 f 2374 f 2380 m 238 f 2405 m 2407 m 2437 m 243 f 2445 f 2448 m 245 m 2485 f 2487 f 2488 f 2491 m 2496 m 2504 f 2522 f 2526 m 252 m 253 m 2541 m 2544 f 2545 m 2552 m 2553 f 255 m 2568 f 2574 m 2587 f 2588 m 25 m 2606 m 2607 f 2624 m 263 m 264 m 265 m 2660 m 2671 f 2676 f 2694 f 2712 f 2724 m 2730 f 2733 m 2735 f 273 m 2740 m 2748 f 2754 f 2762 f 277 f 2825 m 2834 f 283 m 2854 m 2895 f 2909 f 2919 f 2925 f 2930 m 2943 f 2946 m 294 f 2967 f 2975 f 2979 f 2985 m 2988 f 2990 m 2997 f 2998 m 29 m 3006 f 3020 m 3021 m 3033 f 3045 m 3053 f 3054 m 3060 m 3063 
m 3079 f 3088 m 3090 f 3097 m 3098 m 3100 m 3109 m 310 f 3125 f 3132 m 3135 m 3137 m 3138 f 313 f 3142 m 3143 f 3144 m 3148 m 3172 f 3179 f 317 m 3192 m 3196 f 319 m 31 m 3227 f 3238 m 3244 f 3245 f 3257 m 3261 m 3268 m 3271 m 3272 m 3285 m 3288 m 3290 f 3314 f 3318 f 3319 f 331 m 3334 f 3346 f 3356 f 336 m 3373 m 3381 m 3394 f 3400 f 3409 f 3411 f 3417 m 3433 m 3465 f 3467 f 3470 m 3479 f 3488 m 348 m 3500 f 3503 m 3541 m 3547 m 3553 m 3554 f 3557 f 3559 f 3564 f 3567 m 3571 m 3587 m 3588 f 3592 m 3595 m 3598 f 3606 m 3618 m 3641 f 3647 f 3650 m 3656 m 3657 m 365 m 3665 m 366 f 3675 f 3679 f 3681 f 3691 m 3698 f 36 m 3744 m 3747 m 3757 f 3779 f 377 m 3780 m 3783 m 3793 m 3796 m 3798 f 37 m 3819 m 3843 f 3845 m 3848 f 3867 f 3871 m 3885 f 3894 m 3895 m 3896 m 3906 f 3909 f 3911 f 3912 m 3925 f 3926 f 3928 f

Natalia-T commented 2 years ago

Thank you @980202006. This spk2gender file is incomplete: LibriTTS\train-other-500 contains 1160 speakers in total, while your file contains only 411.

The error occurs because speaker 392 is missing from the file. (The original \baseline\data\libritts_train_other_500\spk2gender file is attached: spk2gender)
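A quick way to catch this kind of mismatch before running the anonymization is to compare the speaker IDs in spk2gender against the speakers of the pool. The sketch below is only an illustration: `parse_spk2gender` and `missing_speakers` are hypothetical helpers, and it assumes the pool speaker IDs are already available as a list (for example, the keys of the pool x-vector scp file).

```python
def parse_spk2gender(text):
    """Parse '<speaker-id> <gender>' lines into a dict."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def missing_speakers(spk2gender, pool_speakers):
    """Return the pool speakers that have no entry in spk2gender."""
    return sorted(s for s in pool_speakers if s not in spk2gender)


def demo():
    # Toy data: speaker 392 is in the pool but absent from spk2gender.
    table = parse_spk2gender("1006 f\n102 f\n1049 m\n")
    return missing_speakers(table, ["102", "392", "1049"])


if __name__ == "__main__":
    print("speakers missing from spk2gender:", demo())
```

Any speaker reported here would raise the same KeyError when looked up with `pool_spk2gender[pool_spk]`.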

980202006 commented 2 years ago

I have the following problem: activate does not accept more than one argument.

Stage 8: Making evaluation subsets...
temp
dev
utils/subset_data_dir.sh: reducing #utt from 2321 to 343
utils/subset_data_dir.sh: reducing #utt from 2321 to 1018
utils/subset_data_dir.sh: reducing #utt from 2321 to 960
utils/combine_data.sh data/libri_dev_trials_all data/libri_dev_trials_f data/libri_dev_trials_m
activate does not accept more than one argument: ['data/libri_dev_trials_all', 'data/libri_dev_trials_f', 'data/libri_dev_trials_m']

the in_dir is data/libri_dev_trials_f
the in_dir is data/libri_dev_trials_m
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
fix_data_dir.sh: kept all 1978 utterances.
fix_data_dir.sh: old files are kept in data/libri_dev_trials_all/.backup
utils/subset_data_dir.sh: reducing #utt from 12253 to 600
utils/subset_data_dir.sh: reducing #utt from 12253 to 5422
utils/subset_data_dir.sh: reducing #utt from 12253 to 344
utils/combine_data.sh data/vctk_dev_trials_f_all data/vctk_dev_trials_f data/vctk_dev_trials_f_common
activate does not accept more than one argument: ['data/vctk_dev_trials_f_all', 'data/vctk_dev_trials_f', 'data/vctk_dev_trials_f_common']

the in_dir is data/vctk_dev_trials_f
the in_dir is data/vctk_dev_trials_f_common
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
utils/fix_data_dir.sh: file data/vctk_dev_trials_f_all/spk2gender is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 5766 utterances.
fix_data_dir.sh: old files are kept in data/vctk_dev_trials_f_all/.backup
utils/subset_data_dir.sh: reducing #utt from 12253 to 5255
utils/subset_data_dir.sh: reducing #utt from 12253 to 351
utils/combine_data.sh data/vctk_dev_trials_m_all data/vctk_dev_trials_m data/vctk_dev_trials_m_common
activate does not accept more than one argument: ['data/vctk_dev_trials_m_all', 'data/vctk_dev_trials_m', 'data/vctk_dev_trials_m_common']

the in_dir is data/vctk_dev_trials_m
the in_dir is data/vctk_dev_trials_m_common
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
utils/fix_data_dir.sh: file data/vctk_dev_trials_m_all/spk2gender is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 5606 utterances.
fix_data_dir.sh: old files are kept in data/vctk_dev_trials_m_all/.backup
utils/combine_data.sh data/vctk_dev_trials_all data/vctk_dev_trials_f_all data/vctk_dev_trials_m_all
activate does not accept more than one argument: ['data/vctk_dev_trials_all', 'data/vctk_dev_trials_f_all', 'data/vctk_dev_trials_m_all']

the in_dir is data/vctk_dev_trials_f_all
the in_dir is data/vctk_dev_trials_m_all
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
fix_data_dir.sh: kept all 11372 utterances.
fix_data_dir.sh: old files are kept in data/vctk_dev_trials_all/.backup
test
utils/subset_data_dir.sh: reducing #utt from 2620 to 438
utils/subset_data_dir.sh: reducing #utt from 2620 to 734
utils/subset_data_dir.sh: reducing #utt from 2620 to 762
utils/combine_data.sh data/libri_test_trials_all data/libri_test_trials_f data/libri_test_trials_m
activate does not accept more than one argument: ['data/libri_test_trials_all', 'data/libri_test_trials_f', 'data/libri_test_trials_m']

the in_dir is data/libri_test_trials_f
the in_dir is data/libri_test_trials_m
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
fix_data_dir.sh: kept all 1496 utterances.
fix_data_dir.sh: old files are kept in data/libri_test_trials_all/.backup
utils/subset_data_dir.sh: reducing #utt from 12350 to 600
utils/subset_data_dir.sh: reducing #utt from 12350 to 5328
utils/subset_data_dir.sh: reducing #utt from 12350 to 346
utils/combine_data.sh data/vctk_test_trials_f_all data/vctk_test_trials_f data/vctk_test_trials_f_common
activate does not accept more than one argument: ['data/vctk_test_trials_f_all', 'data/vctk_test_trials_f', 'data/vctk_test_trials_f_common']

the in_dir is data/vctk_test_trials_f
the in_dir is data/vctk_test_trials_f_common
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
utils/fix_data_dir.sh: file data/vctk_test_trials_f_all/spk2gender is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 5674 utterances.
fix_data_dir.sh: old files are kept in data/vctk_test_trials_f_all/.backup
utils/subset_data_dir.sh: reducing #utt from 12350 to 5420
utils/subset_data_dir.sh: reducing #utt from 12350 to 354
utils/combine_data.sh data/vctk_test_trials_m_all data/vctk_test_trials_m data/vctk_test_trials_m_common
activate does not accept more than one argument: ['data/vctk_test_trials_m_all', 'data/vctk_test_trials_m', 'data/vctk_test_trials_m_common']

the in_dir is data/vctk_test_trials_m
the in_dir is data/vctk_test_trials_m_common
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
utils/fix_data_dir.sh: file data/vctk_test_trials_m_all/spk2gender is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 5774 utterances.
fix_data_dir.sh: old files are kept in data/vctk_test_trials_m_all/.backup
utils/combine_data.sh data/vctk_test_trials_all data/vctk_test_trials_f_all data/vctk_test_trials_m_all
activate does not accept more than one argument: ['data/vctk_test_trials_all', 'data/vctk_test_trials_f_all', 'data/vctk_test_trials_m_all']

the in_dir is data/vctk_test_trials_f_all
the in_dir is data/vctk_test_trials_m_all
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh: combined utt2dur
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
fix_data_dir.sh: kept all 11448 utterances.
fix_data_dir.sh: old files are kept in data/vctk_test_trials_all/.backup
Done

Stage 9: Anonymizing evaluation datasets...
anon_level = spk
libri_dev_enrolls
/home/xingyum/models/Voice-Privacy-Challenge-2022/baseline/exp/am_nsf_data

Anonymizing using x-vectors and neural wavform models...
activate does not accept more than one argument: ['--nj', '128', '--anoni-pool', 'libritts_train_other_500', '--data-netcdf', '/home/xingyum/models/Voice-Privacy-Challenge-2022/baseline/exp/am_nsf_data', '--ppg-model', 'exp/models/1_asr_am/exp', '--ppg-dir', 'exp/models/1_asr_am/exp/nnet3_cleaned', '--xvec-nnet-dir', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a', '--anon-xvec-out-dir', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon', '--plda-dir', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a', '--pseudo-xvec-rand-level', 'spk', '--distance', 'plda', '--proximity', 'farthest', '--cross-gender', 'false', '--rand-seed', '0', '--anon-data-suffix', '_anon', '--model-type', 'am_nsf_pytorch', '--inference-trunc-len', '-1', '--inference-batch-size-am', '10', '--inference-batch-size-wav', '5', 'libri_dev_enrolls']

param=libri_dev_enrolls

Stage a.0: Extracting xvectors for libri_dev_enrolls.
activate does not accept more than one argument: ['--nj', '29', 'data/libri_dev_enrolls', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon']

980202006 commented 2 years ago

Thank you for the reply.

Natalia-T commented 2 years ago

@minhduc0711 proposed a solution to fix this issue: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/commit/5a8c9f90af2fda729b573ceb1c1f690ed0ea1c1e

You can modify the first line of your env.sh file in the same way.

980202006 commented 2 years ago

Thank you! I have a new problem. image image

Natalia-T commented 2 years ago

The files missing from .../baseline/exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_trials_f/pseudo_xvecs/

-   spk2gender
-   pseudo_xvector.scp
-   pseudo_xvector.ark

should be created here:

https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5a8c9f90af2fda729b573ceb1c1f690ed0ea1c1e/baseline/local/anon/gen_pseudo_xvecs.py#L140

Did you successfully complete all the previous stages? (I think your code differs slightly from the current git version.)
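To confirm which of these outputs are actually absent, a small check along the following lines may help. It is a sketch only: `missing_pseudo_xvec_files` is a hypothetical helper, and the directory you pass in should be the pseudo_xvecs folder of your own setup.

```python
import tempfile
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("spk2gender", "pseudo_xvector.scp", "pseudo_xvector.ark")


def missing_pseudo_xvec_files(pseudo_xvecs_dir):
    """Return the expected output files that are not present in the directory."""
    directory = Path(pseudo_xvecs_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


def demo():
    # Toy directory that contains only spk2gender.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "spk2gender").write_text("1006 f\n")
        return missing_pseudo_xvec_files(tmp)


if __name__ == "__main__":
    print("missing files:", demo())
```

If all three names are reported as missing, the step that generates the pseudo x-vectors most likely did not run to completion.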

980202006 commented 2 years ago

Sorry for the delayed reply. I ran local/main_anonymization_train_data.sh and hit the same problem. Could you provide a working set of arguments for gen_pseudo_xvecs.py? image