choijeongsoo / lip2speech-unit

[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units

Differences in extracted Speech Units #1

Closed DomhnallBoyle closed 7 months ago

DomhnallBoyle commented 7 months ago

Hello, thanks for sharing this great work.

I extracted speech units for the 5 LRS3 samples located in the datasets folder, using this command:

python quantize_with_kmeans.py \
--feature_type hubert \
--kmeans_model_path km.bin \
--acoustic_model_path hubert_base_ls960.pt \
--layer -1 \
--manifest_path datasets/lrs3/test_unit_manifest.txt \
--out_quantized_file_path=datasets/lrs3/label/test.unt \
--extension ".wav" \
--hide-fname

km.bin was downloaded from https://dl.fbaipublicfiles.com/textless_nlp/gslm/hubert/km200/km.bin
hubert_base_ls960.pt was downloaded from https://dl.fbaipublicfiles.com/hubert/hubert_base_ls960.pt

My test_unit_manifest.txt looks like this:

[PATH TO REPO]/datasets/lrs3
audio/test/UmvOgW6iV2s/00007.wav    68608
audio/test/UmvOgW6iV2s/00001.wav    39936
audio/test/UmvOgW6iV2s/00002.wav    20480
audio/test/UmvOgW6iV2s/00004.wav    57344
audio/test/62cNtvx6P8E/00001.wav    24576

There are some differences in the output speech units. For example, the 3rd line in the original datasets/lrs3/label/test.unt is:

14 14 171 171 120 48 48 125 153 193 193 170 78 78 71 71 19 48 48 48 128 128 116 163 163 70 125 31 31 46 46 46 170 170 95 95 30 30 19 120 62 125 44 58 58 160 160 17 17 158 113 28 113 151 126 126 87 157 62 74 135 135 94

But the output of the command above for the same line is:

14 14 67 67 120 48 74 131 110 193 193 170 78 78 71 168 19 48 48 21 128 128 116 163 163 85 125 89 31 46 46 46 170 170 95 95 71 30 19 120 62 89 44 41 58 160 160 17 17 17 28 28 113 151 126 126 87 157 157 74 135 75 131
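For anyone wanting to quantify a mismatch like this rather than compare by eye, here is a minimal sketch that diffs the two unit sequences quoted above frame by frame. The helper name `compare_units` is hypothetical and not part of the repository; the two strings are copied from the lines above.

```python
# Unit sequences copied from the thread: the 3rd line of the original
# test.unt and the 3rd line produced by the command with --layer -1.
reference = (
    "14 14 171 171 120 48 48 125 153 193 193 170 78 78 71 71 19 48 48 48 "
    "128 128 116 163 163 70 125 31 31 46 46 46 170 170 95 95 30 30 19 120 "
    "62 125 44 58 58 160 160 17 17 158 113 28 113 151 126 126 87 157 62 74 "
    "135 135 94"
)
extracted = (
    "14 14 67 67 120 48 74 131 110 193 193 170 78 78 71 168 19 48 48 21 "
    "128 128 116 163 163 85 125 89 31 46 46 46 170 170 95 95 71 30 19 120 "
    "62 89 44 41 58 160 160 17 17 17 28 28 113 151 126 126 87 157 157 74 "
    "135 75 131"
)


def compare_units(ref_line, new_line):
    """Return (frames compared, list of 0-based positions where units differ)."""
    ref = [int(u) for u in ref_line.split()]
    new = [int(u) for u in new_line.split()]
    if len(ref) != len(new):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(new)} units")
    mismatches = [i for i, (r, n) in enumerate(zip(ref, new)) if r != n]
    return len(ref), mismatches


total, mismatches = compare_units(reference, extracted)
print(f"{len(mismatches)} of {total} frames differ")
for i in mismatches:
    ref_unit = reference.split()[i]
    new_unit = extracted.split()[i]
    print(f"frame {i}: reference={ref_unit} extracted={new_unit}")
```

Equal sequence lengths with scattered unit differences suggest the audio and framing match and only the features fed to k-means differ, which is consistent with a layer mismatch.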

Am I doing something wrong? Am I using the correct layer?

Thanks

choijeongsoo commented 7 months ago

Hi,

Thank you for pointing this out. We used the 6th layer of HuBERT to extract the speech units, so `--layer -1` in your command does not match our setup. I'll update the documentation to reflect this detail.
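As a sketch of what the rerun might look like, the snippet below rebuilds the command from the original post with only the `--layer` value changed. It assumes the flag takes the layer number directly as given in the reply (6); every other argument is copied unchanged from the question, and the script itself is not executed here.

```python
import shlex

# Assumption: --layer accepts the layer number as stated in the reply above.
LAYER = 6

# All other arguments are copied from the command in the original post.
args = [
    "python", "quantize_with_kmeans.py",
    "--feature_type", "hubert",
    "--kmeans_model_path", "km.bin",
    "--acoustic_model_path", "hubert_base_ls960.pt",
    "--layer", str(LAYER),
    "--manifest_path", "datasets/lrs3/test_unit_manifest.txt",
    "--out_quantized_file_path=datasets/lrs3/label/test.unt",
    "--extension", ".wav",
    "--hide-fname",
]

# Print a shell-safe command line instead of running it.
command = shlex.join(args)
print(command)
```

Printing the command rather than running it keeps the sketch independent of the model files and dataset paths.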

DomhnallBoyle commented 7 months ago

Thanks very much for your reply.