facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
30.17k stars 6.37k forks

Dialogue Speech-to-Unit Encoder for dGSLM - checkpoint issues #5195

Closed qianlivia closed 1 year ago

qianlivia commented 1 year ago

🐛 Bug

I tried to load the checkpoints (Fisher HuBERT model and k-means model) from

Dialogue Speech-to-Unit Encoder for dGSLM: The Fisher HuBERT model

and got the following error:

No such file or directory: '/checkpoint/ntuanh/experiments/dialogue-LM/hubert/fisher-vad-10s-separated/kmeans/hubert_iter2/km500/dict.km.txt'

After trying to fix it with an external k-means dictionary file, I got dimension mismatch in the weights.

To Reproduce

  1. Download the checkpoint files under Fisher HuBERT model and k-means model from Dialogue Speech-to-Unit Encoder for dGSLM: The Fisher HuBERT model
  2. Create a Python script using the code snippet shown at the bottom of the readme file
  3. Refer to these checkpoint files when passing the paths to HubertTokenizer
  4. Include the path to a (dummy) audio file
  5. Run Python script
  6. See error

Note: I tried the other method as well (by using quantize_with_kmeans.py) but got the same error.

Trace:

Traceback (most recent call last):
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 144, in <module>
    main(args, logger)
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 101, in main
    features_batch = get_features(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 73, in get_features
    generator, num_files = get_feature_iterator(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 58, in get_feature_iterator
    reader = feature_reader_cls(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "/home/user/fairseq/fairseq/checkpoint_utils.py", line 484, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/home/user/fairseq/fairseq/tasks/fairseq_task.py", line 355, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "/home/user/fairseq/fairseq/models/__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "/home/user/fairseq/fairseq/models/hubert/hubert.py", line 335, in build_model
    model = HubertModel(cfg, task.cfg, task.dictionaries)
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 139, in dictionaries
    return self.state.dictionaries
  File "/home/user/fairseq/fairseq/tasks/fairseq_task.py", line 42, in __getattr__
    self._state[name] = self._factories[name]()
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 149, in load_dictionaries
    dictionaries = [
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in <listcomp>
    Dictionary.load(f"{label_dir}/dict.{label}.txt")
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 228, in load
    d.add_from_file(f)
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 241, in add_from_file
    raise fnfe
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 238, in add_from_file
    with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ntuanh/experiments/dialogue-LM/hubert/fisher-vad-10s-separated/kmeans/hubert_iter2/km500/dict.km.txt'

Code sample

# Load the Hubert tokenizer
from examples.textless_nlp.dgslm.dgslm_utils import HubertTokenizer
encoder = HubertTokenizer(
    hubert_path="/path/to/hubert_ckpt.pt",
    hubert_layer=12,
    km_path="/path/to/km.bin",
)

# Encode the audio to units
path = "/path/to/stereo/audio.wav"
codes = encoder.wav2codes(path)

Expected behavior

I would expect the path to the k-means dictionary file to be relative, or the dictionary file to be bundled with the HuBERT model checkpoint itself, rather than an absolute path on the authors' cluster.
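For reference, fairseq's `Dictionary.load` expects one `symbol count` pair per line, so a dummy 500-unit `dict.km.txt` can be generated with a few lines of Python (a sketch; it assumes the unit symbols are simply the integers 0-499, which is the usual convention for k-means unit dictionaries):

```python
# Write a dummy fairseq dictionary with 500 k-means units.
# Each line is "<symbol> <count>"; the counts are not used when
# loading for inference, so a placeholder count of 1 is fine.
with open("dict.km.txt", "w") as f:
    for unit in range(500):
        f.write(f"{unit} 1\n")
```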

Note: As a temporary workaround, I overwrote the path that causes the error with the dictionary1 file from Generative Spoken Dialogue Language Modeling, which has exactly 500 entries. After this, I got another error:

Traceback (most recent call last):
  File "/home/user/hubert.py", line 6, in <module>
    encoder = HubertTokenizer(
  File "/home/user/fairseq/examples/textless_nlp/dgslm/dgslm_utils.py", line 27, in __init__
    self.feature_extractor = HubertFeatureReader(hubert_path, hubert_layer, use_cuda=use_cuda)
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "/home/user/fairseq/fairseq/checkpoint_utils.py", line 493, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/user/fairseq/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/liviaq/miniconda3/envs/asr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HubertModel:
  size mismatch for label_embs_concat: copying a param with shape torch.Size([504, 256]) from checkpoint, the shape in current model is torch.Size([1008, 256]).
  size mismatch for final_proj.weight: copying a param with shape torch.Size([256, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
  size mismatch for final_proj.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).

Based on this, there also appears to be a dimension mismatch between the pre-trained weights and the rebuilt model. Note that 504 = 500 k-means units + 4 special symbols, while 1008 = 2 × 504, which suggests the model ended up being built with two label dictionaries instead of one.
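An alternative workaround is to rewrite the `label_dir` baked into the checkpoint's task config so it points at a local directory containing `dict.km.txt` (a sketch; it assumes the checkpoint stores its config under a `cfg` → `task` mapping, which is the usual layout for recent fairseq checkpoints — the toy dict below stands in for a real checkpoint loaded with `torch.load`):

```python
import torch

# Toy stand-in for a real checkpoint; with the downloaded file you
# would instead do: ckpt = torch.load("hubert_ckpt.pt", map_location="cpu")
ckpt = {
    "cfg": {"task": {"label_dir": "/checkpoint/ntuanh/experiments/..."}},
    "model": {},
}

# Redirect the hard-coded label_dir to a local directory that
# contains dict.km.txt, then save a patched copy of the checkpoint.
ckpt["cfg"]["task"]["label_dir"] = "/local/dir/with/dict"
torch.save(ckpt, "hubert_ckpt_patched.pt")
```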

Environment

tuanh208 commented 1 year ago

I have updated the HuBERT checkpoint with the pre-defined dictionary. Please check if this works

qianlivia commented 1 year ago

> I have updated the HuBERT checkpoint with the pre-defined dictionary. Please check if this works

Thank you, this solved it!