facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
30.17k stars 6.37k forks

Dialogue Speech-to-Unit Encoder for dGSLM - checkpoint issues #5195

Closed qianlivia closed 1 year ago

qianlivia commented 1 year ago

🐛 Bug

I tried to load the checkpoints (Fisher HuBERT model and k-means model) from

Dialogue Speech-to-Unit Encoder for dGSLM: The Fisher HuBERT model

and got the following error:

No such file or directory: '/checkpoint/ntuanh/experiments/dialogue-LM/hubert/fisher-vad-10s-separated/kmeans/hubert_iter2/km500/dict.km.txt'

After trying to fix it with an external k-means dictionary file, I got dimension mismatch in the weights.

To Reproduce

  1. Download the checkpoint files under Fisher HuBERT model and k-means model from Dialogue Speech-to-Unit Encoder for dGSLM: The Fisher HuBERT model
  2. Create a Python script using the code snippet shown at the bottom of the readme file
  3. Refer to these checkpoint files when passing the paths to HubertTokenizer
  4. Include the path to a (dummy) audio file
  5. Run Python script
  6. See error

Note: I tried the other method as well (by using quantize_with_kmeans.py) but got the same error.

Trace:

Traceback (most recent call last):
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 144, in <module>
    main(args, logger)
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 101, in main
    features_batch = get_features(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 73, in get_features
    generator, num_files = get_feature_iterator(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 58, in get_feature_iterator
    reader = feature_reader_cls(
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "/home/user/fairseq/fairseq/checkpoint_utils.py", line 484, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/home/user/fairseq/fairseq/tasks/fairseq_task.py", line 355, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "/home/user/fairseq/fairseq/models/__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "/home/user/fairseq/fairseq/models/hubert/hubert.py", line 335, in build_model
    model = HubertModel(cfg, task.cfg, task.dictionaries)
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 139, in dictionaries
    return self.state.dictionaries
  File "/home/user/fairseq/fairseq/tasks/fairseq_task.py", line 42, in __getattr__
    self._state[name] = self._factories[name]()
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 149, in load_dictionaries
    dictionaries = [
  File "/home/user/fairseq/fairseq/tasks/hubert_pretraining.py", line 150, in <listcomp>
    Dictionary.load(f"{label_dir}/dict.{label}.txt")
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 228, in load
    d.add_from_file(f)
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 241, in add_from_file
    raise fnfe
  File "/home/user/fairseq/fairseq/data/dictionary.py", line 238, in add_from_file
    with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ntuanh/experiments/dialogue-LM/hubert/fisher-vad-10s-separated/kmeans/hubert_iter2/km500/dict.km.txt'

Code sample

# Load the Hubert tokenizer
from examples.textless_nlp.dgslm.dgslm_utils import HubertTokenizer
encoder = HubertTokenizer(
    hubert_path="/path/to/hubert_ckpt.pt",
    hubert_layer=12,
    km_path="/path/to/km.bin",
)

# Encode the audio to units
path = "/path/to/stereo/audio.wav"
codes = encoder.wav2codes(path)

Expected behavior

I would expect the path to the k-means dictionary file to be relative, or the dictionary file to be bundled with the HuBERT model checkpoint itself, rather than an absolute path on the authors' cluster.
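For reference, fairseq's `Dictionary.load` expects one `symbol count` pair per line, so a dummy 500-unit `dict.km.txt` can be generated with a few lines of Python (a sketch; it assumes the unit symbols are simply the integers 0-499, which is the usual convention for k-means unit dictionaries):

```python
# Write a dummy fairseq dictionary with 500 k-means units.
# Each line is "<symbol> <count>"; the counts are not used when
# loading for inference, so a placeholder count of 1 is fine.
with open("dict.km.txt", "w") as f:
    for unit in range(500):
        f.write(f"{unit} 1\n")
```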

Note: As a temporary workaround, I overwrote the path that causes the error with the dictionary1 file from Generative Spoken Dialogue Language Modeling, which has exactly 500 entries. After this, I got another error:

Traceback (most recent call last):
  File "/home/user/hubert.py", line 6, in <module>
    encoder = HubertTokenizer(
  File "/home/user/fairseq/examples/textless_nlp/dgslm/dgslm_utils.py", line 27, in __init__
    self.feature_extractor = HubertFeatureReader(hubert_path, hubert_layer, use_cuda=use_cuda)
  File "/home/user/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "/home/user/fairseq/fairseq/checkpoint_utils.py", line 493, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/user/fairseq/fairseq/models/fairseq_model.py", line 128, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/liviaq/miniconda3/envs/asr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HubertModel:
  size mismatch for label_embs_concat: copying a param with shape torch.Size([504, 256]) from checkpoint, the shape in current model is torch.Size([1008, 256]).
  size mismatch for final_proj.weight: copying a param with shape torch.Size([256, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
  size mismatch for final_proj.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).

Based on this, there also appears to be a dimension mismatch between the pre-trained weights and the rebuilt model. Note that 504 = 500 k-means units + 4 special symbols, while 1008 = 2 × 504, which suggests the model ended up being built with two label dictionaries instead of one.
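An alternative workaround is to rewrite the `label_dir` baked into the checkpoint's task config so it points at a local directory containing `dict.km.txt` (a sketch; it assumes the checkpoint stores its config under a `cfg` → `task` mapping, which is the usual layout for recent fairseq checkpoints — the toy dict below stands in for a real checkpoint loaded with `torch.load`):

```python
import torch

# Toy stand-in for a real checkpoint; with the downloaded file you
# would instead do: ckpt = torch.load("hubert_ckpt.pt", map_location="cpu")
ckpt = {
    "cfg": {"task": {"label_dir": "/checkpoint/ntuanh/experiments/..."}},
    "model": {},
}

# Redirect the hard-coded label_dir to a local directory that
# contains dict.km.txt, then save a patched copy of the checkpoint.
ckpt["cfg"]["task"]["label_dir"] = "/local/dir/with/dict"
torch.save(ckpt, "hubert_ckpt_patched.pt")
```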

Environment

tuanh208 commented 1 year ago

I have updated the HuBERT checkpoint with the pre-defined dictionary. Please check if this works

qianlivia commented 1 year ago

> I have updated the HuBERT checkpoint with the pre-defined dictionary. Please check if this works

Thank you, this solved it!