facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Error Loading Downloaded Hubert Finetuned ASR #4003

Open j4sonzhao opened 2 years ago

j4sonzhao commented 2 years ago

Hi there,

I am trying to load the downloaded Hubert Large with 960hr finetuning from here: https://github.com/pytorch/fairseq/tree/main/examples/hubert

I downloaded the model, stored the checkpoint, and am trying to run:

import fairseq

ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]

However, I am running into an error where it tries to load a dictionary from an incorrect location. It looks like it is an issue with the HubertPretrainingTask config, whose saved default paths are wrong:

2021-11-08 17:21:38 | INFO | fairseq.tasks.hubert_pretraining | current directory is /home/jzhao7/speech-text-repr/fairseq
2021-11-08 17:21:38 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': '/checkpoint/abdo/old_checkpoint02/datasets/librispeech/960h/raw', 'fine_tuning': False, 'labels': ['lyr9.km500'], 'label_dir': '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all', 'label_rate': 50, 'sample_rate': 16000, 'normalize': True, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/checkpoint_utils.py", line 465, in load_model_ensemble_and_task
    model = task.build_model(cfg.model)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/fairseq_task.py", line 320, in build_model
    model = models.build_model(cfg, self)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/models/__init__.py", line 107, in build_model
    return model.build_model(cfg, task)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/models/hubert/hubert_asr.py", line 152, in build_model
    w2v_encoder = HubertEncoder(cfg, task.target_dictionary)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/models/hubert/hubert_asr.py", line 287, in __init__
    model = task.build_model(w2v_args.model)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/fairseq_task.py", line 320, in build_model
    model = models.build_model(cfg, self)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/models/__init__.py", line 107, in build_model
    return model.build_model(cfg, task)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/models/hubert/hubert.py", line 320, in build_model
    model = HubertModel(cfg, task.cfg, task.dictionaries)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/hubert_pretraining.py", line 142, in dictionaries
    return self.state.dictionaries
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/fairseq_task.py", line 42, in __getattr__
    self._state[name] = self._factories[name]()
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/hubert_pretraining.py", line 152, in load_dictionaries
    dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/tasks/hubert_pretraining.py", line 152, in <listcomp>
    dictionaries = [Dictionary.load(f"{label_dir}/dict.{label}.txt") for label in self.cfg.labels]
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/data/dictionary.py", line 226, in load
    d.add_from_file(f)
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/data/dictionary.py", line 239, in add_from_file
    raise fnfe
  File "/home/jzhao7/speech-text-repr/fairseq/fairseq/data/dictionary.py", line 236, in add_from_file
    with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/wnhsu/experiments/hubert/kmeans_20210121/km_dataset_librivox.model_iter_2.all/dict.lyr9.km500.txt'

I am confused, though: why does HuBERT need these paths in the first place just to initialize itself, and how can I change this?

In general, I am trying to decode ASR with HuBERT, and this is the specific issue I am running into.

j4sonzhao commented 2 years ago

It looks like the state dictionary was not saved properly, and so it cannot be loaded.

@wnhsu would appreciate help on how to load this! I've been trying to modify the config file in the state dictionary (state["config"]) to point to the correct data dictionary, but I keep running into issues.

wnhsu commented 2 years ago

@j4sonzhao thanks for pointing out the bug!

To answer your question about why it needs the path: the dictionary is needed to determine the output size (i.e., how many clusters there are) when initializing the pre-trained HuBERT, even though that prediction head is not used in the fine-tuning stage.

We will fix that so it does not load the pre-training dictionary. In the meantime, if you want a quick fix, you can do the following:

  1. create a dummy dict.lyr9.km500.txt in some directory, say /tmp/hubert_labels/dict.lyr9.km500.txt, which contains 500 lines as follows
    0 1
    1 1
    ...
    499 1
  2. modify the downloaded checkpoint by running the following
    import torch
    state = torch.load(old_checkpoint_path)
    state['cfg']['model']['w2v_args']['task']['label_dir'] = "/tmp/hubert_labels"
    torch.save(state, new_checkpoint_path)

Now new_checkpoint_path should work with fairseq.checkpoint_utils.load_model_ensemble_and_task([new_checkpoint_path]).
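Step 1 above can also be scripted instead of writing the file by hand. A small sketch, using /tmp/hubert_labels as in the example (adjust the path if needed):

```python
import os

# Write the dummy label dictionary from step 1: 500 lines of "<symbol> <count>".
# fairseq's Dictionary only reads the symbols, so a dummy count of 1 per line
# is enough; 500 entries match the km500 label set.
label_dir = "/tmp/hubert_labels"
os.makedirs(label_dir, exist_ok=True)
with open(os.path.join(label_dir, "dict.lyr9.km500.txt"), "w") as f:
    for i in range(500):
        f.write(f"{i} 1\n")
```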

wnhsu commented 2 years ago

The current branch with this commit (https://github.com/pytorch/fairseq/commit/272c4c5197250997148fb12c0db6306035f166a4) should fix the bug of requiring the pre-training dictionary when loading a fine-tuned checkpoint, as well as several issues introduced by recent commits (missing config, etc.)