auspicious3000 / contentvec

speech self-supervised representations
MIT License
434 stars · 32 forks

Inferencing a non-legacy model in the fairseq environment #21

Open misakiudon opened 5 months ago

misakiudon commented 5 months ago

Hello, I am training a new contentvec model in order to replace the framework's HuBERT model with the newly trained contentvec model.

However, when I tried to run inference in a standard fairseq environment on a model trained with the code in this repository, the following error occurred and inference was not possible:

INFO:__main__:Extracting hubert acoustic features...
Traceback (most recent call last):
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 141, in <module>
    main(args, logger)
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 98, in main
    features_batch = get_features(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 73, in get_features
    generator, num_files = get_feature_iterator(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 58, in get_feature_iterator
    reader = feature_reader_cls(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "~/fairseq/fairseq/checkpoint_utils.py", line 461, in load_model_ensemble_and_task
    task = tasks.setup_task(cfg.task, from_checkpoint=True)
  File "~/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
    task is not None
AssertionError: Could not infer task type from {'_name': 'contentvec_pretraining', 'data': '~/contentvec/metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': '~/contentvec/label', 'label_rate': 50, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'crop': True, 'pad_audio': False, 'spk2info': '~/contentvec/metadata/output.dict'}. Available argparse tasks: dict_keys(['hubert_pretraining', 'speech_unit_modeling', 'translation', 'multilingual_translation', 'semisupervised_translation', 'audio_pretraining', 'nlu_finetuning', 'translation_lev', 'audio_finetuning', 'audio_classification', 'legacy_masked_lm', 'sentence_prediction', 'sentence_prediction_adapters', 'translation_from_pretrained_xlm', 'translation_from_pretrained_bart', 'denoising', 'speech_dlm_task', 'cross_lingual_lm', 'sentence_ranking', 'language_modeling', 'masked_lm', 'multilingual_language_modeling', 'speech_to_text', 'text_to_speech', 'multilingual_denoising', 'online_backtranslation', 'simul_speech_to_text', 'simul_text_to_text', 'multilingual_masked_lm', 'translation_multi_simple_epoch', 'frm_text_to_speech', 'speech_to_speech', 'span_masked_lm', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt']). Available hydra tasks: dict_keys(['hubert_pretraining', 'speech_unit_modeling', 'translation', 'audio_pretraining', 'nlu_finetuning', 'translation_lev', 'audio_finetuning', 'audio_classification', 'sentence_prediction', 'sentence_prediction_adapters', 'translation_from_pretrained_xlm', 'denoising', 'speech_dlm_task', 'language_modeling', 'masked_lm', 'multilingual_language_modeling', 'multilingual_denoising', 'simul_text_to_text', 'span_masked_lm', 'dummy_lm', 'dummy_masked_lm'])
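The assertion in the traceback comes from fairseq's task lookup: `setup_task` reads `cfg.task._name` from the checkpoint and looks it up in the task registry of the *installed* fairseq, and a stock install has never registered `contentvec_pretraining`. The following is a minimal sketch of that lookup (not fairseq's actual code; the registry contents here are abbreviated and illustrative):

```python
# Minimal sketch of fairseq's setup_task resolution logic.
# A stock fairseq install registers its own tasks only; the
# contentvec fork additionally registers 'contentvec_pretraining'.
TASK_REGISTRY = {"hubert_pretraining", "audio_pretraining", "masked_lm"}

def resolve_task(task_cfg: dict) -> str:
    """Look up the task name stored in the checkpoint config."""
    name = task_cfg.get("_name")
    task = name if name in TASK_REGISTRY else None
    # This is the assertion that fires in the traceback above:
    # the lookup returns None for an unregistered task name.
    assert task is not None, f"Could not infer task type from {task_cfg}"
    return task
```

So the same checkpoint loads fine once fairseq itself knows the task, which is why the fork's environment is required:

```python
resolve_task({"_name": "hubert_pretraining"})      # resolves
resolve_task({"_name": "contentvec_pretraining"})  # raises AssertionError
```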

To check, I also tried the pretrained models provided with this repository: the normal version (checkpoint_best_500.pt) produced exactly the same error as above, while the legacy version (checkpoint_best_500_legacy.pt) worked fine.

Is there any way to solve this problem? (What code should I run to perform inference with the model I trained?)

Also, do you know how to train a contentvec model that only contains the representation modules (a.k.a. a legacy model)?

auspicious3000 commented 5 months ago

You need to use the fairseq version set up by this repo to run inference, i.e. the same fairseq framework you used to train contentvec. That is why the legacy model is called "legacy": some modules were manually renamed or removed so that the checkpoint can be loaded by stock fairseq. The legacy models exist so that users can quickly run the model without setting up this repo; they are manually derived from the original checkpoints.
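The renaming described above could look roughly like the following. This is a hypothetical sketch of the kind of rewrite a legacy conversion performs, assuming the checkpoint has already been loaded into a dict (e.g. via `torch.load`); the key names (`spk2info`, the `final_proj_list` prefix) are illustrative assumptions, not the maintainer's actual conversion script:

```python
# Hypothetical sketch: rewrite a contentvec checkpoint dict so that
# stock fairseq can load it as a HuBERT checkpoint. Key names are
# illustrative; the real conversion may differ.
def to_legacy(ckpt: dict) -> dict:
    cfg = ckpt["cfg"]
    # Point the checkpoint at a task/model that stock fairseq registers.
    cfg["task"]["_name"] = "hubert_pretraining"
    cfg["model"]["_name"] = "hubert"
    # Drop config entries that only the contentvec fork defines,
    # such as the speaker-conditioning info.
    cfg["task"].pop("spk2info", None)
    # Drop weights belonging to fork-only modules (prefix is illustrative).
    ckpt["model"] = {
        k: v for k, v in ckpt["model"].items()
        if not k.startswith("final_proj_list")
    }
    return ckpt
```

After such a rewrite, `fairseq.checkpoint_utils.load_model_ensemble_and_task` would resolve the task as plain `hubert_pretraining`, which matches the observed behavior of `checkpoint_best_500_legacy.pt`.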