auspicious3000 / contentvec

speech self-supervised representations
MIT License
434 stars · 32 forks

Inferencing a non-legacy model in the fairseq environment #21

Open misakiudon opened 5 months ago

misakiudon commented 5 months ago

Hello, I am training a new contentvec model in order to replace the framework's HuBERT model with the newly trained contentvec model.

However, when I tried to run inference in a standard fairseq environment on a model trained with the code in this repository, the following error occurred and inference was not possible:

INFO:__main__:Extracting hubert acoustic features...
Traceback (most recent call last):
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 141, in <module>
    main(args, logger)
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py", line 98, in main
    features_batch = get_features(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 73, in get_features
    generator, num_files = get_feature_iterator(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py", line 58, in get_feature_iterator
    reader = feature_reader_cls(
  File "~/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/hubert_feature_reader.py", line 23, in __init__
    ) = fairseq.checkpoint_utils.load_model_ensemble_and_task(
  File "~/fairseq/fairseq/checkpoint_utils.py", line 461, in load_model_ensemble_and_task
    task = tasks.setup_task(cfg.task, from_checkpoint=True)
  File "~/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
    task is not None
AssertionError: Could not infer task type from {'_name': 'contentvec_pretraining', 'data': '~/contentvec/metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': '~/contentvec/label', 'label_rate': 50, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'crop': True, 'pad_audio': False, 'spk2info': '~/contentvec/metadata/output.dict'}. Available argparse tasks: dict_keys(['hubert_pretraining', 'speech_unit_modeling', 'translation', 'multilingual_translation', 'semisupervised_translation', 'audio_pretraining', 'nlu_finetuning', 'translation_lev', 'audio_finetuning', 'audio_classification', 'legacy_masked_lm', 'sentence_prediction', 'sentence_prediction_adapters', 'translation_from_pretrained_xlm', 'translation_from_pretrained_bart', 'denoising', 'speech_dlm_task', 'cross_lingual_lm', 'sentence_ranking', 'language_modeling', 'masked_lm', 'multilingual_language_modeling', 'speech_to_text', 'text_to_speech', 'multilingual_denoising', 'online_backtranslation', 'simul_speech_to_text', 'simul_text_to_text', 'multilingual_masked_lm', 'translation_multi_simple_epoch', 'frm_text_to_speech', 'speech_to_speech', 'span_masked_lm', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt']). Available hydra tasks: dict_keys(['hubert_pretraining', 'speech_unit_modeling', 'translation', 'audio_pretraining', 'nlu_finetuning', 'translation_lev', 'audio_finetuning', 'audio_classification', 'sentence_prediction', 'sentence_prediction_adapters', 'translation_from_pretrained_xlm', 'denoising', 'speech_dlm_task', 'language_modeling', 'masked_lm', 'multilingual_language_modeling', 'multilingual_denoising', 'simul_text_to_text', 'span_masked_lm', 'dummy_lm', 'dummy_masked_lm'])
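The assertion in the traceback comes from fairseq's task lookup: `setup_task` reads `cfg.task._name` from the checkpoint and looks it up in the task registry of the *installed* fairseq, and a stock install has never registered `contentvec_pretraining`. The following is a minimal sketch of that lookup (not fairseq's actual code; the registry contents here are abbreviated and illustrative):

```python
# Minimal sketch of fairseq's setup_task resolution logic.
# A stock fairseq install registers its own tasks only; the
# contentvec fork additionally registers 'contentvec_pretraining'.
TASK_REGISTRY = {"hubert_pretraining", "audio_pretraining", "masked_lm"}

def resolve_task(task_cfg: dict) -> str:
    """Look up the task name stored in the checkpoint config."""
    name = task_cfg.get("_name")
    task = name if name in TASK_REGISTRY else None
    # This is the assertion that fires in the traceback above:
    # the lookup returns None for an unregistered task name.
    assert task is not None, f"Could not infer task type from {task_cfg}"
    return task
```

So the same checkpoint loads fine once fairseq itself knows the task, which is why the fork's environment is required:

```python
resolve_task({"_name": "hubert_pretraining"})      # resolves
resolve_task({"_name": "contentvec_pretraining"})  # raises AssertionError
```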

To check, I also tried the pretrained models provided with this repository: the normal version (checkpoint_best_500.pt) produced exactly the same error as above, while the legacy version (checkpoint_best_500_legacy.pt) worked fine.

Is there any way to solve this problem? (What code should I run to perform inference with the model I trained?)

Also, do you know how to train a contentvec model that only contains the representation modules (a.k.a. a legacy model)?

auspicious3000 commented 5 months ago

You need to use the fairseq version set up by this repo to run inference, i.e. the same fairseq framework you used to train contentvec. That is why the legacy model is called "legacy": some modules were manually renamed or removed so that the checkpoint can be loaded by stock fairseq. The legacy models exist so that users can quickly run the model without setting up this repo; they are manually derived from the original checkpoints.
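The renaming described above could look roughly like the following. This is a hypothetical sketch of the kind of rewrite a legacy conversion performs, assuming the checkpoint has already been loaded into a dict (e.g. via `torch.load`); the key names (`spk2info`, the `final_proj_list` prefix) are illustrative assumptions, not the maintainer's actual conversion script:

```python
# Hypothetical sketch: rewrite a contentvec checkpoint dict so that
# stock fairseq can load it as a HuBERT checkpoint. Key names are
# illustrative; the real conversion may differ.
def to_legacy(ckpt: dict) -> dict:
    cfg = ckpt["cfg"]
    # Point the checkpoint at a task/model that stock fairseq registers.
    cfg["task"]["_name"] = "hubert_pretraining"
    cfg["model"]["_name"] = "hubert"
    # Drop config entries that only the contentvec fork defines,
    # such as the speaker-conditioning info.
    cfg["task"].pop("spk2info", None)
    # Drop weights belonging to fork-only modules (prefix is illustrative).
    ckpt["model"] = {
        k: v for k, v in ckpt["model"].items()
        if not k.startswith("final_proj_list")
    }
    return ckpt
```

After such a rewrite, `fairseq.checkpoint_utils.load_model_ensemble_and_task` would resolve the task as plain `hubert_pretraining`, which matches the observed behavior of `checkpoint_best_500_legacy.pt`.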