facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

wav2vec2: object of type 'NoneType' has no len() #5558

Open Linx3f opened 1 month ago

Linx3f commented 1 month ago

🐛 Bug (I have seen all the issues about this error, but my situation is different.)

I followed the official method (https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec) to fine-tune wav2vec_small_10m.pt on another dataset. However, training fails with TypeError: object of type 'NoneType' has no len():

Traceback (most recent call last):
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq_cli\hydra_train.py", line 27, in hydra_main
    _hydra_main(cfg)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq_cli\hydra_train.py", line 56, in _hydra_main
    distributed_utils.call_main(cfg, pre_main, **kwargs)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\distributed\utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq_cli\train.py", line 96, in main
    model = task.build_model(cfg.model)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\tasks\audio_finetuning.py", line 193, in build_model
    model = super().build_model(model_cfg, from_checkpoint)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\tasks\audio_pretraining.py", line 197, in build_model
    model = super().build_model(model_cfg, from_checkpoint)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\tasks\fairseq_task.py", line 338, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\models\__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\models\wav2vec\wav2vec2_asr.py", line 208, in build_model
    w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\models\wav2vec\wav2vec2_asr.py", line 407, in __init__
    model = task.build_model(w2v_args.model, from_checkpoint=True)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\tasks\audio_pretraining.py", line 197, in build_model
    model = super().build_model(model_cfg, from_checkpoint)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\tasks\fairseq_task.py", line 338, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\models\__init__.py", line 106, in build_model
    return model.build_model(cfg, task)
  File "D:\Apps\Anaconda3\envs\torch182\lib\site-packages\fairseq\models\wav2vec\wav2vec2_asr.py", line 208, in build_model
    w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
TypeError: object of type 'NoneType' has no len()
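
For context, my own reading of the traceback (an assumption, not a confirmed diagnosis): the len() that raises is len(task.target_dictionary), and it happens inside the nested task.build_model call at wav2vec2_asr.py line 407, i.e. while fairseq rebuilds the model stored inside wav2vec_small_10m.pt itself, so some task in that chain ends up with target_dictionary = None. To rule out an obvious cause on my side, I reproduced the dictionary lookup with a minimal sketch, assuming the fine-tuning task resolves dict.<labels>.txt inside task.data (Dictionary.load is fairseq's API; the paths are mine):

    # Minimal pre-flight check (my own script, not part of fairseq):
    # confirm that dict.<labels>.txt actually resolves from task.data.
    import os
    from fairseq.data import Dictionary

    data_dir = r"C:\Users\18310\Desktop\py\feature-extraction2\trans"
    labels = "ltr"

    dict_path = os.path.join(data_dir, f"dict.{labels}.txt")
    assert os.path.isfile(dict_path), f"missing {dict_path}"

    target_dict = Dictionary.load(dict_path)
    print(len(target_dict))  # the same len() call that raises in the traceback

If this snippet fails, task.data does not point at the dictionary.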

To Reproduce

  1. Get dict.ltr.txt for the dataset I use (a sketch of the helper script I build it with appears after this list):
    | 345995
    E 149656
    T 126396
    O 126124
    A 105266
    I 98021
    N 87693
    H 78905
    S 75716
    R 65274
    L 56135
    U 53373
    Y 49207
    D 47160
    W 37983
    M 37210
    G 34109
    C 28421
    F 21527
    B 21383
    K 20813
    P 20423
    ' 19381
    V 12276
    J 4387
    X 1863
    Z 1067
    Q 597
  2. Modify the file base_100h.yaml. The modified part is as follows:
    ...
    task:
      _name: audio_finetuning
      data: C:\Users\18310\Desktop\py\feature-extraction2\trans  # only dict.ltr.txt is in this directory
      normalize: false
      labels: ltr
    model:
      _name: wav2vec_ctc
      w2v_path: C:\Users\18310\Desktop\py\feature-extraction2\model\wav2vec_small_10m.pt
      apply_mask: true
    ...
  3. Run the command fairseq-hydra-train distributed_training.distributed_world_size=1 --config-dir C:\Users\18310\Desktop\py\feature-extraction2\config\finetuning --config-name base_100h.
  4. See the traceback above.
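
For step 1, this is the kind of helper I use to produce dict.ltr.txt from a .ltr label file (a hypothetical script of mine, not from the fairseq repo): it counts symbols and writes one "<symbol> <count>" line per entry, most frequent first, matching the listing above.

    # Hypothetical helper: build dict.ltr.txt from a .ltr label file.
    import sys
    from collections import Counter

    counts = Counter()
    with open(sys.argv[1], encoding="utf-8") as f:   # e.g. train.ltr
        for line in f:
            counts.update(line.split())              # letters plus "|" word boundaries

    with open("dict.ltr.txt", "w", encoding="utf-8") as out:
        for sym, n in counts.most_common():
            print(f"{sym} {n}", file=out)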

Expected behavior

Fine-tuning of the wav2vec model starts and runs without this error.

Environment

Additional context

Well, I have another question. The README.md says that fine-tuning a model requires parallel audio and label files, as well as a vocabulary file in fairseq format, so why does the given command line only involve the vocabulary file?
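
As far as I understand the wav2vec example docs (an assumption on my part, please correct me if wrong): the command only receives task.data plus labels: ltr, and the audio manifests and label files are then located by name inside that directory, with filenames derived from the split names, something like:

    trans\
        train.tsv      audio manifest: first line is the root dir, then one "<relative_path>\t<num_samples>" per line
        train.ltr      letter transcripts, one utterance per line, "|" marking word boundaries
        valid.tsv
        valid.ltr
        dict.ltr.txt   the vocabulary file

so nothing besides the directory would need to appear on the command line.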