espnet / espnet

End-to-End Speech Processing Toolkit
https://espnet.github.io/espnet/
Apache License 2.0

espnet2.bin.asr_align.py: AttributeError: 'Namespace' object has no attribute 'token_list' #3934

Closed tjysdsg closed 2 years ago

tjysdsg commented 2 years ago

**Describe the bug**

Error encountered when using espnet2.bin.asr_align.py to align a .wav file.

Basic environments:

Task information:

set -e
set -u
set -o pipefail

log() {
    local fname=${BASH_SOURCE[1]##*/}
    echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

stage=1 # Processes starts from the specified stage.
stop_stage=10000 # Processes is stopped at the specified stage.
train_set="train"
val_set="test"
test_sets="test"
nj=1

asr_config=conf/tuning/train_asr_conformer_s3prlfrontend_wav2vec2.yaml
exp_dir=exp/asr_train_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp
out_dir=exp/asr_align_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp

log "$0 $*"
. utils/parse_options.sh

if [ $# -ne 0 ]; then
    log "Error: No positional arguments are required."
    exit 2
fi

. ./path.sh
. ./cmd.sh

mkdir -p ${out_dir}

${train_cmd} JOB=1:"${nj}" "${out_dir}"/asr_align.JOB.log \
    python3 -m espnet2.bin.asr_align \
        --asr_train_config ${asr_config} \
        --asr_model_file ${exp_dir}/valid.acc.best.pth \
        --fs 16000 \
        --audio exp/SSB00050353.wav \
        --text exp/text \
        --output "${out_dir}/aligned.txt" || exit 1

 - ESPnet2

**Error logs**

/home/storage15/tangjiyang/espnet/tools/anaconda/envs/espnet/bin/python3 /home/storage15/tangjiyang/espnet/espnet2/bin/asr_align.py --asr_train_config conf/tuning/train_asr_conformer_s3prlfrontend_wav2vec2.yaml --asr_model_file exp/asr_train_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp/valid.acc.best.pth --fs 16000 --audio exp/SSB00050353.wav --text exp/text --output exp/asr_align_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp/aligned.txt
Traceback (most recent call last):
  File "/home/storage15/tangjiyang/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/storage15/tangjiyang/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/storage15/tangjiyang/espnet/espnet2/bin/asr_align.py", line 827, in <module>
    main()
  File "/home/storage15/tangjiyang/espnet/espnet2/bin/asr_align.py", line 823, in main
    ctc_align(**kwargs)
  File "/home/storage15/tangjiyang/espnet/espnet2/bin/asr_align.py", line 632, in ctc_align
    aligner = CTCSegmentation(**model, **kwargs)
  File "/home/storage15/tangjiyang/espnet/espnet2/bin/asr_align.py", line 235, in __init__
    asr_model, asr_train_args = ASRTask.build_model_from_file(
  File "/home/storage15/tangjiyang/espnet/espnet2/tasks/abs_task.py", line 1822, in build_model_from_file
    model = cls.build_model(args)
  File "/home/storage15/tangjiyang/espnet/espnet2/tasks/asr.py", line 377, in build_model
    if isinstance(args.token_list, str):
AttributeError: 'Namespace' object has no attribute 'token_list'

tjysdsg commented 2 years ago

I printed out args in espnet2/tasks/abs_task.py, line 1822:

Namespace(
    accum_grad=4, batch_bins=4000000, batch_type='numel', best_model_criterion=[['valid', 'acc', 'max']],
    ctc_conf={'ignore_nan_grad': True}, decoder='transformer', decoder_conf={
        'attention_heads': 4, 'linear_units': 2048, 'num_blocks': 6, 'dropout_rate': 0.1,
        'positional_dropout_rate': 0.1, 'self_attention_dropout_rate': 0.0, 'src_attention_dropout_rate': 0.0
    },
    encoder='conformer',
    encoder_conf={
        'output_size': 256, 'attention_heads': 4, 'linear_units': 2048, 'num_blocks': 12, 'dropout_rate': 0.1,
        'positional_dropout_rate': 0.1, 'attention_dropout_rate': 0.0, 'input_layer': 'conv2d2',
        'normalize_before': True, 'macaron_style': True, 'rel_pos_type': 'latest', 'pos_enc_layer_type': 'rel_pos',
        'selfattention_layer_type': 'rel_selfattn', 'activation_type': 'swish', 'use_cnn_module': True,
        'cnn_module_kernel': 15
    },
    freeze_param=['frontend.upstream'], frontend='s3prl',
    frontend_conf={
        'frontend_conf': {'upstream': 'wav2vec2_large_ll60k'}, 'download_dir': './hub', 'multilayer_feature': True
    },
    init='xavier_uniform', keep_nbest_models=10, max_epoch=100, model_conf={
        'ctc_weight': 0.3, 'lsm_weight': 0.1, 'length_normalized_loss': False, 'extract_feats_in_collect_stats': False
    },
    num_workers=8, optim='adam', optim_conf={'lr': 0.005}, patience='none', preencoder='linear',
    preencoder_conf={
        'input_size': 1024, 'output_size': 80
    },
    scheduler='warmuplr',
    scheduler_conf={'warmup_steps': 30000},
    specaug='specaug',
    specaug_conf={
        'apply_time_warp': True, 'time_warp_window': 5, 'time_warp_mode': 'bicubic', 'apply_freq_mask': True,
        'freq_mask_width_range': [0, 30], 'num_freq_mask': 2, 'apply_time_mask': True, 'time_mask_width_range': [0, 40],
        'num_time_mask': 2
    }
)
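
The dump above has no `token_list` entry, which is consistent with the traceback. A minimal sketch of the failure mode, assuming the loader wraps the parsed YAML mapping in an `argparse.Namespace` before building the model; the dictionaries and token values here are hypothetical stand-ins, not the real configs:

```python
import argparse

# Hypothetical, trimmed-down stand-in for the dump above: a recipe config
# under conf/tuning/ describes the model but has no token_list entry.
recipe_conf = {"encoder": "conformer", "decoder": "transformer", "frontend": "s3prl"}

# Assumption: the parsed YAML mapping is turned into a Namespace as-is.
args = argparse.Namespace(**recipe_conf)

try:
    args.token_list  # the attribute access that fails in the traceback
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)

# A config that also carries a token list (values made up for illustration)
# exposes the attribute, so the same access succeeds.
trained_conf = dict(recipe_conf, token_list=["<blank>", "<unk>", "a", "<sos/eos>"])
trained_args = argparse.Namespace(**trained_conf)
print(len(trained_args.token_list))
```

A `Namespace` only has the attributes it was given, so a key missing from the YAML file surfaces as an `AttributeError` rather than a `KeyError`.
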

lumaku commented 2 years ago

In your script:

asr_config=conf/tuning/train_asr_conformer_s3prlfrontend_wav2vec2.yaml

This should point to the config file of the already trained model, usually config.yaml or similar, e.g.:

asr_config=exp/asr_train_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp/config.yaml

This config file contains the token list as well as further model information.
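
A quick pre-flight check along these lines can catch the mix-up before alignment starts. This is only a sketch: `has_top_level_key` and `pick_train_config` are hypothetical helpers, not part of ESPnet, and the file contents are made up. The check is a plain text scan, so it assumes `token_list` appears as an unindented, unquoted top-level key.

```python
import re
import tempfile
from pathlib import Path


def has_top_level_key(config_path, key):
    """Return True if the file has a line starting with `key:` at column 0.

    This is a text scan, not a YAML parse, so indented or quoted keys
    are not recognised.
    """
    pattern = re.compile(r"^" + re.escape(key) + r"\s*:")
    with open(config_path, encoding="utf-8") as f:
        return any(pattern.match(line) for line in f)


def pick_train_config(model_file, name="config.yaml"):
    """Hypothetical helper: use the config stored next to the checkpoint."""
    candidate = Path(model_file).parent / name
    if not candidate.is_file():
        raise FileNotFoundError(f"{candidate} not found")
    if not has_top_level_key(candidate, "token_list"):
        raise ValueError(f"{candidate} has no token_list; is it a recipe config?")
    return candidate


# Demo on throwaway files; real paths would be the exp/ directory of the model.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = Path(tmp)
    (exp_dir / "config.yaml").write_text(
        "encoder: conformer\ntoken_list:\n- <blank>\n- <unk>\n", encoding="utf-8"
    )
    (exp_dir / "recipe.yaml").write_text("encoder: conformer\n", encoding="utf-8")

    found = pick_train_config(exp_dir / "valid.acc.best.pth")
    recipe_ok = has_top_level_key(exp_dir / "recipe.yaml", "token_list")

print(found.name, recipe_ok)
```

The returned path is what would be passed as `--asr_train_config`, alongside the checkpoint given to `--asr_model_file`.
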

tjysdsg commented 2 years ago

asr_train_asr_conformer_s3prlfrontend_wav2vec2_raw_word_sp

Thank you!