PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

Training an English letter recognition model: none of the test recognition results are correct #3045

Open · upcmb opened this issue 1 year ago

upcmb commented 1 year ago

Hi, I want to train a U2 model for English letter recognition. Below is my configuration:

chunk_conformer.yaml:

```yaml
############################################
# Network Architecture                     #
############################################
cmvn_file:
cmvn_file_type: "json"

# encoder related
encoder: conformer
encoder_conf:
    output_size: 256    # dimension of attention
    attention_heads: 4
    linear_units: 2048  # the number of units of position-wise feed forward
    num_blocks: 12      # the number of encoder blocks
    dropout_rate: 0.1   # sublayer output dropout
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d # encoder input type, you can choose conv2d, conv2d6 and conv2d8
    normalize_before: True
    cnn_module_kernel: 15
    use_cnn_module: True
    activation_type: 'swish'
    pos_enc_layer_type: 'rel_pos'
    selfattention_layer_type: 'rel_selfattn'
    causal: true
    use_dynamic_chunk: true
    cnn_module_norm: 'layer_norm' # using nn.LayerNorm makes model converge faster
    use_dynamic_left_chunk: false

# decoder related
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1   # sublayer output dropout
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false
    reverse_weight: 0.0
    init_type: 'kaiming_uniform'

###########################################
# Data                                    #
###########################################
train_manifest: data/manifest.train
dev_manifest: data/manifest.dev
test_manifest: data/manifest.test

###########################################
# Dataloader                              #
###########################################
vocab_filepath: data/lang_char/vocab.txt
spm_model_prefix: 'data/lang_char/bpe_bpe_56'
unit_type: 'spm'
preprocess_config: conf/preprocess.yaml
feat_dim: 80
stride_ms: 10.0
window_ms: 25.0
sortagrad: 0    # Feed samples from shortest to longest; -1: enabled for all epochs, 0: disabled, other: enabled for 'other' epochs
batch_size: 32
maxlen_in: 512  # if input length > maxlen_in, batch size is automatically reduced
maxlen_out: 150 # if output length > maxlen_out, batch size is automatically reduced
minibatches: 0  # for debug
batch_count: auto
batch_bins: 0
batch_frames_in: 0
batch_frames_out: 0
batch_frames_inout: 0
num_workers: 0
subsampling_factor: 1
num_encs: 1

###########################################
# Training                                #
###########################################
n_epoch: 1000
accum_grad: 32
global_grad_clip: 5.0
dist_sampler: True
optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 1.0e-6
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000
    lr_decay: 1.0
log_interval: 100
checkpoint:
    kbest_n: 50
    latest_n: 5
```

data.sh:

```bash
#!/bin/bash

stage=-1
stop_stage=100
dict_dir=data/lang_char

# bpemode (unigram or bpe)
nbpe=56
bpemode=bpe
bpeprefix="${dict_dir}/bpe_${bpemode}_${nbpe}"

stride_ms=20
window_ms=30
sample_rate=16000
feat_dim=80

source ${MAIN_ROOT}/utils/parse_options.sh

mkdir -p data
mkdir -p ${dict_dir}
TARGET_DIR=${MAIN_ROOT}/dataset
mkdir -p ${TARGET_DIR}

# prepare data
if [ ${stage} -le -1 ] && [ ${stop_stage} -ge -1 ]; then
    if [ ! -d "${MAIN_ROOT}/dataset/alphabet_data/alphabet" ]; then
        echo "${MAIN_ROOT}/dataset/alphabet_data/alphabet does not exist. Please download alphabet data first."
        exit
    fi

    # create manifest json file
    python ${MAIN_ROOT}/dataset/alphabet_data/alphabet_cs.py \
        --target_dir ${MAIN_ROOT}/dataset/alphabet_data/alphabet/ \
        --manifest_prefix data/
fi

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    # compute mean and stddev for normalizer
    num_workers=$(nproc)
    python3 ${MAIN_ROOT}/utils/compute_mean_std.py \
        --manifest_path="data/manifest.train.raw" \
        --num_samples=-1 \
        --spectrum_type="fbank" \
        --feat_dim=${feat_dim} \
        --delta_delta=false \
        --sample_rate=${sample_rate} \
        --stride_ms=${stride_ms} \
        --window_ms=${window_ms} \
        --use_dB_normalization=False \
        --num_workers=${num_workers} \
        --output_path="data/mean_std.json"

    echo "compute mean and stddev done."
fi

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    # use the train set to build the dict
    python3 ${MAIN_ROOT}/utils/build_vocab.py \
        --unit_type 'spm' \
        --count_threshold=0 \
        --vocab_path="${dict_dir}/vocab.txt" \
        --manifest_paths="data/manifest.train.raw" \
        --spm_mode=${bpemode} \
        --spm_vocab_size=${nbpe} \
        --spm_model_prefix=${bpeprefix} \
        --spm_character_coverage=1

    echo "build dict done."
fi

# use the new dict to format the data
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    # format manifest with tokenids, vocab size
    for sub in train dev test; do
    {
        python3 ${MAIN_ROOT}/utils/format_data.py \
            --cmvn_path "data/mean_std.json" \
            --unit_type "spm" \
            --spm_model_prefix ${bpeprefix} \
            --vocab_path="${dict_dir}/vocab.txt" \
            --manifest_path="data/manifest.${sub}.raw" \
            --output_path="data/manifest.${sub}"

        if [ $? -ne 0 ]; then
            echo "Format manifest failed. Terminated."
            exit 1
        fi
    } &
    done
    wait
    echo "format data done."
fi
```
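
For reference, this script is normally driven by the recipe's run.sh; a standalone invocation might look like the sketch below (it assumes MAIN_ROOT is exported by the example's path.sh, as in the standard PaddleSpeech recipe layout, and relies on parse_options.sh to turn the variables at the top of the script into command-line flags):

```bash
# Assumed setup: run from the example directory, where path.sh
# exports MAIN_ROOT (standard PaddleSpeech recipe layout).
source path.sh

# Run all data-preparation stages, from -1 (data check) through 2 (format).
bash data.sh --stage -1 --stop_stage 2
```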

The data is audio of English letters being spelled out, which I recorded myself. label.txt is structured like this:

```
abcd a b c d
alter a l t e r
apple a p p l e
azxs a z x s
bacteria b a c t e r i a
bcd b c d
blast b l a s t
breed b r e e d
budget b u d g e t
burst b u r s t
campus c a m p u s
candidate c a n d i d a t e
consume c o n s u m e
dergbnm d e r g b n m
dfhgh d f h g h
dfhji d f h j i
dispose d i s p o s e
```

But the final test results are as follows:

| Model | Params | Config | Augmentation | Test set | Decode method | Loss | MER |
| --- | --- | --- | --- | --- | --- | --- | --- |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-set | attention | 9.85091028213501 | 0.102786 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-set | ctc_greedy_search | 9.85091028213501 | 0.103538 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-set | ctc_prefix_beam_search | 9.85091028213501 | 0.103317 |
| conformer | 47.63 M | conf/conformer.yaml | spec_aug | test-set | attention_rescoring | 9.85091028213501 | 0.084374 |

Almost none of the recognized results are correct: (screenshot of decoding output attached)

What could be causing this? I tried both the spm and char unit types and got the same result. Is it a problem with my parameter configuration?

zxcd commented 1 year ago

This looks like a dictionary problem. First, you need to confirm that the dictionary you generated matches what you actually need.
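
For example, a quick sanity check might look like the following sketch (the paths and the 56-unit BPE model are taken from the config and data.sh above; the sentencepiece Python package is assumed to be installed):

```bash
# 1) Inspect the generated vocabulary. For letter-spelling data with
#    nbpe=56, it should be dominated by single letters plus the special
#    tokens, not multi-letter word fragments.
cat data/lang_char/vocab.txt

# 2) Check how the trained SentencePiece model actually segments a label.
python3 -c "
import sentencepiece as spm
sp = spm.SentencePieceProcessor()
sp.Load('data/lang_char/bpe_bpe_56.model')
print(sp.EncodeAsPieces('a b c d'))
"
```

If the pieces printed in step 2 do not line up with entries in vocab.txt, the token ids written into the formatted manifests will not mean what the model expects, which would match the symptom above.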

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.