add support for huggingface wav2vec2 model. Support ASR training and …

Add support for the Hugging Face wav2vec2 model. ASR training and inference are supported with the default CTC.

There is still some confusion about the vocab file. The decoding result on an4 shows "" in predictions where "|" (space) tokens are expected. The temporary solution is to swap the "" and space tokens in stage 5 (token_list), which gives the following performance:

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| inference_asr_model_valid.acc.ave/test | 130 | 773 | 98.8 | 0.9 | 0.3 | 0.0 | 1.2 | 5.4 |
| inference_asr_model_valid.acc.ave/train_dev | 100 | 591 | 97.5 | 1.7 | 0.8 | 0.2 | 2.7 | 12.0 |
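
The temporary token swap described above can be sketched roughly as follows. This is only an illustration, not the code in this commit: the helper name `swap_tokens`, the toy vocabulary, and the placeholder `<special>` (standing in for the token named in the message) are all assumptions.

```python
from typing import List


def swap_tokens(token_list: List[str], first: str, second: str) -> List[str]:
    """Return a copy of token_list with the positions of two tokens exchanged."""
    i, j = token_list.index(first), token_list.index(second)
    swapped = list(token_list)  # copy so the input list is left untouched
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


# Hypothetical vocabulary; "<special>" is a placeholder, "|" is the space token.
tokens = ["<special>", "a", "b", "|"]
swapped = swap_tokens(tokens, "<special>", "|")
print(swapped)
```

Swapping positions rather than renaming entries keeps the vocabulary size and every other token index unchanged, so only the two affected IDs are reinterpreted.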

Emrys365 / espnet