Deeperjia / tensorflow-wavenet

speech recognition based on tensorflow 1.0.0

issue with the "test.py" #1

Open eisneim opened 7 years ago

eisneim commented 7 years ago

Hi @Deeperjia, first of all, this is a great project and it's really helpful. Thank you for your great work.

There is one issue: in test.py, SpeechLoader is initialized without label_file:

speech_loader = SpeechLoader()

This causes utils.py to complain that there is no file to decode:

    self.preprocess(wav_path, label_file, wavs_file, vocab_file, mfcc_tensor, label_tensor)
  File ".../utils.py", line 54, in preprocess
    with codecs.open(label_file,"r", encoding=self.encoding) as f:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 895, in open
    file = builtins.open(filename, mode, buffering)
TypeError: expected str, bytes or os.PathLike object, not NoneType

And if I add the label_file path but still pass no wav_path to SpeechLoader(), I get:

Traceback (most recent call last):
  File "test.py", line 60, in <module>
    speech_to_text()
  File "test.py", line 19, in speech_to_text
    speech_loader = SpeechLoader(label_file=label_file)
  File "...../utils.py", line 34, in __init__
    self.preprocess(wav_path, label_file, wavs_file, vocab_file, mfcc_tensor, label_tensor)
  File "..../utils.py", line 88, in preprocess
    self.wav_max_len = max(len(mfcc) for mfcc in self.mfcc_tensor)
ValueError: max() arg is an empty sequence
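
Both tracebacks come from the loader being constructed without one of its two inputs. A minimal sketch of a pre-check that could run before `SpeechLoader(...)` follows. `check_loader_inputs` is a hypothetical helper, not part of the repository, and it assumes that `wav_path` is a directory containing `.wav` files and that `label_file` is a plain file on disk.

```python
import os


def check_loader_inputs(wav_path, label_file):
    """Hypothetical pre-check: fail early with a readable message."""
    if wav_path is None or label_file is None:
        raise ValueError("both wav_path and label_file must be given")
    if not os.path.isfile(label_file):
        raise FileNotFoundError("label file not found: %s" % label_file)
    # Assumption: audio files live somewhere under wav_path and end in .wav.
    wavs = []
    for root, _dirs, files in os.walk(wav_path):
        wavs.extend(os.path.join(root, f) for f in files
                    if f.lower().endswith(".wav"))
    if not wavs:
        raise ValueError("no .wav files found under: %s" % wav_path)
    return sorted(wavs)
```

Calling it in test.py just before the loader is created would turn the `TypeError` and the empty-sequence `ValueError` above into a message that names the missing input.
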
eisneim commented 7 years ago

After some experiments, I got unusable results even after 100 epochs:

latest ckpt: /Users/eisneim/www/deepLearning/deeperjia_tensorflow-wavenet_cn/model/speech.module-10
==> restore model costs: 8.079950094223022s
---------------------------
Input: /Volumes/raid/_deeplearning/THCHS30语音/wav/test/D4/D4_750.wav
Output: 苏美军的一下爱国想市马债山一动党军乎苏名外断全没等也奋体逃战
---------------------------
Input: /Volumes/raid/_deeplearning/THCHS30语音/wav/test/D4/D4_751.wav
Output: 王英看北香边后不分云面三产气来几似为组终为抓果
---------------------------
Input: /Volumes/raid/_deeplearning/THCHS30语音/wav/test/D4/D4_752.wav
Output: 他们种大斯喜路一家茶顺不里按就说药软义鱼语她增把了又被扎来的频繁
---------------------------
Input: /Volumes/raid/_deeplearning/THCHS30语音/wav/test/D4/D4_753.wav
Output: 几百来没纹书经少在订小学钉响也将于些江外把位军享就处昏线的人
---------------------------
Input: /Volumes/raid/_deeplearning/THCHS30语音/wav/test/D4/D4_754.wav
Output: 待得肉丝个可等柳根怎疼团初姆毛有王恶末奥会次原是丰飘饭伞感问秋区表事韦启好种行机细高合许期底长登算港民名根菱银据同知秋顺半百人窃用设的释皮音身高了哑不迪请急比屏七八西水市尼场胜倒崔造厌交腔国岸工你本出多放亭的着员引已锐给不业谊密服混名年议腰遵耀畜猛马偏墙筒顺级既从嘉贤从偏结难软然性

Just wondering, what results do you get on your side? It would be really useful if an evaluation script were provided.
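
For reference, one common way to score output like the above is character error rate: the edit distance between the reference transcript and the prediction, divided by the reference length. The following is only a sketch of such an evaluation, not a script from this repository; the function names are made up for illustration, and it assumes you already have (reference, prediction) string pairs.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """pairs: iterable of (reference_text, predicted_text) strings."""
    errors = total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total if total else 0.0
```

A value near 0 means the predictions match the transcripts. Values close to or above 1 would be consistent with the unusable output shown here.
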

Deeperjia commented 7 years ago

Thanks for your interest. The results I got are nearly the same as yours, since I did not tune the parameters carefully.

dllen commented 6 years ago

@eisneim I also ran into problems when testing. Could you post your test.py for reference? Thanks!

arixlin commented 6 years ago

@dllen you can load the data with: speech_loader = SpeechLoader(wav_path='data/wav/train', label_file='data/doc/trans/train.word.txt', n_mfcc=60), but then a key is not found: Not found: Key conv1d10/variance/Adam not found in checkpoint
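
A "Key ... not found in checkpoint" error means the graph being restored asks for a variable name that the checkpoint file does not contain. One way to narrow this down is to compare the two sets of names. The sketch below shows only the comparison step; `diff_variable_names` is a hypothetical helper, and how the two name lists are obtained from the graph and the checkpoint is left out.

```python
def diff_variable_names(graph_names, checkpoint_names):
    """Return (missing_in_checkpoint, unused_in_graph) as sorted lists."""
    graph_set, ckpt_set = set(graph_names), set(checkpoint_names)
    return sorted(graph_set - ckpt_set), sorted(ckpt_set - graph_set)


# Illustrative names only; real lists would come from the graph and the checkpoint.
missing, unused = diff_variable_names(
    ["conv1d10/variance", "conv1d10/variance/Adam"],
    ["conv1d10/variance"],
)
print(missing)  # ['conv1d10/variance/Adam']
```

If only names ending in `/Adam` show up as missing, the mismatch is limited to optimizer state rather than model weights, which is a narrower problem than a differently built network.
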

arixlin commented 6 years ago

@dllen https://github.com/arixlin/tensorflow-wavenet I fixed some bugs; you can use my version as a reference.

finebck commented 6 years ago

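@arixlin Thanks. I used your code as a reference and ran it. I haven't tuned the parameters yet, and the results don't seem very good. Did your results get better later on?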

arixlin commented 6 years ago

@finebck This is only the wave part. If you want to do speech recognition, you also need to combine it with an HMM or LSTM for the NLP part.

finebck commented 6 years ago

@arixlin So the language model still needs to be improved? Do you have any suggestions on that?

czifan commented 6 years ago

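Hello, the model I trained with this code always predicts empty output. I then rewrote the code in another framework, and the model trained there behaves the same way. Is this a characteristic of this model, or did I do something wrong?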

NPCv7 commented 5 months ago

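Hello, the model I trained with this code always predicts empty output. I then rewrote the code in another framework, and the model trained there behaves the same way. Is this a characteristic of this model, or did I do something wrong?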

Friend, how did you solve this in the end?