buriburisuri / speech-to-text-wavenet

Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow
Apache License 2.0

Docker image: scikits.audiolab missing #78

Open greigs opened 7 years ago

greigs commented 7 years ago

Running in nvidia-docker:

root@94a7bbb1c7db:~/speech-to-text-wavenet# python preprocess.py 
Traceback (most recent call last):
  File "preprocess.py", line 6, in <module>
    import scikits.audiolab
ImportError: No module named scikits.audiolab

I tried python3 as well:

root@94a7bbb1c7db:~/speech-to-text-wavenet# python3 preprocess.py 
Traceback (most recent call last):
  File "preprocess.py", line 1, in <module>
    import numpy as np
ImportError: No module named 'numpy'

I was able to run python recognize.py successfully, though.

rachel-bousfield commented 7 years ago

I figured it out: you have to install libsndfile-dev.

Add this under requirements in your Dockerfile:

# requirements
....
RUN apt-get install -y libsndfile-dev
RUN pip install scikits.audiolab==0.11.0
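
To confirm that the rebuilt image resolves the import errors above, a small check like the following can be run inside the container with both interpreters. This is only a sketch: the helper name `missing_modules` is hypothetical and not part of the repository.

```python
import importlib


def missing_modules(names):
    """Return the module names that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except (ImportError, OSError):
            # ImportError: the package is not installed for this interpreter.
            # OSError: some bindings raise this when a shared library
            # (for example libsndfile) cannot be loaded.
            missing.append(name)
    return missing
```

For example, `print(missing_modules(["numpy", "pandas", "librosa", "scikits.audiolab"]))` run under `python` and then under `python3` shows which interpreter is missing which package. The module names are the ones that appear in the tracebacks and imports in this thread.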

gdahlm commented 7 years ago

This breaks on Python 3.6. Since you are already using librosa, which uses soundfile, the following change would also reduce the number of dependencies.

index 5e4575e..67045e7 100755
--- a/preprocess.py
+++ b/preprocess.py
@@ -3,7 +3,7 @@ import pandas as pd
 import glob
 import csv
 import librosa
-import scikits.audiolab
+import soundfile as sf
 import data
 import os
 import subprocess
@@ -115,7 +115,7 @@ def process_libri(csv_file, category):
         print("LibriSpeech corpus preprocessing (%d / %d) - '%s']" % (i, len(wave_files), wave_file))

         # load flac file
-        wave, sr, _ = scikits.audiolab.flacread(wave_file)
+        wave, sr = sf.read(wave_file)

         # get mfcc feature
         mfcc = librosa.feature.mfcc(wave, sr=16000)
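
Note that `scikits.audiolab.flacread` returns three values (data, sample rate, encoding), while `soundfile.read` returns two (data, sample rate), which is why the unpacking changes in the hunk above. A minimal sketch of a loader built on that call follows. The helper name `load_audio`, the `reader` parameter, and the sample-rate check are assumptions for illustration and are not in the repository; the check assumes the hard-coded `sr=16000` passed to the MFCC call is the rate the files are expected to have.

```python
def load_audio(path, expected_sr=16000, reader=None):
    """Read an audio file and return (samples, sample_rate).

    `reader` is a hypothetical hook so the helper can be exercised without
    soundfile installed; by default soundfile.read is used.
    """
    if reader is None:
        import soundfile as sf  # imported lazily; requires libsndfile
        reader = sf.read

    # soundfile.read returns (data, samplerate)
    wave, sr = reader(path)

    # The MFCC call in preprocess.py passes sr=16000, so fail loudly if a
    # file with a different rate slips in (assumption, see lead-in).
    if sr != expected_sr:
        raise ValueError("expected %d Hz, got %d Hz: %s" % (expected_sr, sr, path))
    return wave, sr
```

Called as `wave, sr = load_audio(wave_file)`, this would stand in for the `sf.read(wave_file)` line.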

Note: I also saw an increase in preprocessing performance after making this change. The following code is also breaking under Python 3:

index df95b2c..18f75b0 100755
--- a/data.py
+++ b/data.py
@@ -33,8 +33,10 @@ def str2index(str_):

     # clean white space
     str_ = ' '.join(str_.split())
+    # make translator object
+    translator = str.maketrans('', '', string.punctuation)
     # remove punctuation and make lower case
-    str_ = str_.translate(None, string.punctuation).lower()
+    str_ = str_.translate(translator).lower()

     res = []
     for ch in str_:
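
For context, Python 2's `str.translate(None, deletechars)` form no longer exists in Python 3, where `translate` takes a single table argument; `str.maketrans('', '', chars)` builds a table that deletes `chars`. Here is a standalone sketch of the same cleaning step. The function name `clean_text` is hypothetical, and building the table once at module level is an optional variation on the patch, which builds it on every call.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Built once at import time instead of on each call.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def clean_text(text):
    """Collapse whitespace, strip punctuation, and lower-case the text."""
    # clean white space
    text = ' '.join(text.split())
    # remove punctuation and make lower case
    return text.translate(_PUNCT_TABLE).lower()


print(clean_text("Hello,   World!"))  # prints: hello world
```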

I am mostly just using your preprocessing scripts right now, but if I end up making major changes I will file a pull request for those improvements.