DeepSpeech2 error in interactive inference mode

mehulsuresh commented 5 years ago

Hey, I got this error while trying to run the DeepSpeech2 model in interactive infer mode with the checkpoints loaded from https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [21,11,32,32] rhs shape= [11,21,32,32]

I am using the ds2_large_mp.py config file and added the interactive infer parameters as follows:

interactive_infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "num_audio_features": 160,
        "input_type": "spectrogram",
        "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "window_size": 20e-3,
        "window_stride": 10e-3,
        "dataset_files": [data_root + "/librispeech/librivox-dev-clean.csv",],
        "shuffle": False,
    },
}

I also set "num_gpus": 1.

borisgin commented 5 years ago

Can you try commenting out this line in the config file, please? # "data_format": "BCFT"
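For context, the two shapes in the error differ only in the order of their first two dimensions. In TensorFlow a 2-D convolution kernel is stored as [height, width, in_channels, out_channels], so this pattern is consistent with the graph being built with a different axis layout than the one the checkpoint was trained with. A minimal sketch of that check, in plain Python (the helper name is hypothetical and not part of OpenSeq2Seq):

```python
def differs_only_by_spatial_swap(graph_shape, ckpt_shape):
    """Return True if two 4-D conv kernel shapes differ only by swapping
    the two leading (spatial) axes."""
    graph_shape, ckpt_shape = list(graph_shape), list(ckpt_shape)
    if len(graph_shape) != 4 or len(ckpt_shape) != 4:
        return False
    height, width, c_in, c_out = graph_shape
    swapped = [width, height, c_in, c_out]
    # Identical shapes are not a mismatch, so exclude them explicitly.
    return swapped == ckpt_shape and graph_shape != ckpt_shape


# Shapes copied from the error message above.
graph_shape = [21, 11, 32, 32]  # lhs: variable in the current graph
ckpt_shape = [11, 21, 32, 32]   # rhs: tensor stored in the checkpoint
print(differs_only_by_spatial_swap(graph_shape, ckpt_shape))  # True
```

If the check returns True, a layout option in the config is a more likely culprit than a wrong checkpoint file.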

mehulsuresh commented 5 years ago

@borisgin Thanks, DeepSpeech2 works perfectly now!

mehulsuresh commented 5 years ago

But I ran into another issue when trying to do the same for tacotron-gst:

H:\NLP\OpenSeq2Seq\open_seq2seq\encoders\tacotron2_encoder.py in _encode(self, input_dict)
    161         if (self._model.get_data_layer().params.get("style_input", None)
    162             == "wav"):
--> 163           style_spec = input_dict['source_tensors'][2]
    164           style_len = input_dict['source_tensors'][3]
    165           style_embedding = self._embed_style(style_spec, style_len)

IndexError: list index out of range

The interactive infer config is as follows:

interactive_infer_params = {
    "data_layer_params": {
        "style_input": "wav",
        "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "dataset_files": ["generate.csv"],
        "duration_max": 10000,
        "duration_min": 0,
        "shuffle": False,
    },
}

blisc commented 5 years ago

tacotron_gst does not support interactive infer. If you want to add that functionality, you can add placeholders for your style spec and style spec length in the create_interactive_placeholders() function, and add the relevant preprocessing in create_feed_dict(). Also, interactive infer uses neither dataset_files nor shuffle.
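A rough sketch of the feed-dict half of that suggestion, using NumPy only. It assumes the style spectrogram has already been computed from the style wav as a [time, num_features] array, and that two placeholders for the style spec and its length already exist. The function name and the stand-in string keys are hypothetical; in the real data layer the keys would be the placeholder tensors created in create_interactive_placeholders(), and the result would be merged into what create_feed_dict() returns.

```python
import numpy as np


def build_style_feed(style_spec, spec_placeholder, len_placeholder):
    """Map two (hypothetical) style placeholders to batch-of-one arrays.

    style_spec: 2-D array of shape [time, num_features], for example a
    spectrogram computed from the style wav.
    """
    style_spec = np.asarray(style_spec, dtype=np.float32)
    if style_spec.ndim != 2:
        raise ValueError("expected a [time, num_features] array")
    return {
        # Add a leading batch axis: [1, time, num_features].
        spec_placeholder: style_spec[np.newaxis, :, :],
        # Number of time frames, as a batch of one: shape [1].
        len_placeholder: np.array([style_spec.shape[0]], dtype=np.int32),
    }


# Stand-in keys; real code would pass the placeholder tensors instead.
feed = build_style_feed(np.zeros((120, 80)), "style_spec_ph", "style_len_ph")
print(feed["style_spec_ph"].shape, feed["style_len_ph"])
```

The encoder in the traceback reads the style spec and its length from source_tensors[2] and source_tensors[3], so the two new placeholders would need to be appended to the source tensors in that order.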

NVIDIA / OpenSeq2Seq

DeepSpeech2 error in interactive inference mode #361