astorfi / 3D-convolutional-speaker-recognition

Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Apache License 2.0
782 stars, 274 forks

ValueError: Convolution expects input with rank 4, got 5 #38

Closed: 8rV1n closed this issue 5 years ago

8rV1n commented 6 years ago

When I run run.sh, it fails with the following output:

/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Traceback (most recent call last):
  File "./code/1-development/train_softmax.py", line 602, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/1-development/train_softmax.py", line 414, in main
    logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
    net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
    (conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/2-enrollment/enrollment.py", line 201, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done data/enrollment-evaluation_sample_dataset.hdf5...done
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/3-evaluation/evaluation.py", line 380, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/3-evaluation/evaluation.py", line 202, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...done data/development_sample_dataset_speaker.hdf5...done
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
imranparuk commented 6 years ago

I'm getting this error too.

8rV1n commented 6 years ago

@imranparuk In general, I think the newer TensorFlow APIs differ from the ones the code was written against. I tried using conv3d instead of conv2d in all the cnn_speech.py files (both the paper and the README say a 3D convolution is used), and it partly worked. I think @astorfi should either tell us which TensorFlow version to use or update the code for the newer TensorFlow releases.
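For context on the error itself: the traceback ends in a rank check in `tf.contrib.layers.convolution` (layers.py, line 1025), which builds its message from `conv_dims + 2`. An N-D convolution expects input of rank N + 2 (batch, N spatial axes, channels), so `slim.conv2d` wants rank 4, while the tensor reaching `speech_cnn` is rank 5 and the kernel `[3, 1, 5]` already has three spatial sizes. The NumPy sketch below re-creates that check; `expected_conv_dims` and `check_conv_input` are hypothetical helpers, and the (batch, 20, 80, 40, 1) layout is an assumption borrowed from the Keras summary later in this thread, not necessarily what train_softmax.py builds.

```python
import numpy as np


def expected_conv_dims(x):
    """How many spatial dims an N-D convolution needs for input x.

    Inputs are laid out as (batch, *spatial, channels), so rank R -> (R - 2)-D conv.
    """
    return x.ndim - 2


def check_conv_input(x, conv_dims):
    """Re-create the rank check that raised the ValueError in the log above."""
    if x.ndim != conv_dims + 2:
        raise ValueError("Convolution expects input with rank %d, got %d"
                         % (conv_dims + 2, x.ndim))
    return True


# Hypothetical batch: 12 cubes of 20 utterances x 80 frames x 40 MFEC, 1 channel.
batch = np.zeros((12, 20, 80, 40, 1), dtype=np.float32)

print(expected_conv_dims(batch))   # 3, i.e. a 3-D convolution
try:
    check_conv_input(batch, conv_dims=2)   # the conv2d case
except ValueError as err:
    print(err)                     # Convolution expects input with rank 4, got 5
print(check_conv_input(batch, conv_dims=3))   # True
```

Switching to a 3-D convolution makes the input rank consistent; whether the downstream shapes then line up is a separate question, as the following comments show.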

imranparuk commented 6 years ago

I have tried multiple older versions of TF with this code base, and no combination seems to work. I have also tried conv3d, but the resulting layer shapes don't match what is described in the paper. It fails because tensors with the wrong dimensions are passed to tf.squeeze. It would be great if the author could help resolve the issue.

8rV1n commented 6 years ago

@imranparuk I've just removed the tf.squeeze lines; that's what I meant by PARTLY working. I guess those squeeze lines are only there to drop the extra size-1 dimensions from the shape (e.g., (2, 3, 1) -> (2, 3)), so conv3d or something else should handle that. By the way, I haven't finished reading the paper yet.
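`tf.squeeze` follows the same rules as `np.squeeze`: without an axis it drops every size-1 dimension, and with an explicit axis it fails if that axis is not size 1. That fits both observations here: the squeeze only strips singleton axes such as (2, 3, 1) -> (2, 3), and it plausibly breaks once conv3d leaves a non-singleton axis where the conv2d version left a 1. A NumPy illustration; the 5-D shape is hypothetical, not taken from cnn_speech.py:

```python
import numpy as np

x = np.zeros((2, 3, 1))
print(np.squeeze(x).shape)          # (2, 3): every size-1 axis removed
print(np.squeeze(x, axis=2).shape)  # (2, 3): only axis 2, which has size 1

# Hypothetical 5-D conv3d output (batch, depth, height, width, channels).
# Code written for a 2-D stack might try to squeeze axes 1 and 2, but axis 1 has size 2.
y = np.zeros((4, 2, 1, 1, 128))
try:
    np.squeeze(y, axis=(1, 2))
except ValueError as err:
    print("squeeze failed:", err)

# Squeezing only the axes that really are size 1 works.
print(np.squeeze(y, axis=(2, 3)).shape)  # (4, 2, 128)
```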

imranparuk commented 6 years ago

I think I solved it. Use an older version of TensorFlow, and uninstall your current version of NumPy; let TensorFlow install the NumPy version it requires. I tried TensorFlow v1.0.0.

8rV1n commented 6 years ago

@imranparuk Wow, that seems like a version from ancient times. By the way, I'm planning to rewrite the code in Keras, following the paper as closely as possible. It should be out by September 3.

imranparuk commented 6 years ago

@ArvinSiChuan I would contribute to that.

astorfi commented 6 years ago

@ArvinSiChuan @imranparuk Thank you all for your contributions. I am working on a PyTorch version of this code as well, using a public dataset. I will let you all know in this repo when/if I finish it. Thanks

8rV1n commented 6 years ago

@astorfi Which dataset do you want to use? I think the MVU multimodal dataset reported in the paper isn't publicly available. I'm looking for datasets suitable for this task; if you could suggest some, that would be perfect!

imranparuk commented 6 years ago

@astorfi @ArvinSiChuan I'm also finding it very difficult to get the multimodal dataset from the paper. It would be great if we could use an open-source dataset.

8rV1n commented 6 years ago

@imranparuk I'm considering the datasets on OpenSLR. TED-LIUM and the Free ST Chinese Mandarin Corpus may be useful, but they would also need careful preprocessing. I hope @astorfi can give us some suggestions on this.

astorfi commented 6 years ago

Perhaps the VoxCeleb dataset is one of the best options.

astorfi commented 6 years ago

@ArvinSiChuan @imranparuk I agree that one of the problems is that the dataset is restricted. Tuning the code for a new dataset would take a lot of effort on my part, as I am no longer working on this project.

imranparuk commented 6 years ago

@astorfi I have actually been using the VoxCeleb dataset; I wanted to try Mozilla Common Voice as well. However, it seems to require converting the MP3 files to WAV, and I've been too lazy to do that.

My question is: is it illegal or unethical to include these datasets in your own repository, since they all require you to formally request access?

8rV1n commented 6 years ago

@imranparuk In my view, as long as we cite the authors and follow the license (CC BY-SA 4.0 for VoxCeleb), we can use the dataset in our experiments. I would rather explain which dataset is used and how than include the actual dataset files in a repo.

8rV1n commented 6 years ago

@imranparuk @astorfi I'm now working on MFEC extraction here. The Keras code, docs, etc. are coming soon. Would you help me find bugs or anything else?

astorfi commented 6 years ago

@ArvinSiChuan Thank you for your effort. Sure, I will be more than happy to help. For SpeechPy, please refer to its newly published technical report.

@article{torfi2018speechpy,
  title={SpeechPy-A Library for Speech Processing and Recognition},
  author={Torfi, Amirsina},
  journal={arXiv preprint arXiv:1803.01094},
  year={2018}
}

8rV1n commented 6 years ago

@astorfi I have a problem with the model. I've built a model with the following structure:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input-layer (InputLayer)     (None, 20, 80, 40, 1)     0         
_________________________________________________________________
conv1-1 (Conv3D)             (None, 18, 80, 36, 16)    256       
_________________________________________________________________
activation1-1 (PReLU)        (None, 18, 80, 36, 16)    829440    
_________________________________________________________________
conv1-2 (Conv3D)             (None, 16, 36, 36, 16)    6928      
_________________________________________________________________
activation1-2 (PReLU)        (None, 16, 36, 36, 16)    331776    
_________________________________________________________________
pool-1 (MaxPooling3D)        (None, 16, 36, 18, 16)    0         
_________________________________________________________________
conv2-1 (Conv3D)             (None, 14, 36, 15, 16)    3088      
_________________________________________________________________
activation2-1 (PReLU)        (None, 14, 36, 15, 16)    120960    
_________________________________________________________________
conv2-2 (Conv3D)             (None, 12, 15, 15, 16)    6160      
_________________________________________________________________
activation2-2 (PReLU)        (None, 12, 15, 15, 16)    43200     
_________________________________________________________________
pool-2 (MaxPooling3D)        (None, 12, 15, 7, 16)     0         
_________________________________________________________________
conv3-1 (Conv3D)             (None, 10, 15, 5, 16)     2320      
_________________________________________________________________
activation3-1 (PReLU)        (None, 10, 15, 5, 16)     12000     
_________________________________________________________________
conv3-2 (Conv3D)             (None, 8, 9, 5, 16)       5392      
_________________________________________________________________
activation3-2 (PReLU)        (None, 8, 9, 5, 16)       5760      
_________________________________________________________________
conv4-1 (Conv3D)             (None, 6, 9, 3, 16)       2320      
_________________________________________________________________
activation4-1 (PReLU)        (None, 6, 9, 3, 16)       2592      
_________________________________________________________________
conv4-2 (Conv3D)             (None, 4, 3, 3, 16)       5392      
_________________________________________________________________
activation4-2 (PReLU)        (None, 4, 3, 3, 16)       576       
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
fc (Dense)                   (None, 128)               73856     
_________________________________________________________________
ac_softmax (Dense)           (None, 695)               89655     
=================================================================
Total params: 1,541,671
Trainable params: 1,541,671
Non-trainable params: 0
_________________________________________________________________

Is this structure the same as yours? Or is my input feature pipeline incorrect? Or is something wrong with the optimizer? I'm getting zero accuracy in Keras evaluation. Could you help me find out what the problem is?
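For anyone comparing against this summary, the numbers can be reproduced by hand. A Conv3D layer with kernel (kd, kh, kw), C_in input and C_out output channels has kd·kh·kw·C_in·C_out + C_out parameters, and with 'valid' padding and stride 1 each spatial size shrinks by (kernel - 1). Keras's `PReLU` learns one alpha per activation unless `shared_axes` is set, which is why the activation rows dominate: they add up to 1,346,304 of the 1,541,671 parameters. Whether the original implementation shares PReLU slopes per channel isn't stated in this thread. The helpers below are plain-Python sketches, not Keras code.

```python
def numel(shape):
    """Product of the entries of a shape (any iterable of ints)."""
    n = 1
    for s in shape:
        n *= s
    return n


def conv3d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return numel(kernel) * c_in * c_out + c_out


def conv3d_valid_shape(spatial, kernel, stride=(1, 1, 1)):
    """Output spatial size for 'valid' (no padding) convolution."""
    return tuple((n - k) // s + 1 for n, k, s in zip(spatial, kernel, stride))


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


def prelu_params(output_shape, shared_axes=()):
    """Keras-style PReLU: one alpha per unit, except along shared axes.

    output_shape excludes the batch axis; shared_axes are numbered from 1, as in Keras.
    """
    return numel(1 if (i + 1) in shared_axes else d
                 for i, d in enumerate(output_shape))


# conv1-1: input (20, 80, 40, 1), kernel (3, 1, 5), 16 filters
print(conv3d_valid_shape((20, 80, 40), (3, 1, 5)))   # (18, 80, 36)
print(conv3d_params((3, 1, 5), 1, 16))               # 256
print(prelu_params((18, 80, 36, 16)))                # 829440, as in activation1-1
print(prelu_params((18, 80, 36, 16), shared_axes=(1, 2, 3)))  # 16: one slope per channel
print(dense_params(4 * 3 * 3 * 16, 128))             # 73856 (fc)
print(dense_params(128, 695))                        # 89655 (ac_softmax)

# All PReLU rows from the summary above
prelu_rows = [829440, 331776, 120960, 43200, 12000, 5760, 2592, 576]
print(sum(prelu_rows))                               # 1346304
```

In Keras itself, per-channel slopes for these 5-D tensors would be `PReLU(shared_axes=[1, 2, 3])`.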

sivagururaman commented 6 years ago

After moving to TensorFlow v1.0.0, I could get past training, but enrollment fails:

Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/home/osboxes/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./code/2-enrollment/enrollment.py", line 201, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined

Any idea why this is happening?

Thanks!

8rV1n commented 6 years ago

@sivagururaman The error message explains the problem: xrange doesn't exist in Python 3.x.

sivagururaman commented 6 years ago

@ArvinSiChuan Do I need to move to some other Python version, like 2.7?

8rV1n commented 6 years ago

@sivagururaman Yes, that's one way. You can also try to use range() instead.
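On Python 3, the simplest fix is to replace `xrange(` with `range(` in enrollment.py and evaluation.py. If the same scripts need to keep running on Python 2 as well, a small shim near the top of each file does the job (`from six.moves import xrange` is an alternative if `six` is installed). In this sketch `num_clones` is a stand-in for `FLAGS.num_clones`:

```python
# Python 2/3 compatibility shim: Python 3 has no xrange, so alias it to range,
# which is already lazy in Python 3.
try:
    xrange
except NameError:
    xrange = range

num_clones = 2  # stand-in for FLAGS.num_clones
clone_ids = [i for i in xrange(num_clones)]
print(clone_ids)  # [0, 1]
```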

sivagururaman commented 6 years ago

@ArvinSiChuan Okay, thanks. Will try the suggestion and get back to you if I get stuck somewhere else.

sivagururaman commented 6 years ago

@ArvinSiChuan Thanks! It worked; the demo runs fine.

Now, how do I go about using our own dataset for training, enrollment, and evaluation? I suppose our problem space is a subset of this one, as we only need text-dependent speaker identification. Any thoughts here would be helpful.

sivagururaman commented 6 years ago

@ArvinSiChuan I am now able to run the demo as needed.

Do you have the input preprocessing for WAV files handy that I could use? I want to extract the input features from my own clip and use the rest of the network for evaluation.

8rV1n commented 6 years ago

@sivagururaman You could refer to the input code in the code folder (code/0-input).
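For readers without the repo open: MFEC here means log Mel filterbank energies, i.e., the MFCC pipeline without the final DCT. astorfi points to SpeechPy for this step above; the NumPy code below is a from-scratch sketch of the same idea, not SpeechPy's implementation. The 16 kHz sample rate, 20 ms window, 10 ms stride, 40 filters, 512-point FFT and the simple CMVN are all assumed parameters. With a 10 ms stride, 0.81 s of audio gives exactly 80 frames.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, fft_length, sample_rate):
    """Triangular Mel filters, shape (num_filters, fft_length // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((fft_length + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_length // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfec(signal, sample_rate=16000, frame_length=0.020, frame_stride=0.010,
         num_filters=40, fft_length=512):
    """Log Mel filterbank energies, shape (num_frames, num_filters)."""
    frame_len = int(round(frame_length * sample_rate))   # 320 samples
    step = int(round(frame_stride * sample_rate))         # 160 samples
    num_frames = 1 + (len(signal) - frame_len) // step
    # Index matrix: row t holds the sample indices of frame t.
    idx = np.arange(frame_len)[None, :] + step * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=fft_length)) ** 2 / fft_length
    energies = np.dot(power, mel_filterbank(num_filters, fft_length, sample_rate).T)
    return np.log(np.maximum(energies, np.finfo(float).eps))   # no DCT: MFEC, not MFCC


def cmvn(features, eps=1e-10):
    """Normalize each coefficient to zero mean, unit variance over frames."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)


signal = np.random.RandomState(0).randn(12960)   # 0.81 s of noise at 16 kHz
feats = cmvn(mfec(signal))
print(feats.shape)  # (80, 40)
```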

sivagururaman commented 6 years ago

@ArvinSiChuan Thanks. I am going through that now....

sivagururaman commented 6 years ago

@ArvinSiChuan One quick question: what do the HDF5 files in the data/ folder contain? Are they the speech features extracted from the input? If so, how do I then convert the MFEC vectors obtained from the input code to HDF5 format?

imranparuk commented 6 years ago

@sivagururaman Good question. The input portion of the code seems incomplete; we should take the initiative and complete it... eventually. Maybe try something like this (but better)?

    import numpy as np
    import tables
    from torchvision.transforms import Compose  # assumed: the Compose used by the repo's input code

    # Assumption: AudioDataset, CMVN, Feature_Cube and ToOutput are the classes from the
    # repo's code/0-input input script; adjust this import to match your checkout.
    from input_feature import AudioDataset, CMVN, Feature_Cube, ToOutput


    def make_dataset(files_path, audio_dir):
        # Each item is (feature_cube, label); cube_shape is (utterances, frames, MFEC coefficients).
        transform = Compose([CMVN(), Feature_Cube(cube_shape=(20, 80, 40), augmentation=True), ToOutput()])
        return AudioDataset(files_path=files_path, audio_dir=audio_dir, transform=transform)


    def append_items(dataset, label_array, utterance_array):
        # One index is one sample, e.g. [dataset[idx][0] for idx in range(32)] is a list of 32 cubes.
        for idx in range(len(dataset)):
            feature, label = dataset[idx]
            # Move axis 1 to the end, (1, A, 80, 40) -> (1, 80, 40, A), then keep only the first slice.
            feature = np.asarray(feature).swapaxes(1, 2).swapaxes(2, 3)[:, :, :, 0:1]
            feature = np.squeeze(feature, axis=0)  # -> (80, 40, 1)
            label_array.append(np.array([label]))
            utterance_array.append(np.array([feature]))


    # args.audio_dir comes from the script's argparse options.
    datasetTest = make_dataset('/home/imran/Documents/projects/wd/wd-3d-cnn/code/0-input/file_path_enrollment_eval.txt', args.audio_dir)
    datasetTrain = make_dataset('/home/imran/Documents/projects/wd/wd-3d-cnn/code/0-input/file_path_enrollment_enroll.txt', args.audio_dir)

    with tables.open_file('da_dataset.h5', mode='w') as fileh:
        float_atom = tables.Float32Atom()
        int_atom = tables.Int32Atom()
        # Extendable arrays: the first axis grows as samples are appended.
        label_evaluation = fileh.create_earray(fileh.root, 'label_evaluation', int_atom, (0,))
        label_enrollment = fileh.create_earray(fileh.root, 'label_enrollment', int_atom, (0,))
        utterance_evaluation = fileh.create_earray(fileh.root, 'utterance_evaluation', float_atom, (0, 80, 40, 1))
        utterance_enrollment = fileh.create_earray(fileh.root, 'utterance_enrollment', float_atom, (0, 80, 40, 1))

        append_items(datasetTest, label_evaluation, utterance_evaluation)
        append_items(datasetTrain, label_enrollment, utterance_enrollment)
    # The file is closed automatically when the with-block exits.

sivagururaman commented 6 years ago

@imranparuk I have not tested this input code yet. I will do so in the coming days and let you know how it goes.

Just after a glance at the code: I understand that we would have da_dataset.h5, which will hold the enrollment and evaluation data. If I need to test with another sample, say a new utterance.wav, how do I do that?

imranparuk commented 6 years ago

@sivagururaman Another good question... I wrote my own prediction code for that, but it's too long to post here. A simpler way would be to create a dataset file the same way, but with a text file that has only 1 item, and then pass it to the model in a similar way.

I leave the task of posting the code to someone else...
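A one-item list file for a new utterance could be written as below. This is a minimal sketch: the single line holds a speaker label followed by a wav file name, matching the list format used in this thread, while the helper `write_single_item_list`, the file name `file_path_single.txt` and the placeholder label `0` are hypothetical.

```python
import os
import tempfile


def write_single_item_list(list_path, label, wav_path):
    """Write a list file holding exactly one '<label> <wav file>' line.

    Hypothetical helper; the layout follows the examples in this thread.
    """
    with open(list_path, "w") as f:
        f.write("{} {}\n".format(label, wav_path))


# Example: describe one new utterance in a temporary directory.
tmp_dir = tempfile.mkdtemp()
list_path = os.path.join(tmp_dir, "file_path_single.txt")
write_single_item_list(list_path, 0, "utterance.wav")

with open(list_path) as f:
    print(f.read().strip())  # prints "0 utterance.wav"
```

The resulting file could then be passed as `files_path` to `AudioDataset`, like the longer lists in the snippet above, and the single extracted sample fed to the trained model.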

sivagururaman commented 6 years ago

@Imran - did you mean test file instead of text file?

Regards, Sivagururaman

On Mon, 1 Oct 2018 at 6:53 PM Imran Paruk notifications@github.com wrote:

Another good question... I wrote my own prediction code to do that but it's too long to post here. A simpler way would be to create a dataset file the same way but the text file only has 1 item... Then pass it to the model in a similar way.

I place the task of posting the code to another user...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/astorfi/3D-convolutional-speaker-recognition/issues/38#issuecomment-425905822, or mute the thread https://github.com/notifications/unsubscribe-auth/AmZiiGV_zaGdvvegpZoNOtDirZK9C1hsks5ughdfgaJpZM4Vn8OC .

-- Regards, Sivam.

imranparuk commented 6 years ago

@sivagururaman No, there is a text file provided with the git repo that uses a particular format to identify speakers:

0 file1.wav
1 file2.wav
1 file3.wav
2 file4.wav

etc.
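Each line pairs an integer speaker label with an audio file name. A minimal parser for that layout, assuming whitespace-separated fields with the label first, as in the example above; `parse_file_list` is a hypothetical name, not a function from the repo:

```python
def parse_file_list(lines):
    """Turn '<label> <wav file>' lines into (label, filename) pairs.

    Blank lines are skipped; the label is converted to int.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        label, filename = int(fields[0]), fields[1]
        pairs.append((label, filename))
    return pairs


example = ["0 file1.wav", "1 file2.wav", "1 file3.wav", "2 file4.wav"]
print(parse_file_list(example))
# [(0, 'file1.wav'), (1, 'file2.wav'), (1, 'file3.wav'), (2, 'file4.wav')]
```

Files that share a label (file2.wav and file3.wav here) belong to the same speaker.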

sivagururaman commented 6 years ago

@Imran - Thanks. Will go over the format and see if that helps our run here.

Thanks!

On Thu, Oct 4, 2018 at 3:50 AM Imran Paruk notifications@github.com wrote:

@sivagururaman https://github.com/sivagururaman no, there is a text file provided with the git repo which has a particular format to identify speakers.

0 file1.wav 1 file2.wav 1 file3.wav 2 file4.wav

etc

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/astorfi/3D-convolutional-speaker-recognition/issues/38#issuecomment-426823027, or mute the thread https://github.com/notifications/unsubscribe-auth/AmZiiDhlEm-9t8P-sEYX8i4UsGs23HFpks5uhTgggaJpZM4Vn8OC .

sivagururaman commented 6 years ago

@Imran - If you don't mind, could you also share your prediction code with me so I can use it for reference?

Thanks!

On Thu, Oct 4, 2018 at 8:05 AM Sivam Mahadevan sivam.mahadevan@broadcom.com wrote:

@Imran - Thanks. Will go over the format and see if that helps our run here.

Thanks!

On Thu, Oct 4, 2018 at 3:50 AM Imran Paruk notifications@github.com wrote:

@sivagururaman https://github.com/sivagururaman no, there is a text file provided with the git repo which has a particular format to identify speakers.

0 file1.wav 1 file2.wav 1 file3.wav 2 file4.wav

etc

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/astorfi/3D-convolutional-speaker-recognition/issues/38#issuecomment-426823027, or mute the thread https://github.com/notifications/unsubscribe-auth/AmZiiDhlEm-9t8P-sEYX8i4UsGs23HFpks5uhTgggaJpZM4Vn8OC .

MSAlghamdi commented 6 years ago

@ArvinSiChuan & @astorfi Could you please take a look at issue #47?

MSAlghamdi commented 6 years ago

@imranparuk, thank you for sharing your code.

I tried it, and it gave the following (note that I used h5dump -d /utterance_enrollment da_dataset.h5 &> da_dataset.log to look inside the .h5 file, and I copied only the tail of the log file):


   (8,79,26,0): -12.7884,
   (8,79,27,0): -11.2512,
   (8,79,28,0): -12.5186,
   (8,79,29,0): -12.0625,
   (8,79,30,0): -12.3308,
   (8,79,31,0): -12.3839,
   (8,79,32,0): -12.5028,
   (8,79,33,0): -12.5818,
   (8,79,34,0): -12.0501,
   (8,79,35,0): -12.4087,
   (8,79,36,0): -13.032,
   (8,79,37,0): -13.4173,
   (8,79,38,0): -12.3723,
   (8,79,39,0): -11.7837
   }
   ATTRIBUTE "CLASS" {
      DATATYPE  H5T_STRING {
         STRSIZE 6;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR

I don't think this will work. If the first dimension (indices 0 to 8) indexes the wav files in my file_path.txt (there are 9 of them), then only one utterance was kept for each of them: the last index is always 0, and that axis should be the utterance index. This gives an error about the number of utterances when the demo is run.
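The shape arithmetic behind this can be checked with a dummy array. A sketch, assuming each sample leaves the transform pipeline with shape (1, 20, 80, 40), as the code comments in this thread state; the zero-filled array stands in for real features:

```python
import numpy as np

# Dummy feature cube: (batch=1, utterances=20, frames=80, coefficients=40)
feature = np.zeros((1, 20, 80, 40), dtype=np.float32)

# Move the utterance axis last: (1, 20, 80, 40) -> (1, 80, 40, 20)
reordered = feature.swapaxes(1, 2).swapaxes(2, 3)

# Slicing 0:1 on the last axis keeps only the first utterance ...
single = np.squeeze(reordered[:, :, :, 0:1], axis=0)
# ... while indexing the batch axis keeps all 20 utterances.
full = reordered[0, :, :, :]

print(reordered.shape)  # (1, 80, 40, 20)
print(single.shape)     # (80, 40, 1)
print(full.shape)       # (80, 40, 20)
```

`np.transpose(feature, (0, 2, 3, 1))` performs the same reordering in a single call.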

imranparuk commented 6 years ago

@MSAlghamdi Hey man, I actually stopped using the static dataset method (where you extract the features beforehand). I am taking a more dynamic approach (the features are extracted batch by batch).

I have started a project based on this one, but written in Keras. You can check it out here: Keras-Speaker-Recognition

PS: @astorfi I will make sure you are given credit for your work. I just created this project and haven't had time to complete the README.md yet.
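The dynamic approach computes features only when a batch is requested, instead of storing them all up front. A minimal plain-Python sketch, assuming any dataset object that supports `len()` and indexing and returns `(feature, label)` pairs, as `AudioDataset` does in the snippets above; `iterate_batches` and `ToyDataset` are hypothetical names, and this is not the code of the Keras project:

```python
import numpy as np


def iterate_batches(dataset, batch_size):
    """Yield (features, labels) batches, extracting each sample on demand."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        # Features are computed here, one batch at a time.
        samples = [dataset[i] for i in range(start, stop)]
        features = np.stack([feature for feature, _ in samples])
        labels = np.array([label for _, label in samples])
        yield features, labels


class ToyDataset:
    """Stand-in for AudioDataset: returns constant (80, 40, 20) cubes."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return np.full((80, 40, 20), idx, dtype=np.float32), idx % 3


for features, labels in iterate_batches(ToyDataset(10), batch_size=4):
    print(features.shape, labels.tolist())
```

In Keras the same idea is commonly expressed by subclassing `keras.utils.Sequence`, whose `__getitem__` returns one whole batch.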

MSAlghamdi commented 5 years ago

@imranparuk Good work! Thank you for sharing it.

I still hope to do it the simpler, static way. I tried another method that has some issues. If you can help, it could combine the advantages of your static method with the flexibility of @Chegde8's approach in issue #41. The advantage of yours is the ability to arrange the feature arrays and the order of the axes, so the structure of the .h5 file suits the code.
I combined both scripts to get the following:


    import tables
    # AudioDataset, Compose, CMVN, Feature_Cube and ToOutput come from the repo's
    # feature-extraction script in code/0-input/.

    datasetTest = AudioDataset(files_path='file_path_test.txt', audio_dir='Audio',
                               transform=Compose([CMVN(), Feature_Cube(cube_shape=(20, 80, 40), augmentation=True), ToOutput()]))

    datasetTrain = AudioDataset(files_path='file_path_train.txt', audio_dir='Audio',
                                transform=Compose([CMVN(), Feature_Cube(cube_shape=(20, 80, 40), augmentation=True), ToOutput()]))


    def extract(dataset):
        """Return (labels, features), with each feature cube reordered to (80, 40, 20)."""
        labels, features = [], []
        # len(dataset) gives the number of samples directly, so there is no need
        # to count the lines of the list file by hand.
        for i in range(len(dataset)):
            feature, label = dataset[i]
            labels.append(label)
            # feature.shape is (1, 20, 80, 40); make it (1, 80, 40, 20)
            feature = feature.swapaxes(1, 2).swapaxes(2, 3)
            features.append(feature[0, :, :, :])
        return labels, features


    ###############    TEST DATASET       ####################
    lab_test, feat_test = extract(datasetTest)

    ###############    TRAIN DATASET       ####################
    lab_train, feat_train = extract(datasetTrain)

    with tables.open_file('/root/3D_CNN/3D-convolutional-speaker-recognition/data/devel_try.hdf5', 'w') as h5file:
        h5file.create_carray(where='/', name='label_test', obj=lab_test, byteorder='little')
        h5file.create_carray(where='/', name='label_train', obj=lab_train, byteorder='little')

        # EArrays can be extended later along the first (sample) axis.
        h5file.create_earray(where='/', name='utterance_test', chunkshape=[1, 80, 40, 20], obj=feat_test, byteorder='little')
        h5file.create_earray(where='/', name='utterance_train', chunkshape=[1, 80, 40, 20], obj=feat_train, byteorder='little')

The .h5 file was created with the expected shape, but another issue came up when I ran the demo. The h5 file posted with the project has the same structure as mine, but mine contains negative numbers. I think this is an issue with how the features are generated in the input .py file.
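Negative values on their own do not necessarily indicate a bug: cepstral mean normalization subtracts each coefficient's mean over the frames, so every normalized coefficient is centered on zero, and any coefficient that varies over time must take some negative values. A small numpy illustration; `cmvn` here is a hypothetical stand-in written for this example, not the repo's `CMVN` transform, and the input is synthetic:

```python
import numpy as np


def cmvn(features, variance_normalization=False):
    """Per-coefficient mean (and optional variance) normalization over frames.

    features: array of shape (num_frames, num_coefficients).
    """
    normalized = features - features.mean(axis=0)
    if variance_normalization:
        normalized = normalized / (features.std(axis=0) + 1e-8)
    return normalized


# Synthetic log-energies: all positive before normalization.
rng = np.random.default_rng(0)
log_energies = 5.0 + rng.random((80, 40))

normalized = cmvn(log_energies)
print(log_energies.min() > 0)                     # True
print(np.allclose(normalized.mean(axis=0), 0.0))  # True
print((normalized < 0).any())                     # True
```

Comparing against the shipped file is therefore only meaningful if both were produced with the same normalization.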

astorfi commented 5 years ago

@MSAlghamdi Hey man, I actually stopped doing the static dataset method (Where you extract the features beforehand). I am taking a more dynamic approach to this (the features are extracted batch by batch)

I have started a project based off this project but written in Keras. You can check it out here: Keras-Speaker-Recognition

PS: @astorfi I will make sure you are given credit for your work. I just created this project, haven't had time to complete the README.md

Thank you so much for your effort and great work.

MSAlghamdi commented 5 years ago

Thank you @astorfi for your kindness and your great project. I would really appreciate it if you could tell us how you created the .hdf5 files in your work.

My master's thesis is an evaluation of your system against other speaker verification (SV) systems, and yours seems able to beat them. I'm stuck only on yours because of the h5 file issues.

astorfi commented 5 years ago

Thank you @astorfi for your kindness and your grate project. I will be more appreciative if you tell us how did you created your .hdf5 in your work.

My master's thesis is an evaluation of yours with other SV systems. It seems your has the ability to beat them. I'm just stuck with only yours because of the h5 file issues.

Thanks for your kind words. Please consider the following directions:

Bests