cemoody / lda2vec

MIT License

AxisError: axis -1 is out of bounds for array of dimension 0 #62

Open rpiryani opened 6 years ago

rpiryani commented 6 years ago

I am running the code in twentynewsgroup/preprocess.py line by line, and I get this error from the calls corpus.update_word_count(tokens) and corpus.finalize(). How can I resolve this issue?

codybraun commented 6 years ago

This looks like a Python 2 to Python 3 porting issue: in Python 3, dict.values() returns a view rather than a list, so NumPy treats it as a single 0-dimensional object and cannot sort it. You can force a list by changing this line to:

specials = np.sort(list(self.specials.values()))
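
For anyone who wants to reproduce this outside lda2vec, here is a minimal sketch of the failure and the fix. The dictionary keys and values are hypothetical placeholders, not necessarily what the library stores in self.specials.

```python
import numpy as np

# Hypothetical stand-in for self.specials; keys and values are illustrative only.
specials = {"out_of_vocabulary": -1, "skip": -2}

# In Python 3, values() is a view. NumPy wraps the whole view in a single
# 0-dimensional object array instead of iterating over it.
view = specials.values()
wrapped = np.asarray(view)
print(wrapped.ndim, wrapped.dtype)

try:
    np.sort(view)  # sorts along axis=-1, which a 0-d array does not have
    failed = False
except ValueError as exc:  # AxisError subclasses both ValueError and IndexError
    failed = True
    print(type(exc).__name__, exc)

# Materializing the view as a list gives NumPy a real 1-d sequence to sort.
fixed = np.sort(list(specials.values()))
print(fixed)
```

The same pattern applies to dict.keys() and dict.items(): wrap them in list(...) before handing them to NumPy.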

h-vishal3 commented 5 years ago

Thanks for providing the code. I want to use this PNCC code to build a GMM (Gaussian mixture model), but I get the error "AxisError: axis -1 is out of bounds for array of dimension 0" when I call the feature extractor from my GMM training script.

# gmm model

import _pickle
import numpy as np
import soundfile as sf
from sklearn.mixture import GaussianMixture
from PNCC import pncc_feature

import warnings
warnings.filterwarnings("ignore")

source = ("mobi_data/")
dest = ("Device_models_zcr/")
train_file = "trainData_mobiphone.txt"
file_paths = open(train_file, 'r')
count = 1

# Extracting features for each speaker (5 files per speakers)
features = np.asarray(())
for path in file_paths:
    path = path.strip()
    print(path)

    # read the audio
    sr, audio = sf.read(source + path)

    # extract 40 dimensional MFCC & delta MFCC features
    # vector = pncc(audio, sr)
    vector = pncc_feature(audio, sr)  # <------error is at this place---->

    if features.size == 0:
        features = vector
    else:
        features = np.vstack((features, vector))
    # when features of 5 files of speaker are concatenated, then do model training
    # -> if count == 5: --> edited below
    if count == 15:
        gmm = GaussianMixture(n_components = 16, n_iter = 200, covariance_type='diag', n_init = 3)
        gmm.fit(features)

        # dumping the trained gaussian model
        picklefile = path.split("-")[0] + ".gmm"
        _pickle.dump(gmm, open(dest + picklefile, 'w'))
        print('+ modeling completed for speaker:', picklefile, " with data point = ", features.shape)
        features = np.asarray(())
        count = 0
    count = count + 1
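
One possible cause, offered as a sketch and not a confirmed diagnosis: soundfile.read returns (data, samplerate) in that order, so unpacking it as sr, audio leaves audio holding the integer sample rate. A scalar becomes a 0-dimensional array, which would produce exactly this AxisError if the feature extractor works along the last axis. Whether pncc_feature does that internally is an assumption. The fake_read function below is a hypothetical stand-in that only mirrors the return order of soundfile.read, so the example runs without audio files.

```python
import numpy as np

def fake_read(path):
    """Hypothetical stand-in that mirrors soundfile.read's return order."""
    samplerate = 16000
    data = np.zeros(samplerate, dtype=np.float64)  # one second of silence
    return data, samplerate

# Unpacking order used in the script above: the names end up swapped.
sr, audio = fake_read("example.wav")
print(np.ndim(audio))  # `audio` is really the integer sample rate here

try:
    np.sort(audio)  # one example of an operation along axis=-1
    swapped_failed = False
except ValueError as exc:  # AxisError subclasses ValueError
    swapped_failed = True
    print(type(exc).__name__, exc)

# Unpacking in the returned order gives a real 1-d signal.
audio, sr = fake_read("example.wav")
print(np.ndim(audio), sr)
```

Two further points are worth checking once the read order is fixed: sklearn's GaussianMixture takes max_iter, not n_iter, and under Python 3 the pickle file has to be opened in binary mode ('wb').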