CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Merlin magphase with full feature (lossless features) #416

Open chazo1994 opened 5 years ago

chazo1994 commented 5 years ago

I am trying to train an acoustic model with the MagPhase vocoder, using mp.analysis_lossless to extract the full lossless MagPhase features. Here is the script I used for feature extraction:

`file_name_token = os.path.basename(os.path.splitext(wav_file)[0])

# Display:
print("Analysing file: " + file_name_token + '.wav' + '................................')

# Files setup:
est_file = os.path.join(out_feats_dir, file_name_token + '.est')

# Epochs detection:
la.reaper(wav_file, est_file)

#Full feature extraction
m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
v_lf0 = la.log(v_f0)
m_mag_log = la.log(m_mag)

lu.write_binfile(m_mag_log, out_feats_dir + '/' + file_name_token + '.mag')
lu.write_binfile(m_real, out_feats_dir + '/' + file_name_token + '.real')
lu.write_binfile(m_imag, out_feats_dir + '/' + file_name_token + '.imag')
lu.write_binfile(v_lf0, out_feats_dir + '/' + file_name_token + '.lf0')

# Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')`

And I use the following code to synthesize from the lossless features:

`sys.path.append(cfg.magphase_bindir)
import libutils as lu
import libaudio as la
import magphase as mp

in_feats_dir = gen_dir
out_syn_dir = gen_dir
nfiles = len(file_id_list)
for nxf in xrange(nfiles):
    filename_token = file_id_list[nxf]
    logger.info('Creating waveform for %4d of %4d: %s' % (nxf + 1, nfiles, filename_token))

    m_mag_log = lu.read_binfile(in_feats_dir + '/' + filename_token + '.mag', dim=cfg.mag_dim)
    m_real = lu.read_binfile(in_feats_dir + '/' + filename_token + '.real', dim=cfg.real_dim)
    m_imag = lu.read_binfile(in_feats_dir + '/' + filename_token + '.imag', dim=cfg.real_dim)
    v_lf0 = lu.read_binfile(in_feats_dir + '/' + filename_token + '.lf0', dim=1)
    v_f0 = np.exp(v_lf0)
    m_mag = np.exp(m_mag_log)
    v_syn_sig = mp.synthesis_from_lossless(m_mag, m_real, m_imag, v_f0, cfg.fs)
    la.write_audio_file(out_syn_dir + '/' + filename_token + '.wav', v_syn_sig, cfg.fs)`

While training the acoustic model I get the following problem (it looks the same as issue #329, but I can't figure out how to fix it):

2018-11-29 16:49:03,237 INFO main.train_DNN: fine-tuning the DNN model
2018-11-29 16:53:48,916 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 285.68
2018-11-29 16:53:48,916 INFO main.train_DNN: overall training time: 4.76m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

@felipeespic @m-toman @ronanki Please help me.
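A NaN training error from the very first epoch often means the input or output features already contain non-finite values, so it may be worth scanning the extracted files before training. The sketch below is only an illustration, not part of Merlin or MagPhase: `count_nonfinite` is a hypothetical helper, and it assumes the feature files are headerless float32 arrays (which should be verified against what `lu.write_binfile` actually writes). The toy data shows one possible, unconfirmed source of trouble: taking a plain logarithm of an F0 track whose unvoiced frames are zero. Whether `la.log` already guards against zeros is not checked here.

```python
import os
import tempfile

import numpy as np


def count_nonfinite(path, dim):
    """Count NaN and infinite values in a headerless float32 feature file.

    Hypothetical helper: assumes raw float32 data with `dim` values per frame.
    """
    data = np.fromfile(path, dtype=np.float32)
    if data.size % dim != 0:
        raise ValueError("%s: %d values is not a multiple of dim=%d"
                         % (path, data.size, dim))
    frames = data.reshape(-1, dim)
    return {
        "frames": frames.shape[0],
        "nan": int(np.isnan(frames).sum()),
        "inf": int(np.isinf(frames).sum()),
    }


# Toy demonstration: an F0 track where unvoiced frames are stored as 0 Hz.
f0 = np.array([120.0, 0.0, 118.5, 0.0], dtype=np.float32)
with np.errstate(divide="ignore"):
    lf0 = np.log(f0)  # a plain log maps 0 to -inf

# Write the toy track to a temporary file and scan it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.lf0")
lf0.astype(np.float32).tofile(demo_path)
report = count_nonfinite(demo_path, dim=1)
print(report)
```

In practice the same function could be run over every `.mag`, `.real`, `.imag` and `.lf0` file in the feature directory, with `dim` set to the matching static dimension from the acoustic configuration.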

chazo1994 commented 5 years ago

My acoustic configuration:

[Outputs]
# dX should be 3 times X
mag: 1025
dmag: 3075
real: 1025
dreal: 3075
imag: 1025
dimag: 3075
lf0: 1
dlf0: 3
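The values posted above do satisfy the "dX should be 3 times X" rule (3075 = 3 × 1025 and 3 = 3 × 1), so the dimensions themselves look consistent. For completeness, here is a minimal sketch of how that rule could be checked automatically. It is an assumption-laden illustration rather than anything Merlin provides: `check_delta_dims` is a hypothetical helper, and it parses the section with Python's standard `configparser` instead of Merlin's own configuration code.

```python
import configparser

# The [Outputs] section as posted in this thread.
CONF_TEXT = """
[Outputs]
# dX should be 3 times X
mag: 1025
dmag: 3075
real: 1025
dreal: 3075
imag: 1025
dimag: 3075
lf0: 1
dlf0: 3
"""


def check_delta_dims(conf_text, section="Outputs"):
    """Return a list of messages for every dX that is not 3 times X.

    Hypothetical helper: an empty list means the section is consistent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    dims = {name: int(value) for name, value in parser[section].items()}

    problems = []
    for name, static_dim in sorted(dims.items()):
        delta_name = "d" + name
        # Only compare entries that actually have a matching delta entry.
        if delta_name in dims and dims[delta_name] != 3 * static_dim:
            problems.append("%s is %d, expected %d"
                            % (delta_name, dims[delta_name], 3 * static_dim))
    return problems


problems = check_delta_dims(CONF_TEXT)
print(problems if problems else "all delta dimensions are consistent")
```

Since this check passes for the posted configuration, the NaN error is more likely to come from the feature values than from the declared dimensions, although that is not confirmed in this thread.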

vmontazeri commented 5 years ago

Did you find a solution for this error? My training error is also extremely high, like the one you show here.