Open fuzeller opened 6 years ago
(From @felipeespic)
Hi @fuzeller,
You are welcome, and thank you for trying to train Merlin with the MagPhase lossless version (I have never tried it myself).
I think some corrupted feature is producing the error. Actually, there is a mistake in the code you provided:
v_voi = (np.exp(v_f0) > 5.0).astype(int) # 5.0: tolerance (just in case)
I think that np.exp(v_f0) is producing wrong values.
So, my suggestion is:
Replace the line:
m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
with:
m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
m_mag_log = np.log(m_mag)
v_lf0 = np.log(v_f0)
And then only use the parameters m_mag_log, m_real, m_imag, and v_lf0 for Merlin.
Also, I wouldn't use zeros for phase features, so I would remove the code block under "# Zeros for unvoiced segments in phase features:", at least for the first trial.
Let me know how it goes.
Thanks.
PS: For issues, questions, etc, please use the "Issues" section in the Merlin repo (https://github.com/CSTR-Edinburgh/merlin/issues), or in the Magphase repo (https://github.com/CSTR-Edinburgh/magphase/issues) if it is more specific to MagPhase. In this way, we can keep track of this matter. Actually, could I copy your message to the "Issues" section, please?
Hi @felipeespic (moved to issues)
I've tested the changes you suggested and am getting some 'divide by zero' errors:
extract_features_for_merlin_lossless_test_01.py:71: RuntimeWarning: divide by zero encountered in log
v_lf0 = np.log(v_f0)
I switched to the slt data provided in the magphase demo as a common data reference and am seeing the same issue. Is this simply caused by the unvoiced portions of the signal? I added global zero ignore (yuck!) for now, "np.seterr(divide='ignore')".
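For reference, the warning can be reproduced in isolation. The sketch below assumes, as suspected above, that unvoiced frames carry an F0 of exactly 0; the array values are made up for illustration and do not come from the slt data.

```python
import warnings
import numpy as np

# Hypothetical F0 track in Hz; zeros stand for unvoiced frames (assumption).
v_f0 = np.array([0.0, 0.0, 120.5, 118.2, 0.0, 131.0])

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    v_lf0 = np.log(v_f0)  # log(0) evaluates to -inf and emits a RuntimeWarning

n_unvoiced = int(np.sum(v_f0 == 0.0))
n_neg_inf = int(np.sum(np.isneginf(v_lf0)))

print("unvoiced frames:", n_unvoiced)
print("-inf values in lf0:", n_neg_inf)
print("warnings:", [str(w.message) for w in caught])
```

If the two counts match on real data, the warning comes only from the unvoiced frames. Silencing it with np.seterr hides the message but leaves the -inf values in the saved features.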
Hi @felipeespic - here's what I've been working with. I'm still getting an error during acoustic training that the specified dimension 2049 is not compatible with the data, so I'm obviously missing something.
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
@author: Felipe Espic
DESCRIPTION:
This script extracts high resolution acoustic parameters from a wave file.
Then, it resynthesises the signal from these features.
Features:
- m_mag: Magnitude Spectrum (dim=fft_len/2+1, usually 2049)
- m_real: Normalised real "R" (dim=fft_len/2+1, usually 2049)
- m_imag: Normalised imag "I" (dim=fft_len/2+1, usually 2049)
- v_f0: F0 (dim=1)
INSTRUCTIONS:
This demo should work out of the box. Just run it by typing: python <script name>
If wanted, you can modify the input options and/or perform some modification to the
extracted features before re-synthesis. See the main function below for details.
NOTES: This script was previously named demo_copy_synthesis_hi_res.py
"""
import sys, os
import numpy as np
import warnings
if len(sys.argv)!=5:
    print("Usage: ")
    print("python extract_features_for_merlin.py <path_to_merlin_dir> <path_to_wav_dir> <path_to_feat_dir> <sampling rate>")
    sys.exit(1)
# top merlin directory
merlin_dir = sys.argv[1]
# input audio directory
wav_dir = sys.argv[2]
# Output features directory
out_dir = sys.argv[3]
# Expected sample rate
fs_expected = int(sys.argv[4])
# Magphase directory
magphase = os.path.join(merlin_dir, 'tools', 'magphase', 'src')
sys.path.append(os.path.realpath(magphase))
import libutils as lu
import libaudio as la
import magphase as mp
# fuz temp global divide by zero ignore - hack.
#np.seterr(divide='ignore')
def feat_extraction(wav_file, out_feats_dir):
    # Parsing path:
    file_name_token = os.path.basename(os.path.splitext(wav_file)[0])
    # Display:
    print("Analysing file: " + file_name_token + '.wav' + '................................')
    # Files setup:
    est_file = os.path.join(out_feats_dir, file_name_token + '.est')
    # Epochs detection:
    la.reaper(wav_file, est_file)
    # Feature extraction:
    # m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    m_mag_log = np.log(m_mag)
    v_lf0 = np.log(v_f0)
    if fs!=fs_expected:
        print("The wavefile's sample rate (%dHz) does not match the expected sample rate (%dHz)." % (fs, fs_expected))
        sys.exit(1)
    # Zeros for unvoiced segments in phase features:
    #v_voi = (np.exp(v_f0) > 5.0).astype(int) # 5.0: tolerance (just in case)
    # m_real_zeros = m_real * v_voi[:,None]
    # m_imag_zeros = m_imag * v_voi[:,None]
    # Saving features:
    lu.write_binfile(m_mag_log, out_feats_dir + '/' + file_name_token + '.mag')
    lu.write_binfile(m_real, out_feats_dir + '/' + file_name_token + '.real')
    lu.write_binfile(m_imag, out_feats_dir + '/' + file_name_token + '.imag')
    lu.write_binfile(v_lf0, out_feats_dir + '/' + file_name_token + '.lf0')
    # Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
    lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')
    return
def get_wav_filelist(wav_dir):
    wav_files = []
    for file in os.listdir(wav_dir):
        whole_filepath = os.path.join(wav_dir, file)
        if os.path.isfile(whole_filepath) and str(whole_filepath).endswith(".wav"):
            wav_files.append(whole_filepath)
        elif os.path.isdir(whole_filepath):
            wav_files += get_wav_filelist(whole_filepath)
    wav_files.sort()
    return wav_files
# FILES SETUP:========================================================================
lu.mkdir(out_dir)
l_wavfiles = get_wav_filelist(wav_dir)
# MULTIPROCESSING EXTRACTION:==========================================================
lu.run_multithreaded(feat_extraction, l_wavfiles, out_dir)
# For debugging (don't delete):
#for wavfile in l_wavfiles:
#    feat_extraction(wavfile, out_dir)
print('Done!')
> I've tested the changes you suggested and am getting some 'divide by zero' errors:
> extract_features_for_merlin_lossless_test_01.py:71: RuntimeWarning: divide by zero encountered in log
> v_lf0 = np.log(v_f0)
I see the error. Use
v_lf0 = la.log(v_f0)
and
m_mag_log = la.log(m_mag)
instead. That is a protected log function, which will not raise 'divide by zero' errors.
Also, try with just a few utterances for your pilot experiment. For example, I usually use 12 training sentences.
Let me know how it goes.
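The implementation of la.log is not shown in this thread, so the following is only a sketch of the general idea behind a protected log. The name protected_log and the floor value are hypothetical and are not taken from the MagPhase code.

```python
import numpy as np

def protected_log(x, floor=1e-10):
    """Hypothetical stand-in, NOT the real la.log.

    Raises every value below `floor` up to `floor` before taking the log,
    so zeros never reach np.log and no -inf is produced.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.log(np.maximum(x, floor))

# Made-up F0 track with zeros for unvoiced frames (assumption).
v_f0 = np.array([0.0, 120.0, 0.0, 240.0])
v_lf0 = protected_log(v_f0)

print(v_lf0)
print("all finite:", bool(np.isfinite(v_lf0).all()))
```

With this kind of clamping the unvoiced frames end up at a large negative but finite value, which keeps the saved features free of -inf.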
What are the right dimensions for the different acoustic features? And do they stay the same when you use 48kHz data?
Hi @dreamk73 ,
The dimensions for the lossless features should be:
mag = fft_len/2+1
real = fft_len/2+1
imag = fft_len/2+1
f0 = 1
(and 3X for their respective deltas.)
Where fft_len takes a value according to the sample rate (see these values in the function magphase.define_fft_len(fs)). Also, you can define a hardcoded value for fft_len as a parameter of the function magphase.analysis_lossless(...).
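As a quick check of the arithmetic, the dimensions can be computed from fft_len. This is a sketch only: the helper name lossless_dims is hypothetical, and fft_len = 4096 is assumed because it gives the "usually 2049" quoted in the script's docstring. The real value for a given sample rate should be read from magphase.define_fft_len(fs).

```python
def lossless_dims(fft_len):
    """Return (static, with_deltas) dimension dicts for the lossless features.

    Hypothetical helper: it only applies the fft_len/2+1 rule and the
    3x factor for deltas described above.
    """
    n_bins = fft_len // 2 + 1
    static = {"mag": n_bins, "real": n_bins, "imag": n_bins, "lf0": 1}
    with_deltas = {"d" + name: 3 * dim for name, dim in static.items()}
    return static, with_deltas

# fft_len = 4096 is an assumption, not a value read from MagPhase.
static, with_deltas = lossless_dims(4096)
print(static)
print(with_deltas)
```

For fft_len = 4096 this gives 2049 for mag, real and imag, and 6147 for their delta-augmented versions.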
Does the lossless version give better results than the regular magphase version? What is the most up-to-date approach to extract magphase features and train merlin models?
@fuzeller Did you resolve your problems: 2018-04-02 09:10:50,554 INFO main.train_DNN: overall training time: 0.30m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
I got the same issue when training Merlin with the lossless features of MagPhase. Here is my code for extracting the features:
```
def full_feature_extraction(wav_file, out_feats_dir):
    file_name_token = os.path.basename(os.path.splitext(wav_file)[0])
    # Display:
    print("Analysing file: " + file_name_token + '.wav' + '................................')
    # Files setup:
    est_file = os.path.join(out_feats_dir, file_name_token + '.est')
    # Epochs detection:
    la.reaper(wav_file, est_file)
    #Full feature extraction
    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    v_lf0 = la.f0_to_lf0(v_f0)
    lu.write_binfile(m_mag, out_feats_dir + '/' + file_name_token + '.mag')
    lu.write_binfile(m_real, out_feats_dir + '/' + file_name_token + '.real')
    lu.write_binfile(m_imag, out_feats_dir + '/' + file_name_token + '.imag')
    lu.write_binfile(v_lf0, out_feats_dir + '/' + file_name_token + '.lf0')
    # Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
    lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')
```
If you resolved this issue, please tell me how to do it.
@felipeespic @fuzeller
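A note on the log line above: the validation error shown has the same value as the largest finite 64-bit float (sys.float_info.max). That may mean the error was never updated from an initial value, although the thread does not confirm this. One cheap thing to rule out is non-finite values in the extracted features, for example -inf from taking the log of zero. The sketch below uses a hypothetical helper, count_non_finite, on a made-up matrix.

```python
import sys
import numpy as np

def count_non_finite(m_feat):
    """Count NaN, +inf and -inf entries in a feature array (hypothetical helper)."""
    m_feat = np.asarray(m_feat, dtype=np.float64)
    return {
        "nan": int(np.isnan(m_feat).sum()),
        "posinf": int(np.isposinf(m_feat).sum()),
        "neginf": int(np.isneginf(m_feat).sum()),
    }

# Made-up magnitude matrix containing one zero bin (assumption).
m_mag = np.array([[1.0, 0.0], [2.0, 3.0]])
with np.errstate(divide="ignore"):
    m_mag_log = np.log(m_mag)  # the zero bin becomes -inf

counts = count_non_finite(m_mag_log)
print(counts)
print("largest finite float64:", sys.float_info.max)
```

Running such a check on each array before writing it to disk shows whether any non-finite values reach Merlin.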
(Copied from original misplaced post by @fuzeller :-) ) @felipeespic @ronanki, Hello! I've started experimenting with higher dimensions, to include lossless. I've tried increasing dimensions in the acoustic conf file, but I'm clearly not approaching this correctly.
Test data is a set of 200 utterances (48000Hz), which run successfully through my initial magphase version of "build your own voice", with dimension values used in the slt demo.
As suggested in magphase's "demo_copy_synthesis_lossless.py" ("This script extracts high resolution acoustic parameters from a wave file. Then, it resynthesises the signal from these features."), I applied these parameters:
mag: 2049
dmag: 6147
real: 2049
dreal: 6147
imag: 2049
dimag: 6147
lf0: 1
Anything other than the default values (60/180, 45/135) produces: AssertionError: specified dimension 2049 not compatible with data.
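One way to see why a declared dimension can be rejected is to compare it with the size of a feature file on disk. This sketch assumes the features are stored as flat float32 binary, and that the assertion fires when the number of stored values is not a multiple of the declared dimension. Both are assumptions about the setup, not facts stated in this thread. The helper name is_dim_compatible and the demo file are hypothetical.

```python
import os
import tempfile
import numpy as np

def is_dim_compatible(path, dim, bytes_per_value=4):
    """Return True if the file holds a whole number of frames of size `dim`.

    Assumes a headerless binary file of fixed-size values (float32 by default).
    """
    n_bytes = os.path.getsize(path)
    if n_bytes % bytes_per_value != 0:
        return False
    n_values = n_bytes // bytes_per_value
    return n_values % dim == 0

# Demo: a file written with 60-dimensional frames, as in the compressed setup.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.mag")
    np.zeros((10, 60), dtype=np.float32).tofile(demo_path)
    ok_60 = is_dim_compatible(demo_path, 60)
    ok_2049 = is_dim_compatible(demo_path, 2049)

print("compatible with 60:", ok_60)
print("compatible with 2049:", ok_2049)
```

In this demo the file is compatible with 60 but not with 2049, which is the pattern one would expect if the conf file declares lossless dimensions while the files on disk still hold the compressed features.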
So I modified @felipeespic 's "extract_features_for_merlin" to gather the lossless acoustic features (I think), and call it, instead of the compressed features version: