CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

Magphase Lossless Testing in Merlin #329

fuzeller opened this issue 6 years ago (status: Open)

fuzeller commented 6 years ago

(Copied from the original misplaced post by @fuzeller :-) ) @felipeespic @ronanki, Hello! I've started experimenting with higher dimensions, including the lossless features. I've tried increasing the dimensions in the acoustic conf file, but I'm clearly not approaching this correctly.

The test data is a set of 200 utterances (48000 Hz) that run successfully through my initial MagPhase version of "build your own voice" with the dimension values used in the slt demo.

As suggested in MagPhase's demo_copy_synthesis_lossless.py, which describes itself as follows:

"This script extracts high resolution acoustic parameters from a wave file. Then, it resynthesises the signal from these features."

I applied these dimensions:

mag: 2049
dmag: 6147
real: 2049
dreal: 6147
imag: 2049
dimag: 6147
lf0: 1

Anything other than the default values (60/180, 45/135) produces: AssertionError: specified dimension 2049 not compatible with data.
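As far as I can tell from Merlin's src/io_funcs/binary_io.py, this assertion comes from the binary loader, which reads each feature file as raw float32 values and asserts that their count is divisible by the configured dimension. A minimal sketch for checking a feature file by hand; frames_if_compatible is a hypothetical helper, and the "raw little-endian float32, no header" layout is an assumption to verify against lu.write_binfile:

```python
import os
import tempfile

import numpy as np


def frames_if_compatible(path, dim):
    """Return the frame count if the file holds a whole number of `dim`-wide
    float32 frames, else None. Assumes raw little-endian float32 with no header."""
    data = np.fromfile(path, dtype="<f4")
    if data.size % dim != 0:
        return None
    return data.size // dim


if __name__ == "__main__":
    # Toy file: 100 frames of 60-dim (compressed MagPhase style) magnitude features.
    path = os.path.join(tempfile.mkdtemp(), "utt.mag")
    np.zeros((100, 60), dtype="<f4").tofile(path)
    print(frames_if_compatible(path, 60))    # 100
    print(frames_if_compatible(path, 2049))  # None: 6000 values are not a multiple of 2049
```

Divisibility can pass by coincidence, so a None result shows a file is definitely incompatible with a dimension, while a number only shows it is plausible. Running this over the feature directory would also reveal any stale compressed (60- or 45-dim) files left over from an earlier extraction.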

So I modified @felipeespic's "extract_features_for_merlin" script to extract the lossless acoustic features (I think) and called it instead of the compressed-features version:


# -*- coding: utf-8 -*-
"""
@author: Felipe Espic

DESCRIPTION:
This script extracts high resolution acoustic parameters from a wave file.
Then, it resynthesises the signal from these features.
Features:
- m_mag:  Magnitude Spectrum  (dim=fft_len/2+1, usually 2049)
- m_real: Normalised real "R" (dim=fft_len/2+1, usually 2049)
- m_imag: Normalised imag "I" (dim=fft_len/2+1, usually 2049)
- v_f0:   F0 (dim=1)

INSTRUCTIONS:
This demo should work out of the box. Just run it by typing: python <script name>
If wanted, you can modify the input options and/or perform some modification to the
extracted features before re-synthesis. See the main function below for details.

NOTES: This script was previously named demo_copy_synthesis_hi_res.py
"""

import sys, os
import numpy as np

if len(sys.argv)!=5:
    print("Usage: ")
    print("python extract_features_for_merlin.py <path_to_merlin_dir> <path_to_wav_dir> <path_to_feat_dir> <sampling rate>")
    sys.exit(1)

# top merlin directory
merlin_dir = sys.argv[1]

# input audio directory
wav_dir = sys.argv[2]

# Output features directory
out_dir = sys.argv[3]

# Expected sample rate
fs_expected = int(sys.argv[4])

# Magphase directory
magphase = os.path.join(merlin_dir, 'tools', 'magphase', 'src')
sys.path.append(os.path.realpath(magphase))
import libutils as lu
import libaudio as la
import magphase as mp

def feat_extraction(wav_file, out_feats_dir):

    # Parsing path:
    file_name_token = os.path.basename(os.path.splitext(wav_file)[0])

    # Display:
    print("Analysing file: " + file_name_token + '.wav' + '................................')

    # Files setup:
    est_file = os.path.join(out_feats_dir, file_name_token + '.est')

    # Epochs detection:
    la.reaper(wav_file, est_file)

    # Feature extraction:    

    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)

    if fs!=fs_expected:
        print("The wavefile's sample rate (%dHz) does not match the expected sample rate (%dHz)." % (fs, fs_expected))
        sys.exit(1)

    # Zeros for unvoiced segments in phase features:
    v_voi = (np.exp(v_f0) > 5.0).astype(int) # 5.0: tolerance (just in case)
    m_real_zeros = m_real * v_voi[:,None]
    m_imag_zeros = m_imag * v_voi[:,None]

    # Saving features:
    lu.write_binfile(m_mag,    out_feats_dir + '/' + file_name_token + '.mag')
    lu.write_binfile(m_real_zeros, out_feats_dir + '/' + file_name_token + '.real')
    lu.write_binfile(m_imag_zeros, out_feats_dir + '/' + file_name_token + '.imag')
    lu.write_binfile(v_f0,            out_feats_dir + '/' + file_name_token + '.lf0')

    # Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
    lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')

    return

def get_wav_filelist(wav_dir):
    wav_files = []
    for file in os.listdir(wav_dir):
        whole_filepath = os.path.join(wav_dir, file)
        if os.path.isfile(whole_filepath) and str(whole_filepath).endswith(".wav"):
            wav_files.append(whole_filepath)
        elif os.path.isdir(whole_filepath):
            wav_files += get_wav_filelist(whole_filepath)

    wav_files.sort()

    return wav_files

# FILES SETUP:========================================================================
lu.mkdir(out_dir)
l_wavfiles = get_wav_filelist(wav_dir)

# MULTIPROCESSING EXTRACTION:==========================================================
lu.run_multithreaded(feat_extraction, l_wavfiles, out_dir)

# For debugging (don't delete):
#for wavfile in l_wavfiles:
#    feat_extraction(wavfile, out_dir)

print('Done!')
With the features from that script I do get past the dimension-compatibility error. My first test gave me:

OSError: [Errno 12] Cannot allocate memory

so I reduced the buffer size to 20000, the batch size to 32, and the learning rate to 0.0001, which now results in:

2018-04-02 09:10:50,554 INFO main.train_DNN: overall training time: 0.30m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

A very odd validation error!
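The number itself is a clue: it is exactly the largest finite double, Python's sys.float_info.max. Assuming train_DNN (in Merlin's src/run_merlin.py) initialises best_validation_loss to sys.float_info.max and replaces it only when a new validation loss is smaller (worth confirming in your Merlin version), a NaN or infinite loss on every epoch would leave that initial value in the final log line. The sketch below checks the logged value and mimics that assumed bookkeeping:

```python
import sys

# "validation error" value copied verbatim from the train_DNN log line above
logged = "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000"

# The logged value parses to the largest finite double (about 1.8e308).
print(float(logged) == sys.float_info.max)  # True

# Assumed best-loss bookkeeping: NaN or inf never compares smaller,
# so the initial value survives to the end of training.
best_validation_loss = sys.float_info.max
for this_validation_loss in [float("nan"), float("inf"), float("nan")]:
    if this_validation_loss < best_validation_loss:
        best_validation_loss = this_validation_loss
print(best_validation_loss == sys.float_info.max)  # True
```

If that reading is right, the problem is non-finite values in the features or in training, not a genuinely large error.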

Do you have any advice on how I should proceed from here? Are there other dimension options I should be trying before a lossless test against a larger data set?

Thank you for your time - especially in sharing such wonderful research!
fuzeller commented 6 years ago

(From @felipeespic)

Hi @fuzeller,

You are welcome, and thank you for trying to train Merlin with the MagPhase lossless version (I have not tried it yet).

I think a corrupted feature is producing the error. Actually, there is a mistake in the code you provided:

v_voi = (np.exp(v_f0) > 5.0).astype(int) # 5.0: tolerance (just in case)

I think that np.exp(v_f0) is producing wrong values.

So, my suggestion is:

Change the line:

m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)

to:

m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
m_mag_log = np.log(m_mag)
v_lf0 = np.log(v_f0)

Then use only the parameters m_mag_log, m_real, m_imag, and v_lf0 for Merlin.

Also, I wouldn't zero the phase features, so I would remove the code block under "# Zeros for unvoiced segments in phase features:", at least for the first trial.
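One way to hunt for a "corrupted feature" before training is to scan every extracted array for NaN or infinite values (for example, np.log(0) gives -inf). A minimal sketch; find_bad_features is a hypothetical helper that works on in-memory NumPy arrays, which could be loaded from the feature files with np.fromfile(path, dtype='<f4'), assuming float32 storage:

```python
import numpy as np


def find_bad_features(features):
    """Map each feature name to (n_nan, n_inf) if its array has non-finite values."""
    bad = {}
    for name, arr in features.items():
        arr = np.asarray(arr, dtype=np.float64)
        n_nan = int(np.isnan(arr).sum())
        n_inf = int(np.isinf(arr).sum())
        if n_nan or n_inf:
            bad[name] = (n_nan, n_inf)
    return bad


if __name__ == "__main__":
    # Toy arrays standing in for one utterance's mag, lf0 and real features.
    toy = {
        "mag": np.array([[0.5, 1.0], [np.nan, 2.0]]),
        "lf0": np.array([-np.inf, 4.8]),
        "real": np.zeros((2, 2)),
    }
    print(find_bad_features(toy))  # {'mag': (1, 0), 'lf0': (0, 1)}
```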

Let me know how it goes.

Thanks.

PS: For issues, questions, etc., please use the "Issues" section in the Merlin repo (https://github.com/CSTR-Edinburgh/merlin/issues), or in the Magphase repo (https://github.com/CSTR-Edinburgh/magphase/issues) if it is more specific to MagPhase. In this way, we can keep track of this matter. Actually, could I copy your message to the "Issues" section, please?

fuzeller commented 6 years ago

Hi @felipeespic (moved to issues)

I've tested the changes you suggested and am getting some 'divide by zero' errors:

extract_features_for_merlin_lossless_test_01.py:71: RuntimeWarning: divide by zero encountered in log
  v_lf0 = np.log(v_f0)

I switched to the slt data provided in the MagPhase demo as a common reference and see the same issue. Is this simply caused by the unvoiced portions of the signal? For now I've added a global divide-by-zero ignore (yuck!): np.seterr(divide='ignore').
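If analysis_lossless returns linear F0 in Hz with 0 on unvoiced frames (an assumption, but one consistent with this warning), unvoiced frames are the likely cause: np.log(0) is -inf. The sketch below checks that every -inf lands on a zero-F0 frame and silences the warning only locally with np.errstate instead of globally with np.seterr; log_f0_with_mask is a hypothetical helper:

```python
import numpy as np


def log_f0_with_mask(v_f0):
    """Return (v_lf0, v_unvoiced). Assumes v_f0 is linear F0 in Hz with 0 on
    unvoiced frames. The divide-by-zero warning is silenced only in this call."""
    v_f0 = np.asarray(v_f0, dtype=np.float64)
    with np.errstate(divide="ignore"):  # scoped, unlike np.seterr(divide='ignore')
        v_lf0 = np.log(v_f0)
    v_unvoiced = v_f0 == 0.0
    return v_lf0, v_unvoiced


if __name__ == "__main__":
    v_lf0, v_unvoiced = log_f0_with_mask([0.0, 110.0, 0.0, 220.0])
    # Every -inf in the log-F0 track sits on an unvoiced (zero-F0) frame.
    print(np.array_equal(np.isneginf(v_lf0), v_unvoiced))  # True
```

Silencing the warning does not remove the -inf values themselves; they still reach the feature files unless they are floored or handled separately.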

fuzeller commented 6 years ago

Hi @felipeespic, here's what I've been working with. I'm still getting an error during acoustic training saying that the specified dimension 2049 is not compatible with the data, so I'm obviously missing something.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
@author: Felipe Espic

DESCRIPTION:
This script extracts high resolution acoustic parameters from a wave file.
Then, it resynthesises the signal from these features.
Features:
- m_mag:  Magnitude Spectrum  (dim=fft_len/2+1, usually 2049)
- m_real: Normalised real "R" (dim=fft_len/2+1, usually 2049)
- m_imag: Normalised imag "I" (dim=fft_len/2+1, usually 2049)
- v_f0:   F0 (dim=1)

INSTRUCTIONS:
This demo should work out of the box. Just run it by typing: python <script name>
If wanted, you can modify the input options and/or perform some modification to the
extracted features before re-synthesis. See the main function below for details.

NOTES: This script was previously named demo_copy_synthesis_hi_res.py
"""

import sys, os
import numpy as np
import warnings

if len(sys.argv)!=5:
    print("Usage: ")
    print("python extract_features_for_merlin.py <path_to_merlin_dir> <path_to_wav_dir> <path_to_feat_dir> <sampling rate>")
    sys.exit(1)

# top merlin directory
merlin_dir = sys.argv[1]

# input audio directory
wav_dir = sys.argv[2]

# Output features directory
out_dir = sys.argv[3]

# Expected sample rate
fs_expected = int(sys.argv[4])

# Magphase directory
magphase = os.path.join(merlin_dir, 'tools', 'magphase', 'src')
sys.path.append(os.path.realpath(magphase))
import libutils as lu
import libaudio as la
import magphase as mp

# fuz temp global divide by zero ignore - hack.
#np.seterr(divide='ignore')

def feat_extraction(wav_file, out_feats_dir):

    # Parsing path:
    file_name_token = os.path.basename(os.path.splitext(wav_file)[0])

    # Display:
    print("Analysing file: " + file_name_token + '.wav' + '................................')

    # Files setup:
    est_file = os.path.join(out_feats_dir, file_name_token + '.est')

    # Epochs detection:
    la.reaper(wav_file, est_file)

    # Feature extraction:    

#    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    m_mag_log = np.log(m_mag)
    v_lf0 = np.log(v_f0)

    if fs!=fs_expected:
        print("The wavefile's sample rate (%dHz) does not match the expected sample rate (%dHz)." % (fs, fs_expected))
        sys.exit(1)

    # Zeros for unvoiced segments in phase features:
    #v_voi = (np.exp(v_f0) > 5.0).astype(int) # 5.0: tolerance (just in case)
#    m_real_zeros = m_real * v_voi[:,None]
#    m_imag_zeros = m_imag * v_voi[:,None]

    # Saving features:
    lu.write_binfile(m_mag_log,        out_feats_dir + '/' + file_name_token + '.mag')
    lu.write_binfile(m_real, out_feats_dir + '/' + file_name_token + '.real')
    lu.write_binfile(m_imag, out_feats_dir + '/' + file_name_token + '.imag')
    lu.write_binfile(v_lf0,         out_feats_dir + '/' + file_name_token + '.lf0')

    # Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
    lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')

    return

def get_wav_filelist(wav_dir):
    wav_files = []
    for file in os.listdir(wav_dir):
        whole_filepath = os.path.join(wav_dir, file)
        if os.path.isfile(whole_filepath) and str(whole_filepath).endswith(".wav"):
            wav_files.append(whole_filepath)
        elif os.path.isdir(whole_filepath):
            wav_files += get_wav_filelist(whole_filepath)

    wav_files.sort()

    return wav_files

# FILES SETUP:========================================================================
lu.mkdir(out_dir)
l_wavfiles = get_wav_filelist(wav_dir)

# MULTIPROCESSING EXTRACTION:==========================================================
lu.run_multithreaded(feat_extraction, l_wavfiles, out_dir)

# For debugging (don't delete):
#for wavfile in l_wavfiles:
#    feat_extraction(wavfile, out_dir)

print('Done!')
felipeespic commented 6 years ago

> I've tested the changes you suggested and am getting some 'divide by zero' errors:
>
> extract_features_for_merlin_lossless_test_01.py:71: RuntimeWarning: divide by zero encountered in log
>   v_lf0 = np.log(v_f0)

I see the error. Use v_lf0 = la.log(v_f0) and m_mag_log = la.log(m_mag) instead; la.log is a protected log function that will not raise 'divide by zero' errors.
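The source of la.log is not reproduced in this thread, so the sketch below only illustrates the general idea of a protected log: clamp inputs to a small positive floor before taking the log, so zeros give a finite value and no warning. protected_log and EPS are hypothetical names, and the 1e-10 floor is an assumption, not necessarily what libaudio uses:

```python
import numpy as np

EPS = 1e-10  # hypothetical floor; not necessarily the value la.log uses


def protected_log(x, eps=EPS):
    """Natural log with inputs clamped to at least `eps`, so zeros give log(eps)
    instead of -inf and no divide-by-zero warning is raised."""
    return np.log(np.maximum(np.asarray(x, dtype=np.float64), eps))


if __name__ == "__main__":
    out = protected_log([0.0, 1.0, 120.0])
    print(np.isfinite(out).all())  # True
    print(out[1])                  # 0.0
```

Whatever floor is used ends up in the statistics Merlin computes for normalisation, so a very extreme floor value could itself act as an outlier.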

Also, try with just a few utterances for your pilot experiment. For example, I usually use 12 training sentences.

Let me know how it goes.

dreamk73 commented 6 years ago

What are the right dimensions for the different acoustic features? And do they stay the same when you use 48kHz data?

felipeespic commented 6 years ago

Hi @dreamk73,

The dimensions for the lossless features should be:

- mag = fft_len/2+1
- real = fft_len/2+1
- imag = fft_len/2+1
- f0 = 1

(and 3x these values for their respective deltas.)

Here, fft_len takes a value that depends on the sample rate (see the values in the function magphase.define_fft_len(fs)). You can also pass a hard-coded fft_len as a parameter of the function magphase.analysis_lossless(...).
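The rule above can be turned into the acoustic dimension values for the Merlin config. A minimal sketch; lossless_dims is a hypothetical helper, fft_len=4096 is assumed (it gives the 2049 bins used earlier in this thread), and the extra 1-dim vuv stream in the total width is an assumption about Merlin's output layout:

```python
def lossless_dims(fft_len):
    """Acoustic dimensions for MagPhase lossless features: fft_len/2+1 bins per
    spectral stream, 3x that for the delta streams, 1 (and 3) for lf0."""
    n_bins = fft_len // 2 + 1
    dims = {}
    for stream in ("mag", "real", "imag"):
        dims[stream] = n_bins
        dims["d" + stream] = 3 * n_bins
    dims["lf0"] = 1
    dims["dlf0"] = 3
    return dims


if __name__ == "__main__":
    dims = lossless_dims(4096)  # check magphase.define_fft_len(fs) for your rate
    print(dims["mag"], dims["dmag"])  # 2049 6147
    # Output width if the delta streams plus a 1-dim vuv stream are predicted (assumed)
    print(dims["dmag"] + dims["dreal"] + dims["dimag"] + dims["dlf0"] + 1)  # 18445
```

By the same count, the default 60/180 and 45/135 settings give 180 + 135 + 135 + 3 + 1 = 454, so the lossless output layer is roughly 40 times wider, which may be relevant to the memory error reported earlier.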

dreamk73 commented 6 years ago

Does the lossless version give better results than the regular magphase version? What is the most up-to-date approach to extract magphase features and train merlin models?

chazo1994 commented 5 years ago

@fuzeller Did you resolve this problem?

2018-04-02 09:10:50,554 INFO main.train_DNN: overall training time: 0.30m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

I get the same issue when I train Merlin with MagPhase lossless features. Here is my feature-extraction code:

def full_feature_extraction(wav_file, out_feats_dir):

    # Parsing path:
    file_name_token = os.path.basename(os.path.splitext(wav_file)[0])

    # Display:
    print("Analysing file: " + file_name_token + '.wav' + '................................')

    # Files setup:
    est_file = os.path.join(out_feats_dir, file_name_token + '.est')

    # Epochs detection:
    la.reaper(wav_file, est_file)

    # Full feature extraction:
    m_mag, m_real, m_imag, v_f0, fs, v_shift = mp.analysis_lossless(wav_file)
    v_lf0 = la.f0_to_lf0(v_f0)

    lu.write_binfile(m_mag, out_feats_dir + '/' + file_name_token + '.mag')
    lu.write_binfile(m_real, out_feats_dir + '/' + file_name_token + '.real')
    lu.write_binfile(m_imag, out_feats_dir + '/' + file_name_token + '.imag')
    lu.write_binfile(v_lf0, out_feats_dir + '/' + file_name_token + '.lf0')

    # Saving auxiliary feature shift (hop length). It is useful for posterior modifications of labels in Merlin.
    lu.write_binfile(v_shift, out_feats_dir + '/' + file_name_token + '.shift')

If you resolved this issue, please tell me how to do it.

@felipeespic @fuzeller