BShakhovsky / PolyphonicPianoTranscription

Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)
https://magenta.tensorflow.org/onsets-frames

AssertionError: Wrong mels decibels range #2

Closed by htluandc2 2 years ago

htluandc2 commented 4 years ago

I'm running your code in '1 Datasets Preparation.ipynb' and ran into the following issue when testing on the MAESTRO dataset. Maybe you can help me. Thank you very much.

I want to test the script on a single file. This is my folder:

subset/2015/MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_01_R1_2015_wav--1.wav
subset/2015/MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_01_R1_2015_wav--1.midi

This is the code block (very long):

rate, minSecs, maxSecs, melsMinMin, melsMinMax, melsMeanMin, melsMeanMax, melsMaxMin, melsMaxMax \
    = 16_000, 1, 5, -40, -40, -0, -0, 40, 40
nFrames = lbr.time_to_frames(maxSecs, rate) + 1

for yearFolder in glob(dataFolder + '/*/'):
    if yearFolder.split('/')[1] in ['train', 'test', 'validation']: continue
    print(yearFolder, end='\n\n')
    for i, song in enumerate(glob(yearFolder + '/*.wav')):
        year, songFile = song.split('/')[1:]
        csvRow = df.loc[df['audio_filename'] == '/'.join([year, songFile])]
        assert csvRow['year'].to_list()[0] == int(year), 'CSV year is incorrect'
        split = csvRow['split'].to_list()
        assert len(split) == 1, 'CSV train/test split is incorrect'
        split = split[0]
        if not isdir('{}/{}/{}'.format(dataFolder, split, year)): makedirs('{}/{}/{}'.format(dataFolder, split, year))
        print('{} of {}\t{}\t{}'.format(i + 1, len(glob(yearFolder + '\*.wav')), split, song))

        if any(list(map(lambda name: NotExists('{}/{}/{}/{}'.format(dataFolder, split, year, songFile[:-4]), name),
                        ['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes']))):
            ######################################################################################################
            # From https://github.com/tensorflow/magenta/blob/master/magenta/music/audio_io.py

            nativeRate, y = readWave(song)
            if y.dtype == np.int16: y = int16_samples_to_float32(y)
            elif y.dtype != np.float32: raise AudioIOError('WAV file not 16-bit or 32-bit float PCM, unsupported')

            if y.ndim == 2 and y.shape[1] == 2: y = lbr.to_mono(y.T)
            if nativeRate != rate: y = lbr.resample(y, nativeRate, rate)

            ######################################################################################################
            # From https://github.com/tensorflow/magenta/blob/master/magenta/models/onsets_frames_transcription/split_audio_and_label_data.py
            # def process_record(..., min_length=5, max_length=20, sample_rate=16000,
            #     allow_empty_notesequence=False, load_audio_with_librosa=False)

            samples = lbr.util.normalize(y)
            sequence = apply_sustain_control_changes(midi_file_to_note_sequence(song[:-3] + 'midi'))
            roll = sequence_to_pianoroll(sequence, 1 / lbr.frames_to_time(1, rate), 21, 108,
                                         onset_length_ms=32, offset_length_ms=32, onset_mode='length_ms')
            splits = [0, sequence.total_time] if split == 'test' else \
                find_split_points(sequence, samples, rate, minSecs, maxSecs)

            mels, onsets, offsets, actives, volumes = [], [], [], [], []
            for i, (start, end) in enumerate(zip(splits[:-1], splits[1:])):
                print('\tFragment {} of {}'.format(i + 1, len(splits) - 1), end='\t')
                if end - start < minSecs:
                    if i not in [0, len(splits) - 2]: print('WARNING: ', end='')
                    print('Skipping short sequence < {} seconds'.format(minSecs))
                    continue

                # Resampling in crop_wav_data is really slow, and we have already done it once, avoid doing it twice:
                newMels = lbr.power_to_db(lbr.feature.melspectrogram(samples if start == 0
                        and end == sequence.total_time else crop_samples(samples, rate, start, end - start),
                    rate, n_mels=229, fmin=30, htk=True).astype(np.float32).T).astype(np.float16)
                newOnsets, newOffsets, newActives, newVolumes = map(lambda arr:arr[
                        lbr.time_to_frames(start + lbr.frames_to_time(1, rate) / 2, rate) :
                        lbr.time_to_frames(  end + lbr.frames_to_time(1, rate) / 2, rate) + 1],
                    [roll.onsets, roll.offsets, roll.active, roll.onset_velocities])
                if split != 'test':
                    if len(newOnsets) == len(newMels) + 1: newOnsets, newOffsets, newActives, newVolumes \
                        = newOnsets[:-1], newOffsets[:-1], newActives[:-1], newVolumes[:-1]
                    elif len(newMels) == len(newOnsets) + 1: newMels = newMels[:-1]
                elif len(newOnsets) < len(newMels): newMels = newMels[:len(newOnsets)]
                assert split == 'test' or len(newOnsets) == len(newMels), \
                    'Spectrogram duration is different from piano rolls durations'

                if not newOnsets.sum():
                    if i not in [0, len(splits) - 2]: print('WARNING: ', end='')
                    print('Skipping empty sequence')
                    continue
                try: assert melsMinMin < newMels.min() < melsMinMax and melsMeanMin < newMels.mean() < melsMeanMax \
                    and melsMaxMin < newMels.max() < melsMaxMax, 'Wrong mels decibels range'
                except:
                    if i == len(splits) - 2 and newMels.min() == newMels.mean() == newMels.max() == -100:
                        print('WARNING: Skipping strange sequence with all mels = -100 Db')
                        continue
                    else:
                        print(newMels.min(), newMels.mean(), newMels.max())
                        raise

                ########################################################################################################
                # Unfortunately, magenta.music.sequences_lib.extract_subsequence does not take the correct time interval
                # So, we have to manually remove notes which started before the interval:
                for note in newActives[0].nonzero()[0]:
                    for i, act in enumerate(newActives):
                        if newOnsets[i][note] or not act[note]: break
                        newActives[i][note] = 0
                ########################################################################################################

                if split != 'test': newMels, newOnsets, newOffsets, newActives, newVolumes = map(
                    lambda arr: np.pad(arr, [(0, nFrames - len(arr)), (0, 0)], 'minimum' if arr is newMels \
                    else 'constant'), [newMels, newOnsets, newOffsets, newActives, newVolumes])
                assert newMels.shape[:-1] == newOnsets.shape[:-1] == newOffsets.shape[:-1]        \
                        == newActives.shape[:-1] == newVolumes.shape[:-1]                          \
                    and newOnsets.shape == newOffsets.shape == newActives.shape == newVolumes.shape \
                    and newOnsets.shape[1] == 88 and newMels.shape[1] == 229, 'Wrong data shape'
                mels, onsets, offsets, actives, volumes = map(lambda arr, newArr: arr + [newArr],
                    [mels, onsets, offsets, actives, volumes], [newMels, newOnsets, newOffsets, newActives, newVolumes])
                print()

            for name, arr in zip(['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes'],
                                 [mels, onsets, offsets, actives, volumes]):
                np.save('{}/{}/{}/{} {}'.format(dataFolder, split, year, songFile[:-4], name), arr)
    print()
'All piano pieces have been processed'
subset/2015/

1 of 0  test    subset/2015/MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_01_R1_2015_wav--2.wav
    Fragment 1 of 1 -47.22 -29.27 32.78
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-12-74c09e59b22d> in <module>
     69                     continue
     70                 try: assert melsMinMin < newMels.min() < melsMinMax and melsMeanMin < newMels.mean() < melsMeanMax \
---> 71                     and melsMaxMin < newMels.max() < melsMaxMax, 'Wrong mels decibels range'
     72                 except:
     73                     if i == len(splits) - 2 and newMels.min() == newMels.mean() == newMels.max() == -100:

AssertionError: Wrong mels decibels range

BShakhovsky commented 4 years ago

Hello,

Yes, this is my mistake: I did not write the correct values for melsMinMin, melsMinMax, etc., and I no longer remember them. The check is only for debugging purposes. I used it while writing the code to make sure that the mel values were reasonable and that I had not made some dumb mistake.

This check is not really required, and you can simply comment out the "else" branch in the following block:

try: assert melsMinMin < newMels.min() < melsMinMax and melsMeanMin < newMels.mean() < melsMeanMax \
    and melsMaxMin < newMels.max() < melsMaxMax, 'Wrong mels decibels range'
except:
    if i == len(splits) - 2 and newMels.min() == newMels.mean() == newMels.max() == -100:
        print('WARNING: Skipping strange sequence with all mels = -100 Db')
        continue
#   else:
#       print(newMels.min(), newMels.mean(), newMels.max())
#       raise
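
Note that the thresholds in the notebook set each lower and upper bound to the same value (-40 and -40, 0 and 0, 40 and 40), so a strict comparison such as `-40 < x < -40` can never hold and the assertion fails for any input. If a sanity check is still wanted, a non-fatal variant is one option. The following is a minimal sketch, not the notebook's code: the function name `mels_range_warnings` is hypothetical, and the bounds in the usage lines are placeholders chosen only for illustration, since the originally intended values are not known.

```python
import numpy as np

def mels_range_warnings(mels, min_bounds, mean_bounds, max_bounds):
    """Return a list of warning strings; the list is empty if all statistics are in range.

    Each *_bounds argument is an inclusive (low, high) pair in decibels.
    """
    stats = {
        'min': (float(mels.min()), min_bounds),
        'mean': (float(mels.mean()), mean_bounds),
        'max': (float(mels.max()), max_bounds),
    }
    warnings = []
    for name, (value, (low, high)) in stats.items():
        if not low <= value <= high:
            warnings.append('mels {} = {:.2f} dB is outside [{}, {}]'.format(name, value, low, high))
    return warnings

# Hypothetical values for illustration only
example = np.array([[-47.22, -29.27], [0.0, 32.78]], dtype=np.float32)

# With the bounds from the notebook, every statistic is reported as out of range
for message in mels_range_warnings(example, (-40, -40), (0, 0), (40, 40)):
    print('WARNING:', message)
```

Because the function only returns messages, the calling loop can print them and continue instead of aborting the whole preprocessing run.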

htluandc2 commented 4 years ago

Thank you very much.

tonynastor commented 3 years ago

Hi, after this section I ran into another issue: the number of numpy arrays is not consistent.


AssertionError                            Traceback (most recent call last)
     12     newActives.split(' ')[0], '\n',newVolumes.split(' ')[0])
     13 assert song == newOnsets.split(' ')[0] == newOffsets.split(' ')[0] == newActives.split(' ')[0] \
---> 14     == newVolumes.split(' ')[0], 'Inconsistent number of numpy arrays'
     15 print('{} of {}'.format(i + 1, len(glob('{}/{}/{}/*Actives.npy'.format(
     16     dataFolder, splitFolder, yearFolder)))), end='\t')

AssertionError: Inconsistent number of numpy arrays

BShakhovsky commented 3 years ago

Hi,

I am not quite sure, but one possible reason for the error is the following. In the previous section, the "Mels", "Onsets", "Offsets", "Actives" and "Volumes" numpy arrays were created for each musical piece. Then, in the section where you get the error, the numpy arrays are concatenated, and the resulting arrays must of course have the same length. Maybe not all of the arrays were saved for some musical piece, so that, for example, there are "Mels", "Onsets" and "Offsets" but no "Volumes", or something like that. If that is the reason, I don't know why it happened in your case.

If you don't want to rerun the previous cell and wait again, you can look through your files with numpy arrays, find the ones for the problematic musical piece and just delete them.

You can also find the name of the problematic musical piece by looking at the printed output of the cell. The song named in the last line should have been processed successfully, and the problematic song should be the next one (which is not printed).
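
One way to locate such pieces without rerunning the previous cell is to group the saved files by piece name and report any piece that lacks one of the expected arrays. The following is a minimal sketch, assuming the files follow the `<piece> <Name>.npy` naming produced by the `np.save` call in the preparation cell. The function name `find_incomplete_pieces` is hypothetical and is not part of the notebook.

```python
import os
from collections import defaultdict
from glob import glob

ARRAY_NAMES = ['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes']

def find_incomplete_pieces(folder):
    """Map each piece name in `folder` to the list of array names missing for it."""
    found = defaultdict(set)
    for path in glob(os.path.join(folder, '*.npy')):
        stem = os.path.basename(path)[:-len('.npy')]
        # The array name is the last space-separated word of the file name
        piece, _, name = stem.rpartition(' ')
        if name in ARRAY_NAMES:
            found[piece].add(name)
    return {piece: [name for name in ARRAY_NAMES if name not in names]
            for piece, names in found.items() if len(names) < len(ARRAY_NAMES)}
```

Calling the function on one split/year folder returns an empty dictionary when every piece has all five arrays. Any piece it does report is a candidate for deleting and regenerating, as suggested above.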

tonynastor commented 3 years ago

Hi Boris,

Thank you for your help. I tried printing the offending music pieces, and the log is shown below. I don't know why this happens. However, I tried a workaround: take the song name and append the suffixes ' Onsets.npy', ' Offsets.npy', ' Actives.npy' and ' Volumes.npy'. The workaround works on my side.

train 2006

/project/at101-group17/datasets/maestro-v1.0.0/train/2006/MIDI-Unprocessed_24_R1_2006_01-05_ORIG_MID--AUDIO_24_R1_2006_01_Track01_wav
/project/at101-group17/datasets/maestro-v1.0.0/train/2006/MIDI-Unprocessed_22_R2_2006_01_ORIG_MID--AUDIO_22_R2_2006_02_Track02_wav
/project/at101-group17/datasets/maestro-v1.0.0/train/2006/MIDI-Unprocessed_22_R2_2006_01_ORIG_MID--AUDIO_22_R2_2006_02_Track02_wav
/project/at101-group17/datasets/maestro-v1.0.0/train/2006/MIDI-Unprocessed_13_R1_2006_01-06_ORIG_MID--AUDIO_13_R1_2006_06_Track06_wav
/project/at101-group17/datasets/maestro-v1.0.0/train/2006/MIDI-Unprocessed_22_R2_2006_01_ORIG_MID--AUDIO_22_R2_2006_02_Track02_wav

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
     17 newOnsets = song+' Onsets.npy'
        newOffsets = song+' Offsets.npy'
        newActives = song+' Actives.npy'
        newVolumes = song+' Volumes.npy'
     18 assert song == newOnsets.split(' ')[0] == newOffsets.split(' ')[0] == newActives.split(' ')[0] \
---> 19     == newVolumes.split(' ')[0], 'Inconsistent number of numpy arrays'

AssertionError: Inconsistent number of numpy arrays

Thanks a lot,

Antony

BShakhovsky commented 3 years ago

Hi Antony,

Wow, in your output the files are not sorted by name. I looked at the documentation of Python's glob.glob function, and it turns out that it returns files in arbitrary order. I did not know that. In my case the glob.glob output was sorted, which may have been just a coincidence.

Most likely, that is why my code breaks. Your workaround should work. Another (maybe clearer) way would be to sort the lists of filenames, i.e. to change the following line:

glob('{}\{}\{}\*{}.npy'.format(dataFolder, splitFolder, yearFolder, arr))

to the following:

sorted(glob('{}\{}\{}\*{}.npy'.format(dataFolder, splitFolder, yearFolder, arr)))

There is a similar block of code in the next cell. If the assertion in the next cell also fails, sorting should help there too (sorted(glob(...))), i.e. change the following two lines:

for i, [newMels, newOnsets, newOffsets, newActives, newVolumes] in enumerate(zip(*(glob('{}\{}\*{}.npy'.format(
    dataFolder, splitFolder, arr)) for arr in ['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes']))):

to the following:

for i, [newMels, newOnsets, newOffsets, newActives, newVolumes] in enumerate(zip(*(sorted(glob('{}\{}\*{}.npy'.format(
    dataFolder, splitFolder, arr))) for arr in ['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes']))):
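
The sorting suggested above can also be wrapped in a small helper that sorts every listing before the lists are zipped together and checks that each resulting tuple belongs to a single piece. This is a minimal sketch, not the notebook's code: `aligned_array_files` is a hypothetical name. It uses `os.path.join` in place of the hard-coded backslashes because a backslash is not a path separator on Linux and macOS, so a pattern written with backslashes may match nothing there.

```python
import os
from glob import glob

ARRAY_NAMES = ['Mels', 'Onsets', 'Offsets', 'Actives', 'Volumes']

def aligned_array_files(folder):
    """Yield one tuple of paths (Mels, Onsets, Offsets, Actives, Volumes) per piece."""
    # glob returns files in arbitrary order, so sort each listing explicitly
    listings = [sorted(glob(os.path.join(folder, '*{}.npy'.format(name))))
                for name in ARRAY_NAMES]
    assert len(set(map(len, listings))) == 1, 'Inconsistent number of numpy arrays'
    for paths in zip(*listings):
        # The piece name is everything before the last space in the file name
        pieces = {os.path.basename(path).rpartition(' ')[0] for path in paths}
        assert len(pieces) == 1, 'Files in one tuple belong to different pieces'
        yield paths
```

A loop such as `for i, (newMels, newOnsets, newOffsets, newActives, newVolumes) in enumerate(aligned_array_files(folder)):` could then replace the `zip` over unsorted `glob` results, with `folder` built from dataFolder and splitFolder.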