dev-onejun / AI-Study

[since 2021] Artificial Intelligence Study

How to Generate Music using a LSTM Neural Network in Keras #1

Closed dev-onejun closed 1 year ago

dev-onejun commented 1 year ago

This article is featured in codecrafters-io/build-your-own-x.

dev-onejun commented 1 year ago

Summary

Repository


Uses music21 to handle the music data, working from the standpoint of the piano.

Referring to the official documentation would be more accurate.

dev-onejun commented 1 year ago

Preparing the Data

import glob

from music21 import converter, instrument, note, chord

notes = []

for file in glob.glob("midi_songs/*.mid"): # read every file matching the * wildcard via glob()
    midi = converter.parse(file)
    parts = instrument.partitionByInstrument(midi) # 1.

    notes_to_parse = None
    if parts: # file has instrument parts
        notes_to_parse = parts.parts[0].recurse()
    else: # file has notes in a flat structure
        notes_to_parse = midi.flat.notes

    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
  1. instrument.partitionByInstrument()
    • I wondered whether index 0 of every partition is always the piano.
    • Also, when a file has a flat structure, is the instrument always the piano?
    • Running parts.show('text') answered my question: the author appears to have selected only pieces that are piano-only or whose partition index 0 is the piano, or to have preprocessed the data beforehand.
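For reference, a minimal sketch of how such a check could be run (my own, not from the article; the midi_songs/ path matches the training code above):

# A minimal sketch (my own check, not from the article): print the
# instrument name of each part to see whether part 0 is really the piano.
import glob

from music21 import converter, instrument

for file in glob.glob("midi_songs/*.mid"):
    midi = converter.parse(file)
    parts = instrument.partitionByInstrument(midi)
    if parts:
        for part in parts.parts:
            print(file, part.partName)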
dev-onejun commented 1 year ago

Preprocessing the data

import numpy
from keras.utils import np_utils

sequence_length = 100

# get all pitch names
pitchnames = sorted(set(notes)) # deduplicate, then sort into a list -> 1.

# create a dictionary to map pitches to integers -> 2.
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

# create input sequences and the corresponding outputs
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i: i + sequence_length] # e.g. store notes 0 through 99 (100 notes)
    sequence_out = notes[i + sequence_length] # e.g. store note 100

    network_input.append([note_to_int[char] for char in sequence_in]) # convert the stored notes from string to int
    network_output.append(note_to_int[sequence_out]) # likewise

# reshape the input into a format compatible with LSTM layers -> 3.
n_patterns = len(network_input)
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))

# normalize input
n_vocab = len(note_to_int)
network_input = network_input / float(n_vocab)
network_output = np_utils.to_categorical(network_output) # 4.
  1. Printing pitchnames shows not only pitches such as A1 but also number-based entries such as 0.1.5. These look like chords; I should dig a little further into how music21 encodes them in relation to this dict().

    • Judging from the decoding section below, chords appear to have been encoded as numbers when the data was gathered.
  2. From the article:

    First, we will create a mapping function to map from string-based categorical data to integer-based numerical data. This is done because neural networks perform much better with integer-based numerical data than string-based categorical data.

Through enumerate(), the mapping simply follows the sorted order (indices) from above. In other words, the integers do not seem to carry any intrinsically meaningful values.

  3. The network_* variables were built up with append as flat lists, so reshape appears to be used to convert them into numpy arrays. The reshaped form has the same effect as the slicing done in the for loop above (which makes me wonder whether it could have been shaped inside the loop in the first place). Still, since it is rearranged into a 3-dimensional rather than a 2-dimensional shape, this looks fine as well.

  4. According to the official docs, keras.utils.np_utils.to_categorical() marks which label each sample belongs to, as in multiclass classification:

>>> np_utils.to_categorical([0,1])
array([[1., 0.],
       [0., 1.]], dtype=float32)

Interim conclusion

Data preprocessed into this form is called one-hot encoded data. One concern: during data preparation above, all the songs were concatenated into a single sequence. Is that okay? Listening to the example piece at the end of the article, the mood changes several times even within one piece, which may well be an effect of this.
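To make the windowing concrete, here is a toy run of the same slicing on a short made-up note list (my own illustration, not from the article):

# A toy illustration (mine, not from the article) of the sliding window
# with sequence_length = 3 instead of 100.
toy_notes = ['C4', 'D4', 'E4', 'F4', 'G4']
toy_length = 3

toy_pitchnames = sorted(set(toy_notes))
toy_note_to_int = dict((n, i) for i, n in enumerate(toy_pitchnames))

for i in range(len(toy_notes) - toy_length):
    seq_in = toy_notes[i: i + toy_length]
    seq_out = toy_notes[i + toy_length]
    print([toy_note_to_int[c] for c in seq_in], '->', toy_note_to_int[seq_out])
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4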

dev-onejun commented 1 year ago

Building a model

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()

# Input Layer
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]), # shape of one sample: (sequence_length, 1)
    return_sequences=True
))
model.add(Dropout(0.3))

# Hidden Layers
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))

model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))

# Output Layer
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
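As a quick sanity check (my addition, not from the article), model.summary() confirms the layer shapes:

# Quick sanity check (my addition, not from the article).
model.summary()
# The first LSTM should report output shape (None, 100, 256),
# and the final Activation (None, n_vocab).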
dev-onejun commented 1 year ago

Fitting the model

from keras.callbacks import ModelCheckpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"

# 1.
checkpoint = ModelCheckpoint(
    filepath,
    monitor='loss',
    verbose=0,
    save_best_only=True,
    mode='min'
)
callbacks_list = [checkpoint]

model.fit(network_input, network_output, epochs=200, batch_size=64, callbacks=callbacks_list)
  1. Using ModelCheckpoint this way, the model weights with the lowest loss across the 200 epochs are saved, so we can keep the version we like best.
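After training (my own usage note, not from the article), the best checkpoint can be reloaded before generating. The filename below is hypothetical; use whichever .hdf5 file ModelCheckpoint actually produced:

# Reload the best checkpoint (the filename here is hypothetical).
model.load_weights("weights-improvement-199-0.1234-bigger.hdf5")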
dev-onejun commented 1 year ago

Generating music with the model

# make a random seed
start = numpy.random.randint(0, len(network_input)-1)
pattern = network_input[start] # a sequence of sequence_length (100) integer-encoded notes
# NOTE: pattern must be a plain list of ints, not a row of the normalized
# array used for training; see the sketch after the footnotes below.

int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

# generate 500 notes
prediction_output = []
for note_index in range(500):
    # 1.
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)

    # 2.
    index = numpy.argmax(prediction)
    result = int_to_note[index]
    prediction_output.append(result)

    # 3.
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
  1. The LSTM model requires sequential input for prediction, so the pattern is reshaped to (1, sequence_length, 1) and normalized exactly as in training.

  2. numpy.argmax() is used to pick the pitch with the highest predicted probability.

  3. The predicted note is appended and the first note is discarded, so that pattern always keeps exactly sequence_length (100) notes.
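One caveat: for pattern.append(index) and the slicing to work, pattern has to be a plain Python list of integer-encoded notes, not a row of the normalized 3-D numpy array used for training (the article prepares the data again in its generation script). A minimal sketch of keeping such a copy (my assumption, not verbatim from the article):

# A sketch (assumption, not verbatim from the article): keep an
# unnormalized, integer-encoded copy of the sequences for generation.
int_sequences = [
    [note_to_int[c] for c in notes[i: i + sequence_length]]
    for i in range(len(notes) - sequence_length)
]

start = numpy.random.randint(0, len(int_sequences)-1)
pattern = list(int_sequences[start]) # plain list of ints, length 100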

dev-onejun commented 1 year ago

Decode to music

from music21 import note, chord, instrument, stream

offset = 0
music = []

# create note and chord objects based on the values generated by the model
for pitches in prediction_output:
    # pitches is a chord
    if ('.' in pitches) or pitches.isdigit():
        notes_in_chord = pitches.split('.')

        chord_notes = [] # renamed from `notes` so the training data list is not shadowed
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)

        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        music.append(new_chord)
    # pitches is a note
    else:
        new_note = note.Note(pitches)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        music.append(new_note)

    offset += 0.5

Convert to midi file

midi_stream = stream.Stream(music)
midi_stream.write('midi', fp='test_output.mid')
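As an optional check (mine, not from the article), the written file can be parsed back to verify its contents:

# Parse the generated file back and print its contents as text.
from music21 import converter

replay = converter.parse('test_output.mid')
replay.show('text')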
dev-onejun commented 1 year ago

Author's Suggestions

  1. Support various note (and rest) durations by using more classes and a deeper LSTM network.
  2. Make the start and end of each piece distinct (separate the individual pieces).
  3. Handle unknown notes.
  4. Add more instruments.

My Thought

  1. What about setting the sequence length to a bar of the sheet music instead of a fixed 100 notes? (A rough sketch follows below.)
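A rough sketch of that idea (my own; the filename is hypothetical, and music21's makeMeasures() is assumed to group the parsed notes into measures):

# A rough sketch of bar-based sequences (my own idea, not from the article).
from music21 import converter, instrument

midi = converter.parse("midi_songs/example.mid") # hypothetical file
part = instrument.partitionByInstrument(midi).parts[0]
measured = part.makeMeasures()

bars = []
for measure in measured.getElementsByClass('Measure'):
    bar = [str(n.pitch) for n in measure.notes if n.isNote]
    bars.append(bar) # one variable-length sequence per bar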
dev-onejun commented 1 year ago

The generated midi file just plays the same chord over and over. In my opinion, the loop in generate_music() that builds the variable prediction_output needs a fix somewhere. For now, though, I won't fix it ...
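For what it's worth, one common remedy for this kind of repetition (my suggestion, not from the article) is to sample from the softmax output with a temperature instead of always taking the argmax:

# A sketch of temperature sampling as a possible fix (my suggestion,
# not from the article). temperature > 1 flattens the distribution and
# adds variety; temperature < 1 sharpens it; argmax is the limit as
# temperature -> 0.
def sample_with_temperature(prediction, temperature=1.0):
    logits = numpy.log(prediction + 1e-8) / temperature
    probabilities = numpy.exp(logits) / numpy.sum(numpy.exp(logits))
    return numpy.random.choice(len(probabilities), p=probabilities)

# instead of index = numpy.argmax(prediction):
index = sample_with_temperature(prediction[0], temperature=0.8)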

skrinsky commented 6 months ago

Did you ever fix the problem with prediction_output? I have not been able to solve it.