dev-onejun / AI-Study

[since 2021] Artificial Intelligence Study

How to Generate Music using a LSTM Neural Network in Keras #1

Closed dev-onejun closed 1 year ago

dev-onejun commented 1 year ago

This article is featured in codecrafters-io/build-your-own-x.

dev-onejun commented 1 year ago

Summary

Repository


Uses music21 to handle the music data, working from the standpoint of the piano.

Referring to the official documentation would be more accurate.

dev-onejun commented 1 year ago

Preparing the Data

import glob

from music21 import converter, instrument, note, chord

notes = []

for file in glob.glob("midi_songs/*.mid"): # read every file matching the * wildcard via glob()
    midi = converter.parse(file)
    parts = instrument.partitionByInstrument(midi) # 1.

    notes_to_parse = None
    if parts: # file has instrument parts
        notes_to_parse = parts.parts[0].recurse()
    else: # file has notes in a flat structure
        notes_to_parse = midi.flat.notes

    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))
  1. instrument.partitionByInstrument()
    • I wondered whether index 0 of every partition is always the piano.
    • Also, when a file has a flat structure, is the instrument always the piano?
    • Running parts.show('text') answered my question: the author appears to have selected only pieces that are piano-only or whose partition index 0 is the piano, or to have preprocessed the data beforehand.
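For reference, a minimal sketch of how such a check could be run (my own, not from the article; the midi_songs/ path matches the training code above):

# A minimal sketch (my own check, not from the article): print the
# instrument name of each part to see whether part 0 is really the piano.
import glob

from music21 import converter, instrument

for file in glob.glob("midi_songs/*.mid"):
    midi = converter.parse(file)
    parts = instrument.partitionByInstrument(midi)
    if parts:
        for part in parts.parts:
            print(file, part.partName)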
dev-onejun commented 1 year ago

Preprocessing the data

import numpy
from keras.utils import np_utils

sequence_length = 100

# get all pitch names
pitchnames = sorted(set(notes)) # deduplicate, then sort into a list -> 1.

# create a dictionary to map pitches to integers -> 2.
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

# create input sequences and the corresponding outputs
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i: i + sequence_length] # e.g. store notes 0 through 99 (100 notes)
    sequence_out = notes[i + sequence_length] # e.g. store note 100

    network_input.append([note_to_int[char] for char in sequence_in]) # convert the stored notes from string to int
    network_output.append(note_to_int[sequence_out]) # likewise

# reshape the input into a format compatible with LSTM layers -> 3.
n_patterns = len(network_input)
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))

# normalize input
n_vocab = len(note_to_int)
network_input = network_input / float(n_vocab)
network_output = np_utils.to_categorical(network_output) # 4.
  1. Printing pitchnames shows not only pitches such as A1 but also number-based entries such as 0.1.5. These look like chords; I should dig a little further into how music21 encodes them in relation to this dict().

    • Judging from the decoding section below, chords appear to have been encoded as numbers when the data was gathered.
  2. From the article:

    First, we will create a mapping function to map from string-based categorical data to integer-based numerical data. This is done because neural networks perform much better with integer-based numerical data than string-based categorical data.

Through enumerate(), the mapping simply follows the sorted order (indices) from above. In other words, the integers do not seem to carry any intrinsically meaningful values.

  3. The network_* variables were built up with append as flat lists, so reshape appears to be used to convert them into numpy arrays. The reshaped form has the same effect as the slicing done in the for loop above (which makes me wonder whether it could have been shaped inside the loop in the first place). Still, since it is rearranged into a 3-dimensional rather than a 2-dimensional shape, this looks fine as well.

  4. According to the official docs, keras.utils.np_utils.to_categorical() marks which label each sample belongs to, as in multiclass classification:

>>> np_utils.to_categorical([0,1])
array([[1., 0.],
       [0., 1.]], dtype=float32)

Interim conclusion

Data preprocessed into this form is called one-hot encoded data. One concern: during data preparation above, all the songs were concatenated into a single sequence. Is that okay? Listening to the example piece at the end of the article, the mood changes several times even within one piece, which may well be an effect of this.
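To make the windowing concrete, here is a toy run of the same slicing on a short made-up note list (my own illustration, not from the article):

# A toy illustration (mine, not from the article) of the sliding window
# with sequence_length = 3 instead of 100.
toy_notes = ['C4', 'D4', 'E4', 'F4', 'G4']
toy_length = 3

toy_pitchnames = sorted(set(toy_notes))
toy_note_to_int = dict((n, i) for i, n in enumerate(toy_pitchnames))

for i in range(len(toy_notes) - toy_length):
    seq_in = toy_notes[i: i + toy_length]
    seq_out = toy_notes[i + toy_length]
    print([toy_note_to_int[c] for c in seq_in], '->', toy_note_to_int[seq_out])
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4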

dev-onejun commented 1 year ago

Building a model

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()

# Input Layer
model.add(LSTM(
    256,
    input_shape=(network_input.shape[1], network_input.shape[2]), # shape of one sample: (sequence_length, 1)
    return_sequences=True
))
model.add(Dropout(0.3))

# Hidden Layers
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))

model.add(LSTM(256))
model.add(Dense(256))
model.add(Dropout(0.3))

# Output Layer
model.add(Dense(n_vocab))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
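As a quick sanity check (my addition, not from the article), model.summary() confirms the layer shapes:

# Quick sanity check (my addition, not from the article).
model.summary()
# The first LSTM should report output shape (None, 100, 256),
# and the final Activation (None, n_vocab).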
dev-onejun commented 1 year ago

Fitting the model

from keras.callbacks import ModelCheckpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"

# 1.
checkpoint = ModelCheckpoint(
    filepath,
    monitor='loss',
    verbose=0,
    save_best_only=True,
    mode='min'
)
callbacks_list = [checkpoint]

model.fit(network_input, network_output, epochs=200, batch_size=64, callbacks=callbacks_list)
  1. Using ModelCheckpoint this way, the model weights with the lowest loss across the 200 epochs are saved, so we can keep the version we like best.
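After training (my own usage note, not from the article), the best checkpoint can be reloaded before generating. The filename below is hypothetical; use whichever .hdf5 file ModelCheckpoint actually produced:

# Reload the best checkpoint (the filename here is hypothetical).
model.load_weights("weights-improvement-199-0.1234-bigger.hdf5")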
dev-onejun commented 1 year ago

Generating music with the model

# make a random seed
start = numpy.random.randint(0, len(network_input)-1)
pattern = network_input[start] # a sequence of sequence_length (100) integer-encoded notes
# NOTE: pattern must be a plain list of ints, not a row of the normalized
# array used for training; see the sketch after the footnotes below.

int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

# generate 500 notes
prediction_output = []
for note_index in range(500):
    # 1.
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)

    # 2.
    index = numpy.argmax(prediction)
    result = int_to_note[index]
    prediction_output.append(result)

    # 3.
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
  1. The LSTM model requires sequential input for prediction, so the pattern is reshaped to (1, sequence_length, 1) and normalized exactly as in training.

  2. numpy.argmax() is used to pick the pitch with the highest predicted probability.

  3. The predicted note is appended and the first note is discarded, so that pattern always keeps exactly sequence_length (100) notes.
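One caveat: for pattern.append(index) and the slicing to work, pattern has to be a plain Python list of integer-encoded notes, not a row of the normalized 3-D numpy array used for training (the article prepares the data again in its generation script). A minimal sketch of keeping such a copy (my assumption, not verbatim from the article):

# A sketch (assumption, not verbatim from the article): keep an
# unnormalized, integer-encoded copy of the sequences for generation.
int_sequences = [
    [note_to_int[c] for c in notes[i: i + sequence_length]]
    for i in range(len(notes) - sequence_length)
]

start = numpy.random.randint(0, len(int_sequences)-1)
pattern = list(int_sequences[start]) # plain list of ints, length 100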

dev-onejun commented 1 year ago

Decode to music

from music21 import note, chord, instrument, stream

offset = 0
music = []

# create note and chord objects based on the values generated by the model
for pitches in prediction_output:
    # pitches is a chord
    if ('.' in pitches) or pitches.isdigit():
        notes_in_chord = pitches.split('.')

        chord_notes = [] # renamed from `notes` so the training data list is not shadowed
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)

        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        music.append(new_chord)
    # pitches is a note
    else:
        new_note = note.Note(pitches)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        music.append(new_note)

    offset += 0.5

Convert to midi file

midi_stream = stream.Stream(music)
midi_stream.write('midi', fp='test_output.mid')
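As an optional check (mine, not from the article), the written file can be parsed back to verify its contents:

# Parse the generated file back and print its contents as text.
from music21 import converter

replay = converter.parse('test_output.mid')
replay.show('text')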
dev-onejun commented 1 year ago

Author's Suggestions

  1. Support various note (and rest) durations by using more classes and a deeper LSTM network.
  2. Make the start and end of each piece distinct (separate the individual pieces).
  3. Handle unknown notes.
  4. Add more instruments.

My Thought

  1. What about setting the sequence length to a bar of the sheet music instead of a fixed 100 notes? (A rough sketch follows below.)
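A rough sketch of that idea (my own; the filename is hypothetical, and music21's makeMeasures() is assumed to group the parsed notes into measures):

# A rough sketch of bar-based sequences (my own idea, not from the article).
from music21 import converter, instrument

midi = converter.parse("midi_songs/example.mid") # hypothetical file
part = instrument.partitionByInstrument(midi).parts[0]
measured = part.makeMeasures()

bars = []
for measure in measured.getElementsByClass('Measure'):
    bar = [str(n.pitch) for n in measure.notes if n.isNote]
    bars.append(bar) # one variable-length sequence per bar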
dev-onejun commented 1 year ago

The generated midi file just plays the same chord over and over. In my opinion, the loop in generate_music() that builds the variable prediction_output needs a fix somewhere. For now, though, I won't fix it ...
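For what it's worth, one common remedy for this kind of repetition (my suggestion, not from the article) is to sample from the softmax output with a temperature instead of always taking the argmax:

# A sketch of temperature sampling as a possible fix (my suggestion,
# not from the article). temperature > 1 flattens the distribution and
# adds variety; temperature < 1 sharpens it; argmax is the limit as
# temperature -> 0.
def sample_with_temperature(prediction, temperature=1.0):
    logits = numpy.log(prediction + 1e-8) / temperature
    probabilities = numpy.exp(logits) / numpy.sum(numpy.exp(logits))
    return numpy.random.choice(len(probabilities), p=probabilities)

# instead of index = numpy.argmax(prediction):
index = sample_with_temperature(prediction[0], temperature=0.8)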

skrinsky commented 6 months ago

Did you ever fix the problem with prediction_output? I have not been able to solve it.