Study: LSTM: 11.4: Forward and backward propagation in the LSTM layer

eubinecto commented 4 years ago

Solving the vanishing and exploding gradient problems

eubinecto commented 4 years ago

Slide

Setting the size of the input shared by the four units: https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_11/rnn_lstm_model.py#L17-L19 Here, `recur_size` is the number of recurrent neurons in the recurrent layer. It is supplied by the user.

`ex_in_dim` (probably short for "extended input dimension") is the size of the input fed to the four units, all of which share the same input (`h_t-1` concatenated with `x_t`).
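
As a minimal sketch of what this sizing implies, assuming the extended input is simply `h_t-1` and `x_t` concatenated feature-wise, so that `ex_in_dim = input_size + recur_size`. The sizes, the concatenation order, and the names `ex_inp`, `weight`, `bias`, and `affine` are illustrative assumptions, not confirmed against the linked lines:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, input_size, recur_size = 2, 3, 4

# Assumption: the extended input is h_{t-1} and x_t concatenated feature-wise.
ex_in_dim = input_size + recur_size

x_t = np.ones((batch_size, input_size))
h_prev = np.zeros((batch_size, recur_size))
ex_inp = np.hstack([h_prev, x_t])        # shape: (batch_size, ex_in_dim)

# One weight matrix serves all four units: each unit owns recur_size columns.
rng = np.random.default_rng(0)
weight = rng.normal(0.0, 0.03, (ex_in_dim, 4 * recur_size))
bias = np.zeros(4 * recur_size)

# A single matrix product computes the pre-activations of all four units.
affine = ex_inp @ weight + bias          # shape: (batch_size, 4 * recur_size)
print(ex_inp.shape, affine.shape)
```

Packing the four units into one weight matrix means one matrix product per time step instead of four.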

Recall the basic structure of the LSTM cell and the equations that express it in matrix form:

Slide

Given this:

and therefore:

we can see that this holds.
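
For reference, the standard LSTM cell equations can be written as below. This is the common textbook formulation, not a transcription of the slide, so the symbols are assumptions: $\sigma$ is the sigmoid, $\odot$ is the element-wise product, $[h_{t-1}, x_t]$ is the concatenated input, $n$ stands for `recur_size`, and $m$ stands for the input size.

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
g_t &= \tanh\left(W_g [h_{t-1}, x_t] + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Each of $W_f, W_i, W_o, W_g$ has shape $n \times (n + m)$, so stacking them gives a single matrix of shape $4n \times (n + m)$. Under this formulation, that is why all four units can share one input of size $n + m$.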

eubinecto commented 4 years ago

Slide

In addition to the existing recurrent vector (`h_t-1` -> `h_t`, "short-term memory"), a state vector (`c_t-1` -> `c_t`, "long-term memory") has been added:

| `forward_rnn_layer` | `forward_lstm_layer` |
| --- | --- |
| https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_10/rnn_basic_model.py#L42-L43 | https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_11/rnn_lstm_model.py#L46-L48 |
| Only the `recurrent` vector exists (initialized to 0 because it is the SOS) | The `state` vector is used together with the `recurrent` vector (both initialized to 0 because it is the SOS) |
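
The initialization described above can be sketched as follows. The sizes and the name `mb_size` are hypothetical, and the snippet illustrates the idea rather than reproducing the linked lines:

```python
import numpy as np

# Hypothetical sizes for illustration.
mb_size, recur_size = 2, 4

# Nothing precedes the first time step, so both memories start at zero.
recurrent = np.zeros((mb_size, recur_size))  # h_0, short-term memory
state = np.zeros((mb_size, recur_size))      # c_0, long-term memory
```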

The overall processing has been changed to fit the LSTM cell:

| Forward pass in `forward_rnn_layer` | Forward pass in `forward_lstm_layer` |
| --- | --- |
| https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_10/rnn_basic_model.py#L45-L54 | https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_11/rnn_lstm_model.py#L50-L71 |

The rest is the same. What differs is the part that does the processing at each time step.

The code is easy to follow if you keep the equations for the long-term memory vector and the short-term memory vector in mind: Vectors & code

https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_11/rnn_lstm_model.py#L62-L63

https://github.com/eubinecto/k4ji_ai/blob/e94a5d01fb0fb59a82e0a32e016356bbc9a75840/eb/src/chap_11/rnn_lstm_model.py#L65-L66
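
To make the two updates concrete, here is a self-contained sketch of an LSTM forward pass over time, using the standard updates `c_t = f_t * c_t-1 + i_t * g_t` and `h_t = o_t * tanh(c_t)`. The function name `lstm_forward`, the gate ordering inside `affine`, and all sizes are assumptions made for illustration. The linked code may order or name things differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, weight, bias, recur_size):
    """Run an LSTM over x of shape (batch, time, input_size).

    Returns the per-step recurrent vectors and the final state vector.
    """
    mb_size, timesteps, _ = x.shape
    n = recur_size
    recurrent = np.zeros((mb_size, n))   # h_0, short-term memory
    state = np.zeros((mb_size, n))       # c_0, long-term memory
    outputs = np.zeros((mb_size, timesteps, n))

    for t in range(timesteps):
        # All four units share the same extended input [h_{t-1}, x_t].
        ex_inp = np.hstack([recurrent, x[:, t, :]])
        affine = ex_inp @ weight + bias

        # Assumed column layout: forget, input, output gates, then block input.
        forget_gate = sigmoid(affine[:, 0 * n:1 * n])
        input_gate = sigmoid(affine[:, 1 * n:2 * n])
        output_gate = sigmoid(affine[:, 2 * n:3 * n])
        block_input = np.tanh(affine[:, 3 * n:4 * n])

        # Long-term memory: keep part of the old state, add gated new content.
        state = forget_gate * state + input_gate * block_input
        # Short-term memory: a gated, squashed view of the long-term memory.
        recurrent = output_gate * np.tanh(state)
        outputs[:, t, :] = recurrent

    return outputs, state

rng = np.random.default_rng(0)
input_size, recur_size = 3, 4
weight = rng.normal(0.0, 0.1, (recur_size + input_size, 4 * recur_size))
bias = np.zeros(4 * recur_size)
x = rng.normal(size=(2, 5, input_size))

outputs, final_state = lstm_forward(x, weight, bias, recur_size)
print(outputs.shape, final_state.shape)
```

The state update is additive rather than a repeated matrix multiplication, which is what lets gradients flow across many time steps with less shrinking or blow-up than in the basic RNN.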