eubinecto commented 4 years ago

Jargon list

[x] transduction model
[x] constituency parsing
[x] sinusoids
[x] Positional Encodings
[x] BLEU
[x] WMT
[x] factorization tricks
[x] desiderata
[x] transformer
[x] Adam Optimizer
[x] Scaled Dot-Product Attention
[x] Multi-Head Attention
[x] Residual Connection
[ ] auto-regressive
[ ] compatibility function
[ ] language modeling
[ ] conditional computation
[ ] Separable convolutions
[ ] Contiguous kernels
[ ] Residual Dropout
[ ] Label Smoothing
[ ] Beam search
[ ] WSJ

Framework

Definition, example sentences
Connection to the paper

eubinecto commented 4 years ago

transduction problems / transduction model / transductive model (machine learning)

The following quote seems to be well known in statistics.

"When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one." -- (Vladimir Vapnik, 1990)

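In other words: do not try to solve some other, more general problem on the way to solving your problem; work directly toward the answer to the problem you ultimately need to solve.

What does this mean in machine learning? It seems it can be read as roughly the same idea as end-to-end learning.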

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. -- (Graves, 2012)

Machine translation is a transduction problem.

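transductive model:
-> hidden ->
- A "transitive" approach, in that it skips the intermediate step.
- It can solve only the problem it was built to solve.
inductive model:
-> establish rules such as "Korean and English have opposite word order" or "the phrase is the smallest unit of a sentence that carries meaning" ->
- The goal is to translate Korean into English, but a more general problem has to be solved along the way.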

Examples of transductive learning:

spelling correction
machine translation
speech recognition
text-to-speech
language modelling

Examples of inductive learning:

phrase-based translation (rules are defined first in order to translate; to solve the translation problem, a "bigger" problem has to be solved first.)

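Connection to the paper

I think this can be tied to the name "Transformer".

The Transformer is a model designed to solve transduction problems better. Since a transduction problem is the problem of "transforming" an input sequence into an output sequence of a different form, that seems to be why the model was named "Transformer".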

reference

https://en.dict.naver.com/#/entry/enko/af4cec7b05844db89ec28e60517d198c
https://www.merriam-webster.com/dictionary/transducing
https://www.quora.com/What-is-transduction-in-Machine-learning
https://en.wikipedia.org/wiki/Transduction_(machine_learning)
https://arxiv.org/pdf/1409.0473.pdf (phrase-based translation)
https://arxiv.org/pdf/1211.3711.pdf (sequence transduction with RNN)

eubinecto commented 4 years ago

Constituency parsing

Meaning

Two parse trees for an ambiguous sentence. The parse on the left corresponds to the humorous reading in which the elephant is in the pajamas, the parse on the right corresponds to the reading in which Captain Spaulding did the shooting in his pajamas.

Online demo

https://corenlp.run

Connection to the paper

reference

https://web.stanford.edu/~jurafsky/slp3/13.pdf

eubinecto commented 4 years ago

Sinusoids

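A term that refers to sine and cosine functions collectively.

The two functions have the same shape (cosine is sine shifted by π/2, i.e. cos(x) = sin(x + π/2)), so a single word covering both is useful; that word is "sinusoids".

In other words, "a sinusoid" can be read as "a sine function or a cosine function".

The adjective form is "sinusoidal".

In the paper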

That is, each dimension of the positional encoding corresponds to a sinusoid. (pg. 6)

We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training. (pg.6)

positional embedding instead of sinusoids. (pg. 9, table 3)

In row (E) we replace our sinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical results to the base model. (pg. 9. table 3)

references

https://en.wikipedia.org/wiki/Sine_wave

eubinecto commented 4 years ago

Positional Encodings

Meaning

sinusoidal positional encoding?

related: positional embeddings.

In the paper

To this end, we add "positional encodings" to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension d_model as the embeddings, so that the two can be summed. There are many choices of positional encodings, learned and fixed [9]. (pg. 6 > 3.5 Positional Encoding > 3. Model Architecture)

That is, each dimension of the positional encoding corresponds to a sinusoid. (pg. 6 > 3.5 Positional Encoding > 3. Model Architecture)
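The quotes above can be made concrete with a small sketch. The paper defines the encoding as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), so each dimension is one sinusoid over positions. The function name `sinusoidal_positional_encoding` and the plain-list representation are illustrative assumptions, not something taken from the paper's code.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model, base=10000.0):
    """Return a max_len x d_model table (list of lists) of positional encodings."""
    table = []
    for pos in range(max_len):
        row = []
        for dim in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency index i.
            i = dim // 2
            angle = pos / (base ** (2 * i / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# One row per position, one column per embedding dimension.
pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
print(len(pe), len(pe[0]))
```

Because each row has the same dimension as the token embedding, it can be added to the embedding element-wise, which is what "so that the two can be summed" refers to.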

We apply dropout [33] to the output of each sub-layer, before it is added to the sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the positional encodings in both the encoder and decoder stacks. For the base model, we use a rate of P_drop = 0.1. (pg. 8 > Residual Dropout > 5.4 Regularization > 5. Training)

In Table 3 rows (B), we observe that reducing the attention key size d_k hurts model quality. This suggests that determining compatibility is not easy and that a more sophisticated compatibility function than dot product may be beneficial. We further observe in rows (C) and (D) that, as expected, bigger models are better, and dropout is very helpful in avoiding over-fitting. In row (E) we replace our sinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical results to the base model. (pg.9 > 6.2 Model variations > 6. Results)

references

eubinecto commented 4 years ago

Residual Connection

This ties in with what I learned when studying ResNet.
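In the paper, each sub-layer is wrapped as LayerNorm(x + Sublayer(x)), which is the same skip-connection idea as in ResNet. The sketch below shows that wrapping on a single vector. The names `layer_norm` and `residual_block` are hypothetical, and the learned gain and bias of layer normalization are left out to keep it short.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x, sublayer):
    """Apply a sub-layer, add its input back (the residual connection), then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


# Toy sub-layer standing in for self-attention or the feed-forward network.
out = residual_block([1.0, 2.0, 3.0], lambda v: [0.5 * t for t in v])
print(out)
```

The addition `a + b` is the residual part: even if the sub-layer contributes little, the input still passes through unchanged before normalization.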

eubinecto commented 4 years ago

Compatible functions

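A function that can be used in place of the actual function, e.g. a Taylor series or a linear regression model.

Once you understand which numbers "compatible numbers" refers to, it is easy to see what "compatible functions" means.

Note that in the paper itself, the "compatibility function" is the function that scores a query against a key (for example, the dot product) to produce the attention weights.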

references

http://www.learnalberta.ca/content/memg/division02/compatible%20numbers/index.html

teang1995 commented 4 years ago

BLEU

An evaluation metric used mainly for translation models. Sharing a link that explains it well: https://donghwa-kim.github.io/BLEU.html
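To give a feel for how the metric works, here is a simplified sentence-level sketch: clipped (modified) n-gram precision for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing, so it is not the official scoring script; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of the modified precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates that are not longer than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_mean)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))
```

Clipping is what stops a degenerate candidate such as "the the the the" from getting a high unigram precision just by repeating a common word.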

teang1995 commented 4 years ago

Factorization Trick

https://arxiv.org/pdf/1703.10722.pdf The concept comes from this paper; I will look into it further and write it up.

teang1995 commented 4 years ago

WMT

A dataset for translation models. The paper uses the WMT 2014 English-German and English-French datasets.

eubinecto commented 4 years ago

Desiderata

A condition that must be met in order to achieve something.

synonyms: requirements

In the paper

Why self-attention

references

https://languages.oup.com/google-dictionary-en/

teang1995 commented 4 years ago

Transformer

The name of the model proposed in this paper.

teang1995 commented 4 years ago

Adam Optimizer

Yes, it is that Adam.

teang1995 commented 4 years ago

Scaled Dot-Product Attention, Multi-Head Attention

To be explained separately.

eubinecto / k4ji_ai

Jargon / glossary: summary of meanings #20
