What you get is what you see: A visual markup decompiler
#19 (Open) · flrngel opened 6 years ago

flrngel commented 6 years ago
https://arxiv.org/abs/1609.04938
1. Abstract
The model is end-to-end
It combines a convolutional network with recurrent networks
Existing systems reach about 25% accuracy on this task, while the paper's model achieves about 75%
2. Introduction
OCR requires joint processing of image and text data
WYGIWYS is a simple extension of the attention-based encoder-decoder model
The paper introduces the IM2LATEX-100K dataset
3. Problem: image-to-markup generation
The authors define the image-to-markup problem as converting a rendered source image into target presentational markup
4. Model
Convolutional Network
The convolutional network does not use a fully connected layer
This preserves the locality of the CNN features so that visual attention can be applied over the feature grid
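A minimal sketch of this idea, assuming a PyTorch-style conv stack (layer sizes are illustrative, not the paper's exact configuration): the network ends with a spatial feature grid rather than a flattened vector, so each grid position can later be attended to.

```python
import torch
import torch.nn as nn

class ConvFeatureGrid(nn.Module):
    """Conv stack with no fully connected layer; output keeps its H' x W' grid."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )

    def forward(self, img):           # img: (B, 1, H, W) rendered formula image
        grid = self.features(img)     # (B, C, H', W') -- no flatten, no nn.Linear
        return grid                   # locality preserved for visual attention

# grid = ConvFeatureGrid()(torch.zeros(2, 1, 64, 256))  # -> torch.Size([2, 256, 16, 64])
```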
Row Encoder
Show, Attend and Tell shows that an image feature grid can be fed directly into the decoder
However, the target markup contains significant relative sequential-order information, so running an RNN over each row of the feature grid helps:
the left-to-right order can be learned by the encoder
the RNN can use the surrounding horizontal context to refine each hidden representation
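A sketch of such a row encoder, again assuming PyTorch (hidden size and bidirectionality are illustrative): every row of the CNN grid is treated as a left-to-right sequence, so each position absorbs horizontal context before the decoder attends to it.

```python
import torch
import torch.nn as nn

class RowEncoder(nn.Module):
    """Run an RNN over each row of the conv feature grid."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, grid):                  # grid: (B, C, H', W') from the conv net
        B, C, H, W = grid.shape
        rows = grid.permute(0, 2, 3, 1)       # (B, H', W', C): one sequence per row
        rows = rows.reshape(B * H, W, C)      # treat each row as an independent sequence
        encoded, _ = self.rnn(rows)           # (B*H', W', 2*hidden)
        return encoded.reshape(B, H, W, -1)   # refined feature grid for attention
```

In the paper, the initial hidden state of each row's RNN is also learned per row, which is how vertical position information enters the representation.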
Decoder
The decoder uses an attention model (Bahdanau attention)
Beam search is used at test time
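A sketch of one Bahdanau-style (additive) attention step for the decoder, assuming PyTorch with illustrative dimensions (not the paper's exact sizes):

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention over the flattened, row-encoded feature grid."""
    def __init__(self, dec_dim=512, enc_dim=512, attn_dim=256):
        super().__init__()
        self.w_dec = nn.Linear(dec_dim, attn_dim)
        self.w_enc = nn.Linear(enc_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, dec_state, enc_grid):
        # dec_state: (B, dec_dim) current decoder hidden state
        # enc_grid:  (B, H'*W', enc_dim) flattened encoder feature grid
        scores = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1) + self.w_enc(enc_grid)))  # (B, H'*W', 1)
        alpha = torch.softmax(scores, dim=1)                             # attention weights
        context = (alpha * enc_grid).sum(dim=1)                          # (B, enc_dim)
        return context, alpha.squeeze(-1)
```

At test time, instead of greedy decoding, beam search keeps the k highest-scoring partial markup sequences at each step and expands them in parallel.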
5. Dataset
Tokenization
Character-based models did not perform as well as tokenized models
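To illustrate the difference, here is a toy comparison of character-level versus token-level splitting of LaTeX (a crude regex tokenizer for illustration, not the paper's actual preprocessing pipeline):

```python
import re

formula = r"\frac{x^2}{\sqrt{y}}"

char_tokens = list(formula)                         # ['\\', 'f', 'r', 'a', 'c', '{', ...]
latex_tokens = re.findall(r"\\[A-Za-z]+|\S", formula)
# ['\\frac', '{', 'x', '^', '2', '}', '{', '\\sqrt', '{', 'y', '}', '}']
print(char_tokens)
print(latex_tokens)
```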
Optional: Normalization
A modified KaTeX parser is used to produce normalized input data
My Notes
Each GitHub implementation of this paper uses a different loss function