Sometimes the model outputs a sequence such as OUT, IN, IN. Logically, there should always be a BEGIN to mark the transition from OUT to IN. To fix this, add a heuristic that converts the first IN into BEGIN, giving OUT, BEGIN, IN.
The same applies at the other boundary: IN, IN, OUT becomes IN, END, OUT.
Proposed solution
Implement a function that takes the model output scores for a page sequence and generates the labels for that sequence:
basic case: use the argmax label
for sequence OUT, X, IN, convert X to BEGIN
for sequence IN, X, OUT, convert X to END
for sequence OUT, X, OUT where X is BEGIN or END, convert X to BEGIN_END
...
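
The rules listed so far could be sketched roughly as follows. This is only an illustration under several assumptions that are not stated above: the function names (`argmax_labels`, `apply_rules`, `decode_labels`) and the column order in `LABELS` are hypothetical, the scores are assumed to be one row of per-label numbers per page, X is taken to mean any label in the first two rules (as written), and each rule looks at the raw argmax neighbours so that one correction does not trigger another. Any further rules hidden behind the "..." are not covered.

```python
from typing import List, Sequence

# Assumed order of the score columns; the real model may use a different one.
LABELS = ["OUT", "BEGIN", "IN", "END", "BEGIN_END"]


def argmax_labels(scores: Sequence[Sequence[float]]) -> List[str]:
    """Basic case: pick the highest-scoring label for each page."""
    return [LABELS[max(range(len(row)), key=row.__getitem__)] for row in scores]


def apply_rules(raw: Sequence[str]) -> List[str]:
    """Apply the three neighbour rules to a sequence of argmax labels."""
    fixed = list(raw)
    # The first and last pages have only one neighbour and are left unchanged.
    for i in range(1, len(raw) - 1):
        prev, cur, nxt = raw[i - 1], raw[i], raw[i + 1]
        if prev == "OUT" and nxt == "IN":
            fixed[i] = "BEGIN"       # OUT, X, IN -> X becomes BEGIN
        elif prev == "IN" and nxt == "OUT":
            fixed[i] = "END"         # IN, X, OUT -> X becomes END
        elif prev == "OUT" and nxt == "OUT" and cur in ("BEGIN", "END"):
            fixed[i] = "BEGIN_END"   # OUT, BEGIN or END, OUT -> BEGIN_END
    return fixed


def decode_labels(scores: Sequence[Sequence[float]]) -> List[str]:
    """Scores for a page sequence -> corrected labels for that sequence."""
    return apply_rules(argmax_labels(scores))
```

With this sketch, `apply_rules(["OUT", "IN", "IN"])` gives `["OUT", "BEGIN", "IN"]` and `apply_rules(["IN", "IN", "OUT"])` gives `["IN", "END", "OUT"]`, matching the two examples in the description. Reading neighbours from the raw argmax labels rather than from the already corrected ones is a design choice made here for predictability; a cascading variant would also be possible.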