Status: Closed (closed by bsagot 6 years ago)
In the character stream, there is no boundary between the headnote and the body, no whitespace, nothing. Given what I see in the feature file (i.e. the fact that features are associated with the beginning of each "PDF-line"), this might be related to what is causing the issue.
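To make the hypothesis concrete, here is a toy sketch of how a per-line labeler could swallow the first body line when the headnote and the body are fused into one "PDF-line". This is not the tool's actual code: the function names, the prefix-based rule, and the sample strings are all hypothetical and only stand in for "features taken from the beginning of each line".

```python
# Toy illustration only. All names and sample strings are hypothetical;
# this is not the real pipeline, just a model of per-line labeling.

def label_lines(lines, headnote_prefixes):
    """Assign one label per line, looking only at how the line begins."""
    labels = []
    for line in lines:
        is_headnote = any(line.startswith(p) for p in headnote_prefixes)
        labels.append("headnote" if is_headnote else "body")
    return labels


def extract_body(lines, labels):
    """Keep only the lines labeled as body."""
    return [line for line, label in zip(lines, labels) if label == "body"]


prefixes = ["RUNNING HEAD"]

# Case 1: the headnote and the first body line are separate lines.
separated = ["RUNNING HEAD 12", "First body line.", "Second body line."]

# Case 2: no boundary in the character stream, so both end up in one line.
merged = ["RUNNING HEAD 12First body line.", "Second body line."]

body_separated = extract_body(separated, label_lines(separated, prefixes))
body_merged = extract_body(merged, label_lines(merged, prefixes))

print(body_separated)  # ['First body line.', 'Second body line.']
print(body_merged)     # ['Second body line.']
```

In the merged case the whole fused line is labeled from its beginning, so the first body line goes wherever the headnote goes. If something like this is happening, it would also be compatible with a perfect score on the training data, since the labels would still be "correct" at line level.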
Attachments: corpus.zip, puhvel-h-1-3.pdf
As discussed with Mohamed a few seconds ago: when I use the attached training data (corpus.zip) for the first model, I get a 100% f-measure when evaluating on the training data. But when I then run the model on the attached PDF (i.e., the first 3 pages of my PDF, which are included in the training data), the first line of the body part of each page simply disappears.