ContentMine / phylotree

A repository for ami-phylotree development

Incorrect line breaks from Tesseract. #48

Open petermr opened 9 years ago

petermr commented 9 years ago

In some cases Tesseract splits labels across two lines. An example is Escherichia coli in /ami-plugin/src/test/resources/org/xmlcml/ami2/phylo/15goodtree/ijs.0.000364-0-004.pbm.png, which is split after the s.

This is detected and probably mended by keeping track of unused phrases in labels.
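
For illustration only, here is a minimal Python sketch of one way the "unused phrases" idea could work. It is not the ami-phylotree code. The `Phrase` class, the `merge_unused_phrases` function, the `max_gap` tolerance and the empty default `joiner` are all hypothetical, and the coordinates are made up. The assumption is that each tree tip claims the OCR phrase nearest to it in y, and that any phrase left over is appended to the claimed phrase just above it.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """One OCR text line with a position (hypothetical structure)."""
    text: str
    x: float
    y: float


def merge_unused_phrases(phrases, tip_ys, max_gap=15.0, joiner=""):
    """Assign one phrase per tip, then fold leftover phrases into the label above.

    max_gap (pixels) and joiner are assumptions; joiner is empty because the
    reported split falls inside a word.
    """
    used = {}  # phrase index -> label text built so far
    for tip_y in tip_ys:
        candidates = [i for i in range(len(phrases)) if i not in used]
        if not candidates:
            break
        # Each tip claims the closest phrase that is still unclaimed.
        best = min(candidates, key=lambda i: abs(phrases[i].y - tip_y))
        used[best] = phrases[best].text

    # Phrases no tip claimed are the "unused" ones: likely second halves.
    unused = [i for i in range(len(phrases)) if i not in used]
    for i in unused:
        nearest = min(used, key=lambda j: abs(phrases[j].y - phrases[i].y),
                      default=None)
        if nearest is None:
            continue
        gap = phrases[i].y - phrases[nearest].y
        # Only merge a fragment that sits just below a claimed phrase.
        if 0 < gap <= max_gap:
            used[nearest] += joiner + phrases[i].text

    return [used[i] for i in sorted(used)]


# Made-up coordinates mimicking the split label described above.
phrases = [
    Phrase("Es", 100, 50),
    Phrase("cherichia coli", 100, 60),
    Phrase("Salmonella enterica", 100, 90),
]
labels = merge_unused_phrases(phrases, tip_ys=[52, 90])
print(labels)
```

A count of phrases that stay unused after merging would be the detection signal. The merge itself is only safe when the leftover fragment sits close to exactly one label, which is why the sketch requires a small positive vertical gap.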