kba closed this 8 years ago
Our rules say that it doesn't matter whether there is one space or several. We should perhaps suggest another kind
of comparison in ocropus-errs
that collapses runs of multiple spaces into a single space (leaving single spaces intact), i.e.
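if kind == "nomultiplespace":
    # collapse each run of whitespace into a single space (the replacement must be a literal space, not '\s')
    return re.sub(r'\s+', u' ', s, flags=re.UNICODE)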
in https://github.com/tmbdev/ocropy/blob/master/ocrolib/common.py#L126
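A minimal Python sketch of the effect on error counting. It assumes a simplified, hypothetical `project_text(s, kind)` with only the `exact` and `nomultiplespace` kinds (the real function in ocrolib/common.py does more normalization), and a plain Levenshtein distance standing in for whatever distance ocropus-errs actually computes. The sample line is made up for illustration.

```python
import re


def project_text(s, kind="exact"):
    """Hypothetical, simplified projection step; only two kinds are sketched."""
    if kind == "exact":
        return s
    if kind == "nomultiplespace":
        # Every run of whitespace (spaces, tabs, newlines) becomes one space;
        # a single space is replaced by itself, so it stays intact.
        return re.sub(r"\s+", " ", s)
    raise ValueError("unknown normalization: " + kind)


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Made-up example: OCR output with extra spaces, e.g. from wide letter spacing.
ground_truth = "Chronik der Stadt"
ocr_output = "Chronik  der   Stadt"

for kind in ("exact", "nomultiplespace"):
    gt = project_text(ground_truth, kind)
    ocr = project_text(ocr_output, kind)
    print(kind, levenshtein(gt, ocr))
```

With `exact` the three extra spaces count as three errors; with `nomultiplespace` both strings project to the same text and the distance is 0. Unlike dropping all whitespace, a missing space (e.g. `Chronikder`) would still be counted as an error.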
I created an issue in the ocropus repo: https://github.com/tmbdev/ocropy/issues/98
E.g. the header line in http://digi.bib.uni-mannheim.de/fileadmin/digi/445441798/max/445441798_0016.jpg
vs