OCR of real (old) printing, but dirty

I am using tesseract 3.02.02 on SUSE Linux 13.2.

The text to be OCR'd is real printed text from about 1930.
The print is a little dirty, i.e. there are small dots and strokes between
the letters.
Although these are far smaller than the actual letters, they are interpreted as
normal letters.

The normal letters are recognized fairly well.

As an example:
the attached picture is translated to the text
  15 Ellser Exdmsund Mögsgzerg

Is there a way to pass parameters to tesseract so that it
. either ignores letters that do not fit the majority of the other
  letters,
. or only uses letters within a given size range,
. or first makes the boxes,
  then lets me correct the boxes, by hand or by program,
  and finally translates using the corrected boxes?
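
To make the size-range idea concrete, here is a rough sketch of the kind of filtering I have in mind, done on the image before OCR rather than inside tesseract. It assumes the page is already binarized and held as a list of rows with 1 for ink and 0 for paper. The function name and the thresholds are my own placeholders, not anything tesseract provides. A real threshold would have to be chosen carefully, because dots on "i", umlaut dots and full stops are small blobs too.

```python
from collections import deque

def remove_small_blobs(img, min_height, min_width=1):
    """Return a copy of a non-empty binary image (1 = ink, 0 = paper) in which
    every 8-connected blob whose bounding box is lower than min_height or
    narrower than min_width has been erased."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Flood-fill one blob and collect its pixels.
            blob, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and img[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            # Bounding-box size of the blob.
            height = max(r for r, _ in blob) - min(r for r, _ in blob) + 1
            width = max(c for _, c in blob) - min(c for _, c in blob) + 1
            if height < min_height or width < min_width:
                for r, c in blob:
                    out[r][c] = 0
    return out
```

With a threshold such as `min_height=2`, a single stray pixel would be erased while a three-pixel-high stroke stays. Whether something equivalent can be switched on inside tesseract is exactly what I am asking.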

I have already tried a config file containing
  textord_min_xheight 26
  textord_xheight_mode_fraction 0.9
  textord_xheight_error_margin 0.1
  textord_descx_ratio_min 0.3
  textord_descx_ratio_max 0.6
  textord_ascx_ratio_min 1.3
  textord_ascx_ratio_max 1.7
  load_system_dawg F
  load_freq_dawg F
It changes some things, but does nothing to ignore the dots and strokes.

I also tried making the boxes, correcting them by deleting the false letters,
and then translating with these boxes using a config file containing:
  tessedit_make_boxes_from_boxes T
but this does not do what I want.
Is there a way to accomplish this?
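
To illustrate what I mean by correcting the boxes by program, here is a minimal sketch. It assumes the usual box-file line layout `symbol left bottom right top page`, with the origin at the bottom-left so that `top` is larger than `bottom`. The function names and the 0.4 fraction are hypothetical choices of mine. It only rewrites the box file. Getting tesseract to recognize using the filtered boxes is the part I have not managed.

```python
from statistics import median

def parse_box_line(line):
    """Split one box-file line into (symbol, left, bottom, right, top, page)."""
    sym, left, bottom, right, top, page = line.rsplit(None, 5)
    return sym, int(left), int(bottom), int(right), int(top), int(page)

def filter_small_boxes(lines, min_fraction=0.4):
    """Keep only boxes whose height is at least min_fraction of the
    median box height; blank lines are skipped."""
    boxes = [parse_box_line(l) for l in lines if l.strip()]
    if not boxes:
        return []
    heights = [top - bottom for _, _, bottom, _, top, _ in boxes]
    limit = min_fraction * median(heights)
    return [
        " ".join(str(field) for field in box)
        for box, h in zip(boxes, heights)
        if h >= limit
    ]
```

For example, given the lines `E 10 10 30 50 0`, `. 32 12 35 15 0`, `l 40 10 48 52 0` and `s 50 10 65 38 0`, the 3-pixel-high `.` box is dropped and the other three are kept. The median is taken over all boxes, so this would stop working if the specks outnumbered the real letters.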

A dictionary-based solution is not possible, because the text consists only of
names of persons and locations.

Another thing I wonder about:
when I OCR an image from .tiff to .txt
and run makebox on the same image,
a few letters are recognized differently!
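
To pin down which letters differ, the two outputs can be compared roughly as follows. This is only a sketch: it assumes the box file has one symbol per line in the layout `symbol left bottom right top page`, it ignores all whitespace in the .txt output, and the helper names are my own.

```python
import difflib

def box_symbols(box_lines):
    """Concatenate the symbol column of the box-file lines."""
    return "".join(l.rsplit(None, 5)[0] for l in box_lines if l.strip())

def text_symbols(text):
    """Drop all whitespace from the plain-text OCR output."""
    return "".join(text.split())

def differing_spans(text, box_lines):
    """Return (txt_part, box_part) pairs wherever the two outputs disagree."""
    a, b = text_symbols(text), box_symbols(box_lines)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each returned pair shows what the .txt run produced next to what the makebox run produced at the same position.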

Thanks in advance for any help.

Original issue reported on code.google.com by pj...@aon.at on 19 Apr 2015 at 12:54

Attachments:

example.tiff
