Tangugo / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Danish fraktur support in 3.0 #300

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Tesseract currently does not handle old Danish texts written in fraktur 
script very well, so I've trained tesseract r319 to support Danish texts 
written in fraktur. It is not perfect but good enough that I hope it may 
be useful to others. 

The dictionary is based on mid-19th-century Danish, prior to the 1870s 
spelling reform, as fraktur went out of style in Denmark around 1860.
Probably needless to say for those who need it, but it will most likely 
also produce tolerable results for Norwegian texts of the same period. 

It is based entirely on public domain sources and I therefore consider the 
training data to be in the public domain as well. 

Best regards, 
Peter Alberti 

Original issue reported on code.google.com by dsl602...@vip.cybercity.dk on 20 May 2010 at 3:18

GoogleCodeExporter commented 9 years ago
Are you OK with releasing it under the Apache 2.0 license? If so, I will add it
to the 3.00 release.

Original comment by theraysm...@gmail.com on 20 May 2010 at 6:37

GoogleCodeExporter commented 9 years ago
I certainly am.

Thanks,
Peter

Original comment by dsl602...@vip.cybercity.dk on 20 May 2010 at 6:56

GoogleCodeExporter commented 9 years ago
Committed, revision 357. Thanks!

Original comment by joregan on 26 May 2010 at 8:08