There are three files, `{0006,0007,0008}.xml`, that all belong to the same filegroup `gt`. If I run `ocrd-tesserocr-recognize` on the filegroup `gt` with output filegroup `tess`, recognize searches for the files of the filegroup in the `mets.xml` file. If for some reason the files are not returned in numerical order (were the files not added to the workspace in numerical order?), for example 0007, 0008, 0006, recognize generates the files `tess-0001.xml` (from `0007.xml`), `tess-0002.xml` (from `0008.xml`) and `tess-0003.xml` (from `0006.xml`).
This destroys the mapping between gt and ocr pages.
A simple solution would be to use:
to add the new files to the workspace.
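
To illustrate the mismatch and one possible way around it, here is a minimal sketch in plain Python. It does not use the OCR-D API; the helper names `counter_ids` and `derived_ids` are hypothetical, and it assumes that each input file ID ends in its page number. The first helper names outputs by their position in the returned list, which reproduces the behavior described above; the second derives the output name from the input ID, so the order in which files are returned no longer matters.

```python
import re


def counter_ids(input_ids, output_group):
    """Name outputs by position in the returned list (the problematic scheme)."""
    return {
        in_id: f"{output_group}-{n:04d}"
        for n, in_id in enumerate(input_ids, start=1)
    }


def derived_ids(input_ids, output_group):
    """Name outputs after the trailing number of each input ID (assumed layout)."""
    result = {}
    for in_id in input_ids:
        match = re.search(r"(\d+)$", in_id)
        if match is None:
            raise ValueError(f"no trailing number in {in_id!r}")
        result[in_id] = f"{output_group}-{match.group(1)}"
    return result


# Files returned out of numerical order, as in the example above.
returned = ["0007", "0008", "0006"]
by_counter = counter_ids(returned, "tess")
by_name = derived_ids(returned, "tess")
print(by_counter)  # 0007 is mapped to tess-0001: mapping is lost
print(by_name)     # 0007 is mapped to tess-0007: mapping is kept
```

Sorting the input files before enumerating them would also restore a stable order, but it would still renumber the pages starting at 0001, so deriving the name from the input ID seems the safer choice when gt and ocr pages have to be matched by name.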