kba closed 5 years ago
Could this lead to problems?
Absolutely, yes.
The alternative is to not change the ID at all and accept that it gets slightly long, e.g. OCR-D-OCR-TESS_OCR-D-IMG-BIN-TESS_1234.
or maybe even better:
ID = concat_padded(self.output_file_grp, os.path.basename(input_file.url)[:-4])
Why generate IDs if output_file_grp + the basename of the file without its extension is already unique?
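To make that suggestion concrete, here is a minimal sketch. It assumes a hypothetical helper `make_output_id` (not part of any OCR-D package) and joins the parts with a plain underscore instead of calling `concat_padded`. It also uses `os.path.splitext` rather than `[:-4]`, since the slice only works for three-letter extensions.

```python
import os


def make_output_id(output_file_grp: str, input_url: str) -> str:
    """Hypothetical helper: build an ID from the output group and the
    input file's basename with its extension removed."""
    # splitext handles extensions of any length, unlike slicing with [:-4]
    stem = os.path.splitext(os.path.basename(input_url))[0]
    return f"{output_file_grp}_{stem}"


print(make_output_id("OCR-D-OCR-TESS", "OCR-D-IMG-BIN/OCR-D-IMG-BIN_0001.tif"))
```

With a `.jpeg` input, `[:-4]` would leave a trailing dot in the ID (`"page.jpeg"[:-4]` is `"page."`), which is why the sketch avoids it. Whether the result is unique still depends on the basenames being unique within the workspace.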
I am very much in favour of the solution by @finkf, but I would also like to keep the .xml extension in the old version (because most PAGE viewers rely on it). The patch does not apply anymore, so should I make a new PR?
Closing as this is superseded (and hopefully resolved to satisfaction) by #48.
Generate the output ID and filename from the input file ID reduced to its numbers.
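For illustration only, one possible reading of "reduced to its numbers" is sketched below. The function name `numeric_output_id` and the choice to concatenate every digit run are assumptions, not the behaviour of the actual patch.

```python
import re


def numeric_output_id(output_file_grp: str, input_file_id: str) -> str:
    """Hypothetical helper: keep only the digits of the input file ID
    and prefix them with the output file group."""
    digits = "".join(re.findall(r"\d+", input_file_id))
    if not digits:
        # An ID without any digits cannot be reduced this way
        raise ValueError(f"no digits in input file ID: {input_file_id!r}")
    return f"{output_file_grp}_{digits}"


print(numeric_output_id("OCR-D-OCR-TESS", "OCR-D-IMG-BIN-TESS_1234"))
```

Under this reading, distinct input IDs can collapse to the same output ID (for example `IMG_1_23` and `IMG_12_3` both reduce to `123`), which is the kind of problem raised in the discussion.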
@finkf