MehmedGIT / OtoN_Converter

Converter from basic OCRD process workflow to Nextflow workflow script
Apache License 2.0

`,` is not an allowed character #9

Closed. mweidling closed this issue 2 years ago

mweidling commented 2 years ago

For evaluating my OCR results I use the following minimal workflow:

ocrd process \
    "tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -P segmentation_level region -P textequiv_level word -P find_tables true -P model Fraktur_GT4HistOCR" \
    "dinglehopper -I OCR-D-GT-SEG-BLOCK,OCR-D-OCR -O OCR-D-EVAL-SEG-BLOCK" \

This yields:

Convert OCR-D workflows to NextFlow …
Converting from: minimal.txt
Converting to: minimal.txt.nf
Syntax error!
Invalid line number: 2!
Info: TOKEN_SYMBOL_ERROR_RULE_03: Invalid token: OCR-D-GT-SEG-BLOCK,OCR-D-OCR.
Hint: Tokens cannot contain character: ,.

Since the syntax of the workflow is correct according to https://ocr-d.de/en/workflows#step-18-ocr-evaluation, `,` has to be allowed under certain circumstances.
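To illustrate the requested behavior, here is a minimal sketch of how a `-I`/`-O` value could be treated as a comma-separated list of file groups instead of a single token. This is not the converter's actual code: the function name `split_file_groups` and the allowed character set for file group names are assumptions made for this example.

```python
import re

# Assumption for this sketch: a file group name consists of letters,
# digits, underscores and hyphens (e.g. OCR-D-GT-SEG-BLOCK).
FILE_GROUP = re.compile(r"[A-Za-z0-9_-]+")


def split_file_groups(value: str) -> list[str]:
    """Split a -I/-O value on commas and validate each file group name."""
    groups = value.split(",")
    for group in groups:
        # An empty group (from ",," or a leading/trailing comma) fails here too.
        if not FILE_GROUP.fullmatch(group):
            raise ValueError(f"Invalid file group in {value!r}: {group!r}")
    return groups


print(split_file_groups("OCR-D-GT-SEG-BLOCK,OCR-D-OCR"))
```

With this approach the comma is accepted only as a separator between valid names, so a value such as `A,,B` is still rejected.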

MehmedGIT commented 2 years ago

@mweidling, thanks for reporting that. I will try to push a patch that fixes this issue today. Note that in the minimal workflow example you provided, the last line must not end with `\`. The validator complained about this as well, which is the expected behavior.
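The line-continuation rule mentioned here can be sketched as follows. This is only an illustration and not the validator's real implementation; the function name `check_continuations` and the choice to skip blank lines are assumptions.

```python
def check_continuations(lines: list[str]) -> list[int]:
    """Return the 1-based numbers of lines with a wrong trailing backslash.

    Every non-empty line except the last must end with a backslash,
    and the last non-empty line must not.
    """
    # Skip blank lines so that trailing empty lines are not treated as "last".
    content = [(number, line.rstrip())
               for number, line in enumerate(lines, start=1)
               if line.strip()]
    bad = []
    for index, (number, text) in enumerate(content):
        is_last = index == len(content) - 1
        # Wrong if the last line continues, or an earlier line does not.
        if text.endswith("\\") == is_last:
            bad.append(number)
    return bad


workflow = [
    'ocrd process \\',
    '    "tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -P segmentation_level region -P textequiv_level word -P find_tables true -P model Fraktur_GT4HistOCR" \\',
    '    "dinglehopper -I OCR-D-GT-SEG-BLOCK,OCR-D-OCR -O OCR-D-EVAL-SEG-BLOCK" \\',
]
print(check_continuations(workflow))  # the third line is reported: [3]
```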

mweidling commented 2 years ago

> In the minimal workflow example you provided, the last line must not end with `\`. The validator complained about this as well, which is the expected behavior.

My bad, that's a copy & paste error.

MehmedGIT commented 2 years ago

This should be fixed now.

It is still worth mentioning that input/output match validation is currently not performed. It will be implemented after some refactoring of the code. Until then, it is the user's responsibility to provide correct -I/-O values in the OCR-D workflow.txt file.
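Until that validation exists, a check along these lines can be run by hand. It is a rough sketch under stated assumptions, not the converter's planned implementation: the function name `find_unmatched_inputs` is hypothetical, and the set of file groups that already exist in the workspace (here `OCR-D-IMG` and `OCR-D-GT-SEG-BLOCK`) has to be supplied by the user.

```python
import shlex


def find_unmatched_inputs(steps: list[str],
                          existing: set[str]) -> list[tuple[int, str]]:
    """Return (step number, file group) pairs for inputs that nothing provides.

    An input counts as matched if it is in `existing` or was the -O value
    of an earlier step.
    """
    available = set(existing)
    unmatched = []
    for number, step in enumerate(steps, start=1):
        tokens = shlex.split(step)
        inputs, outputs = [], []
        # Look at each flag together with the token that follows it.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "-I":
                inputs.extend(value.split(","))
            elif flag == "-O":
                outputs.extend(value.split(","))
        for group in inputs:
            if group not in available:
                unmatched.append((number, group))
        available.update(outputs)
    return unmatched


steps = [
    "tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -P segmentation_level region "
    "-P textequiv_level word -P find_tables true -P model Fraktur_GT4HistOCR",
    "dinglehopper -I OCR-D-GT-SEG-BLOCK,OCR-D-OCR -O OCR-D-EVAL-SEG-BLOCK",
]
# Assumption: these two file groups are already present in the workspace.
print(find_unmatched_inputs(steps, {"OCR-D-IMG", "OCR-D-GT-SEG-BLOCK"}))  # []
```

If `OCR-D-GT-SEG-BLOCK` is left out of the existing set, the second step is reported, because no earlier step writes that file group.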