OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

allow passing param json via stdin in CLI #239

Closed bertsky closed 5 years ago

bertsky commented 5 years ago

Sometimes it is preferable to specify processing parameters on the fly, but the CLI module expects a filesystem path. It could easily be extended to accept - for stdin as well, by adding allow_dash=True (available since Click 6.0) to ocrd_tool_tool_parse_params. This would enable on-the-fly specifications using here-strings, e.g.:

ocrd-tesserocr-recognize -m mets.xml -p - <<<'{ "textequiv_level": "glyph", "model": "Fraktur" }'
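For illustration, a minimal sketch of the "-" convention using only the standard library. The helper name load_params is hypothetical and is not part of OCR-D core; with Click, allow_dash=True only lets "-" pass path validation, so the reading from stdin would still have to be handled roughly like this:

```python
import io
import json
import sys


def load_params(path, stdin=None):
    """Load a JSON parameter object from a file path, or from stdin if path is '-'.

    Hypothetical helper for illustration only, not the OCR-D implementation.
    """
    if path == "-":
        # Fall back to the real stdin unless a stream is injected (e.g. in tests).
        stream = stdin if stdin is not None else sys.stdin
        return json.load(stream)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Simulates: -p - <<<'{ "textequiv_level": "glyph", "model": "Fraktur" }'
fake_stdin = io.StringIO('{ "textequiv_level": "glyph", "model": "Fraktur" }')
params = load_params("-", stdin=fake_stdin)
print(params["model"])  # prints: Fraktur
```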

kba commented 5 years ago

As a workaround in bash/zsh you can also do:

ocrd-tesserocr-recognize -m mets.xml -p <(echo '{ "textequiv_level": "glyph", "model": "Fraktur" }')

which uses process substitution to create an anonymous file handle or a temporary file on the fly.

kba commented 5 years ago

Quick ideas on this, since @mikegerber also ran into this:

1) Allow passing STDIN to -p as @bertsky suggested
2) Check the string passed to -p. If it begins with {, treat it as a JSON string, parse it. Otherwise, treat it as a filename, read it and parse it.
3) Additional parameter -P which accepts a JSON string and parses it. -P string merged on top of -p merged on top of default params.

I would prefer not to change anything, but if we do, I'd favor option 2, since it's the simplest to implement, both for us and for workflow composers.
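The merge order described in option 3 could be sketched as follows. This is only an illustration under assumptions: merge_params and its argument names are hypothetical, and the merge is a shallow dictionary update, which the thread does not specify.

```python
import json


def merge_params(defaults, file_params=None, inline_json=None):
    """Merge parameters with precedence: defaults < -p file < -P JSON string.

    Hypothetical helper; a shallow merge is assumed.
    """
    merged = dict(defaults)
    # Values from the -p file override the defaults.
    merged.update(file_params or {})
    # Values from the -P string override everything else.
    if inline_json:
        merged.update(json.loads(inline_json))
    return merged


defaults = {"textequiv_level": "word", "model": "eng"}
from_file = {"model": "Fraktur"}
result = merge_params(defaults, from_file, '{ "textequiv_level": "glyph" }')
print(result)
```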

bertsky commented 5 years ago

Option 1 does not work in the workflow CLI syntax, and option 2 could introduce surprise crashes with strange filenames, so I opt for option 3, which looks very tidy. But I would be happy with option 2 as well.

mikegerber commented 5 years ago

I'd prefer 3, too, as it seems to be the most comfortable and least hacky option.

kba commented 5 years ago

@wrznr suggested 2) with inverted logic: If the string is a valid filename (the file is readable), use it as such. Otherwise try to parse it as JSON and throw an exception if that fails.

Would that be acceptable?
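A rough sketch of that inverted logic, assuming a hypothetical helper parse_parameter (not an OCR-D core function) and a plain ValueError as the exception:

```python
import json
import os


def parse_parameter(value):
    """Use value as a file if it is readable, otherwise parse it as JSON.

    Hypothetical helper for illustration only.
    """
    if os.path.isfile(value) and os.access(value, os.R_OK):
        with open(value, encoding="utf-8") as handle:
            return json.load(handle)
    try:
        return json.loads(value)
    except json.JSONDecodeError as err:
        # Reached for anything that is neither an existing file nor valid JSON.
        raise ValueError(f"-p is neither a readable file nor valid JSON: {err}") from err


print(parse_parameter('{ "textequiv_level": "glyph", "model": "Fraktur" }'))
```

In this sketch a mistyped filename falls through to the JSON branch, so the error it produces is about JSON parsing rather than a missing file.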

bertsky commented 5 years ago

Fine with me.

mikegerber commented 5 years ago

I don't like the "is a readable file" idea that much, because it is entirely possible that a user specifies a wrong filename and, instead of getting a "file not found" error, she gets a "JSON parse error". But that's just my opinion, please do as you like :)

kba commented 5 years ago

How about:

mikegerber commented 5 years ago

Looks good @kba :)