OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

Feature Request: Load parameters from ENV #557

Open JensHeinrich opened 4 years ago

JensHeinrich commented 4 years ago

Load parameters from environment variables when they are defined, to make handling easier.

kba commented 4 years ago

IIUC you mean parameters in the sense of command line flags, options and arguments, right? E.g. that

OCRD_INPUT_FILE_GRP=MAX \
OCRD_OUTPUT_FILE_GRP=BIN \
OCRD_OVERWRITE=true \
ocrd-olena-binarize

would be equivalent to

ocrd-olena-binarize -I MAX -O BIN --overwrite

Correct?
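A minimal sketch of that equivalence, assuming the hypothetical variable names from the example above (OCRD_INPUT_FILE_GRP, OCRD_OUTPUT_FILE_GRP, OCRD_OVERWRITE). It uses argparse purely as a stand-in and is not how ocrd processors actually build their CLI; the helper name build_parser is made up for illustration. The environment only supplies defaults, so a flag given on the command line still wins.

```python
import argparse


def build_parser(env):
    """Build a parser whose defaults come from an environment mapping.

    The OCRD_* variable names are hypothetical; they are taken from the
    example in this thread, not from any existing OCR-D convention.
    """
    parser = argparse.ArgumentParser(prog="ocrd-olena-binarize")
    parser.add_argument("-I", "--input-file-grp",
                        default=env.get("OCRD_INPUT_FILE_GRP"))
    parser.add_argument("-O", "--output-file-grp",
                        default=env.get("OCRD_OUTPUT_FILE_GRP"))
    # A store_true flag can only switch the option on, so an env value of
    # "true" cannot be switched off again from the command line here.
    parser.add_argument("--overwrite", action="store_true",
                        default=env.get("OCRD_OVERWRITE", "").lower()
                        in {"1", "true", "yes"})
    return parser
```

Passing os.environ as env gives the behaviour asked about; passing a plain dict makes the function easy to test.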

mikegerber commented 4 years ago

Not sure what @JensHeinrich meant, but I often find myself wanting to configure default options for the processors, e.g. to reduce this:

ocrd workspace validate --skip dimension --skip pixel_density --page-strictness lax --page-coordinate-consistency off
ocrd-calamari-recognize --overwrite -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI -P checkpoint "/var/lib/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/*.ckpt.json" -P textequiv_level "$TEXTEQUIV_LEVEL"

to

ocrd workspace validate
ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-CALAMARI

(Side note: validate does not seem to conform to the JSON/-P parameter convention. Is it intentional?)

I would certainly welcome

  1. a straightforward way to specify -p/-P parameters through environment variables, e.g. OCRD_CALAMARI_PARAMETERS="-P textequiv_level glyph -P foo bar"
  2. allowing options like --overwrite or --skip pixel_density to be set via -P parameters
  3. making globally available options like --overwrite configurable globally, e.g. OCRD_GLOBAL_PARAMETERS="-P overwrite true"

All proposed solutions and syntaxes here are just what I came up with in 5 minutes and should be discussed further. (Problems I see: How do I remove an env-specified parameter? Overriding seems easy enough, though: a CLI parameter beats an env parameter.)
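To make the proposed syntax concrete, here is a rough sketch of how such variables could be read. The variable names come from the list above; the functions parse_p_flags and parameters_from_env are hypothetical and exist in no OCR-D release. Values are kept as plain strings here, with no JSON decoding.

```python
import shlex


def parse_p_flags(value):
    """Turn a string like '-P key val -P other val2' into a dict."""
    tokens = shlex.split(value or "")
    if len(tokens) % 3 != 0:
        raise ValueError("expected repeated '-P KEY VALUE' triples")
    params = {}
    for flag, key, val in zip(tokens[0::3], tokens[1::3], tokens[2::3]):
        if flag != "-P":
            raise ValueError(f"expected -P, got {flag!r}")
        params[key] = val
    return params


def parameters_from_env(env, processor_var,
                        global_var="OCRD_GLOBAL_PARAMETERS"):
    """Merge global parameters with processor-specific ones.

    Processor-specific values override global ones on conflict.
    """
    merged = parse_p_flags(env.get(global_var, ""))
    merged.update(parse_p_flags(env.get(processor_var, "")))
    return merged
```

Using shlex.split means quoted values containing spaces, such as a checkpoint glob, survive as a single token.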

JensHeinrich commented 4 years ago

Yep, would also go for

explicit (command line flag) > implicit (env) > standard (config file) (should a static config file be added anywhere in the future)

(This is also how, for example, the Ansible project handles it.)

@mikegerber unset PARAM should do the job for removing a parameter.

Tbh I am mostly just doing the in-house support for our pilot, but having to specify the same options over and over again makes me want to help people

Namespacing them as @mikegerber proposed in option 1 is probably a good idea.

And a global namespace as in option 3, too.
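The precedence order described in this comment (command line over environment over config file) could be sketched as follows. This is only an illustration under the assumption that each source has already been parsed into a dict; the function name resolve_parameters is made up.

```python
from collections import ChainMap


def resolve_parameters(cli, env, config):
    """Combine three parameter sources, earlier ones taking precedence.

    ChainMap looks keys up in the first mapping that contains them, so
    the order of the arguments encodes: explicit > implicit > standard.
    """
    return dict(ChainMap(cli, env, config))
```

Removing a key from the env mapping (the equivalent of unset PARAM) lets the config file value show through again, without any special handling.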

bertsky commented 4 years ago

https://github.com/OCR-D/core/issues/376#issuecomment-562349028