OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

Improved versatility for bulk-add #24

Closed (kba closed this issue 2 years ago)

kba commented 2 years ago

Current situation

The ocrd workspace bulk-add command adds many files to a workspace/METS in one go, which is significantly more efficient than looping externally (e.g. in Bash) and adding files one at a time with ocrd workspace add.

The bulk-add command is based on regular expressions, which are applied to the list of files to be added. Applying these patterns to the filenames yields the values for fileGrp, ID, pageID, etc.
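To illustrate the filename-based approach, here is a minimal Python sketch of deriving fields from a path with a regular expression. It is not the actual bulk-add implementation: the filename layout (fileGrp directory, then pageID, then extension), the pattern, the way ID is built, and the helper name derive_fields are all assumptions made for this example.

```python
import re

# Assumed (hypothetical) layout: <fileGrp>/<pageID>.<ext>
PATTERN = re.compile(r"(?P<fileGrp>[^/]+)/(?P<pageID>[^/.]+)\.(?P<ext>[^.]+)$")


def derive_fields(path):
    """Derive METS-related fields from a file path via named groups."""
    match = PATTERN.search(path)
    if match is None:
        raise ValueError(f"path does not match the expected layout: {path!r}")
    fields = match.groupdict()
    # Assumption for this sketch: the file ID combines fileGrp and pageID.
    fields["ID"] = f"{fields['fileGrp']}_{fields['pageID']}"
    return fields


if __name__ == "__main__":
    print(derive_fields("OCR-D-IMG/page_0001.tif"))
```

The sketch shows why this approach is brittle: every value has to be recoverable from the path itself, so a file whose name does not follow the layout is rejected.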

There are two major drawbacks to this approach:

How it should be

Instead of just filenames, allow users to prepare a whitespace-delimited list of fields to feed into bulk-add, either as command-line arguments or via STDIN. Reading from STDIN lets users pipe a CSV, possibly created with a spreadsheet tool, into the CLI.

With this approach, users still need to define a regular expression, but a much simpler one: it is essentially the header line of a spreadsheet and only defines the fields.
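The "header line" idea can be sketched in Python as follows. This is only an illustration under assumptions, not the proposed CLI: the field names and order, and the helpers build_pattern and parse_records, are hypothetical. The regular expression is generated from the list of field names, so the user only has to state which column holds which value.

```python
import re

# Hypothetical column order, playing the role of a spreadsheet header line.
FIELD_NAMES = ["fileGrp", "pageID", "ID", "path"]


def build_pattern(field_names):
    """Build a regex with one named, whitespace-free group per field."""
    groups = [f"(?P<{name}>\\S+)" for name in field_names]
    return re.compile(r"\s+".join(groups))


def parse_records(lines, field_names=FIELD_NAMES):
    """Turn whitespace-delimited lines into one dict of fields per line."""
    pattern = build_pattern(field_names)
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = pattern.fullmatch(line)
        if match is None:
            raise ValueError(f"line {number}: expected {len(field_names)} fields")
        records.append(match.groupdict())
    return records


if __name__ == "__main__":
    demo = ["OCR-D-IMG page_0001 OCR-D-IMG_0001 images/0001.tif"]
    print(parse_records(demo))
```

Because parse_records accepts any iterable of lines, it could be passed sys.stdin to handle piped input. One limitation of whitespace delimiting is that values containing spaces (such as some file paths) cannot be represented without an additional quoting rule.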

Steps