UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
https://digi.bib.uni-mannheim.de/ocr-fileformat/
MIT License

multi-choice of files in the web interface #94

Open yanirmr opened 5 years ago

yanirmr commented 5 years ago

Currently, the GUI allows choosing only one file at a time. It would be great if you could pick a large set of files at once, convert them, and download them together.

Thanks a lot.

stweil commented 4 years ago

Especially for many files, drag-and-drop support would also be helpful.

Conversion of many files would still only return a single download, so for example a zip file with the converted files could be returned.

But maybe a Web API would be even better for handling lots of files.

zuphilip commented 4 years ago

For batch transformations I would suggest using the command line tools, e.g. something like

for filename in *.alto; do
   docker run --rm -it -v "$PWD":/data ubma/ocr-fileformat ocr-transform alto2.0 hocr "$filename"
done

Maybe we could allow multiple files directly as an argument for the CLI scripts.
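
Until then, a small shell wrapper can approximate that. This is only a sketch under assumptions: the function name `transform_all`, the `RUNNER` override, and the output naming (`<basename>.<target format>`) are hypothetical, and it assumes that `ocr-transform` writes the result to standard output when no output file is given. The `-it` flags are left out so that the redirected output is not routed through a terminal.

```shell
# Hypothetical helper: transform every file passed as an argument.
# Usage: transform_all <from-format> <to-format> <file>...
transform_all() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        # Assumed naming scheme: strip the last extension, append the target format.
        out="${f%.*}.$to"
        # RUNNER defaults to docker; set RUNNER=echo for a dry run that
        # writes the command line into the output file instead of running it.
        ${RUNNER:-docker} run --rm -v "$PWD":/data ubma/ocr-fileformat \
            ocr-transform "$from" "$to" "$f" > "$out" || return 1
    done
}
```

It could then be called as `transform_all alto2.0 hocr *.alto`.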

However, I am skeptical that the Web GUI is suitable for larger transformation tasks, or that a Web API is the direction we should go in.

zuphilip commented 4 years ago

BTW, there is already some kind of Web API (though it is not documented), e.g. https://digi.bib.uni-mannheim.de/ocr-fileformat/ocr-fileformat.php?do=transform&from=alto2.0&to=hocr&url=https://rawgit.com/kba/ocr-fileformat-samples/master/samples/alto/2.0/wetzel_reisebegleiter_1901_0021.alto
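
For several remote files, that endpoint could be called in a loop. A rough sketch: the query parameters are copied from the URL above, while the helper names `api_url` and `fetch_all`, the use of `curl`, and the output naming are assumptions. The `url` value is passed through without percent-encoding, as in the example URL, and whatever the endpoint returns is saved as-is.

```shell
base='https://digi.bib.uni-mannheim.de/ocr-fileformat/ocr-fileformat.php'

# Hypothetical helper: build the request URL for one remote source file.
# Usage: api_url <from-format> <to-format> <source-url>
api_url() {
    printf '%s?do=transform&from=%s&to=%s&url=%s\n' "$base" "$1" "$2" "$3"
}

# Hypothetical helper: request a transformation for each source URL and
# save the response as <basename>.<to-format> in the current directory.
# Usage: fetch_all <from-format> <to-format> <source-url>...
fetch_all() {
    from=$1
    to=$2
    shift 2
    for src in "$@"; do
        name=${src##*/}   # last path segment of the source URL
        curl -fsS -o "${name%.*}.$to" "$(api_url "$from" "$to" "$src")" || return 1
    done
}
```

Since the API is undocumented, its parameters and response format may change without notice.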