Workpackage 07.12.2011 - Githubissues

Pack the complete output (metadata, TIFFs, OCR), including METS, into an .aip folder (the original directory may only contain original data)
[BUG] .aip Directory is not ignored during scanning
If PDFs and images exist, extract text from PDFs and use images for TIFF creation
If only PDFs exist, extract both text and images from them
If only images exist, generate TIFFs and use them for OCR
If images and text exist, generate TIFFs and use text for OCR
Encode page numbers in the file names of image files according to the following standard: <ID>_<SEQUENCE>_<TYPE>_<PAGENUMBER1>_<PAGENUMBER2>...

<ID>: an arbitrary string
<SEQUENCE>: continuous number
<TYPE>: type of page, based on a controlled vocabulary (valid values still have to be defined; equals pageType in OLEF)
<PAGENUMBERX>: page number(s) visible on the image (repeatable)
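
The naming standard above could be parsed roughly as follows. This is only a sketch under assumptions that the issue does not settle: <ID> is assumed to contain no underscore, <SEQUENCE> is assumed to be digits only, a file with no page number at all is accepted, and the type value `PAGE` in the usage note is a made-up placeholder because the controlled vocabulary is still undefined. The names `PageFile` and `parse_page_filename` are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple, Tuple


class PageFile(NamedTuple):
    """Fields encoded in <ID>_<SEQUENCE>_<TYPE>_<PAGENUMBER1>_<PAGENUMBER2>..."""
    item_id: str
    sequence: int
    page_type: str
    page_numbers: Tuple[str, ...]  # kept as strings, e.g. "12" or "xii"


def parse_page_filename(filename: str) -> PageFile:
    """Split an image file name into its encoded parts.

    Assumption: fields are separated by "_" and <ID> itself has no underscore.
    """
    stem = Path(filename).stem  # drop directory and file extension
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least ID, SEQUENCE and TYPE: {filename!r}")
    item_id, sequence, page_type, *page_numbers = parts
    if not sequence.isdigit():
        raise ValueError(f"SEQUENCE is not a number: {filename!r}")
    return PageFile(item_id, int(sequence), page_type, tuple(page_numbers))
```

For example, `parse_page_filename("bhl123_0007_PAGE_12_13.tif")` yields the ID `bhl123`, sequence `7`, type `PAGE` and the page numbers `("12", "13")`. Page numbers stay strings so that roman numerals or bracketed numbers are not lost.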

Configuration can only be edited by the admin user
All content must be derived from the original data; the CP must not overwrite the data during runtime. If there are errors, the original data has to be corrected and re-uploaded
Use NOID for GUID generation
Do not generate PDFs (no derivatives)
Last step (before packaging): Enter IPR information (as text) and write it to the metadata. If an entry already exists, display it from the metadata. Metadata location: <bibliographicInformation><accessCondition> (simple text)
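
The scanning rules listed above (ignore the .aip directory, then choose text and image sources depending on which file types are present) could be sketched as below. This is an illustration, not the tool's actual code. The file extension sets, the names `list_input_files`, `plan_sources` and `Plan`, and the rule that any directory whose name ends in `.aip` is skipped are all assumptions. "Use text for OCR" is read here as taking the supplied text in place of an OCR run, and when PDFs, images and text all exist, the PDF rule is assumed to win.

```python
import os
from pathlib import Path
from typing import Iterable, List, NamedTuple

# Assumed extension sets; the issue does not list the accepted formats.
PDF_EXT = {".pdf"}
IMAGE_EXT = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}
TEXT_EXT = {".txt"}


class Plan(NamedTuple):
    text_source: str   # "pdf", "text" or "ocr"
    image_source: str  # "pdf" or "images"


def list_input_files(root: str) -> List[str]:
    """Collect original files below root, never descending into .aip output."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those folders.
        dirnames[:] = [d for d in dirnames if not d.endswith(".aip")]
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return sorted(found)


def plan_sources(filenames: Iterable[str]) -> Plan:
    """Pick where text and TIFF source images come from."""
    exts = {Path(name).suffix.lower() for name in filenames}
    has_pdf = bool(exts & PDF_EXT)
    has_images = bool(exts & IMAGE_EXT)
    has_text = bool(exts & TEXT_EXT)

    if has_pdf and has_images:
        return Plan(text_source="pdf", image_source="images")
    if has_pdf:
        return Plan(text_source="pdf", image_source="pdf")
    if has_images and has_text:
        return Plan(text_source="text", image_source="images")
    if has_images:
        return Plan(text_source="ocr", image_source="images")
    raise ValueError("no PDFs or images found; nothing to process")
```

A caller would run `plan_sources(list_input_files(upload_dir))`, where `upload_dir` is a hypothetical path to the uploaded original data. Because the .aip folder is pruned before the extension check, TIFFs generated in an earlier run cannot be mistaken for original images.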

gbhl / bhl-europe

Workpackage 07.12.2011 #319