gbhl / bhl-europe

Biodiversity Heritage Library Europe
http://www.bhl-europe.eu/

Ingest Content for Content Provider - UBER #333

Open lobajuluwa opened 12 years ago

lobajuluwa commented 12 years ago

Task description: Align (UBER) upload data/structure with ingest tool needs

subtask: Ingest (UBER) data

/mnt/nfs-demeter/upload/providers/de-uber

59 folders with historic expedition reports (journals?) in

/mnt/nfs-demeter/upload/providers/de-uber

Actions to take:

  1. Fetch the metadata using fetch-metadata.sh.
  2. Place the upload folders into a sub folder of /mnt/nfs-demeter/upload/providers/de-uber.
  3. Once the TIFFs are available, organize them by chapter and encode additional information (page type etc.) in the image file names.

    Summary:

    • folder names - some id (ISSN?)
    • folder structure - only one level
    • file names:
    • InternalIdentifier - NA
    • FileSequenceNumber - NA
    • PageType - NA
    • PrintedPageNumber - NA
    • metadata available - MISSING
    • metadata in accepted format ?
    • Bibliographic level -

akohlbecker commented 12 years ago

@chris-sleep I am starting with this upload folder.

akohlbecker commented 12 years ago

@chris-sleep @melitabirthaelmer we need some metadata, otherwise we can't do anything with these uploads.

melitabirthaelmer commented 12 years ago

Metadata needs to be harvested over OAI-PMH

akohlbecker commented 12 years ago

The folder names are numbers like 27576. These numbers are the only id-like information we currently have, so they are all we can use to harvest metadata over OAI-PMH. Can you provide me with a link to the UBER OAI-PMH service, so I can check whether these numbers are meaningful?

akohlbecker commented 12 years ago

Thank you for your email. The folder numbers (= ${FOLDER_NAME}) do indeed match the catalog id, which is also used as part of the OAI-PMH item identifier, so we can get the metadata records from (real URI obfuscated by xxx):

https://edoc.xxxxx.de/OAI-2.0?verb=GetRecord&identifier=oai:xxxx.de:${FOLDER_NAME}&metadataPrefix=oai_dc
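
The URL pattern above can be sketched as a small shell function. This is only an illustration: the function name `oai_getrecord_url` is hypothetical, and the base URL and identifier prefix are passed in as parameters because the real values are obfuscated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build an OAI-PMH GetRecord URL for one upload folder.
# base_url and id_prefix are placeholders; the real host is obfuscated above.
oai_getrecord_url() {
    base_url=$1      # e.g. https://edoc.xxxxx.de/OAI-2.0
    id_prefix=$2     # e.g. oai:xxxx.de
    folder_name=$3   # numeric folder name, e.g. 27576
    printf '%s?verb=GetRecord&identifier=%s:%s&metadataPrefix=oai_dc\n' \
        "$base_url" "$id_prefix" "$folder_name"
}
```

For example, `oai_getrecord_url 'https://edoc.xxxxx.de/OAI-2.0' 'oai:xxxx.de' 27576` prints the GetRecord URL for folder 27576 with the placeholder host.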
akohlbecker commented 12 years ago

I created a script which fetches metadata from the OAI-PMH service:

/mnt/nfs-demeter/upload/providers/de-uber/fetch-metadata.sh

Running this script downloads, for each of the folders, an oai_dc (Dublin Core) file and a txt file which contains additional information such as structural information, page types, chapters, page numbers, etc.
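
A minimal sketch of what such a per-folder loop might look like follows. It is not the actual fetch-metadata.sh: the function name `fetch_all_metadata`, the `FETCH` override, the `work` output location, and the `.oai_dc.xml` file name are all assumptions made for illustration, and only the oai_dc record is covered (how the additional txt file is obtained is not described in this thread).

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real fetch-metadata.sh.
# FETCH defaults to curl writing to a file; set FETCH=echo for a dry run.
FETCH=${FETCH:-curl -sS -f -o}

fetch_all_metadata() {
    provider_dir=$1   # e.g. /mnt/nfs-demeter/upload/providers/de-uber
    base_url=$2       # OAI-PMH endpoint (obfuscated in this thread)
    id_prefix=$3      # identifier prefix, e.g. oai:xxxx.de
    mkdir -p "$provider_dir/work"   # assumed output location
    for dir in "$provider_dir"/*/; do
        name=$(basename "$dir")
        # Only purely numeric folder names are treated as catalog ids.
        case $name in
            ''|*[!0-9]*) continue ;;
        esac
        url="${base_url}?verb=GetRecord&identifier=${id_prefix}:${name}&metadataPrefix=oai_dc"
        # $FETCH is intentionally unquoted so its options split into words.
        $FETCH "$provider_dir/work/$name.oai_dc.xml" "$url" \
            || echo "failed to fetch metadata for $name" >&2
    done
}
```

Running it with `FETCH=echo` prints the target file and URL for each numeric folder without touching the network, which is a cheap way to check the identifiers before harvesting.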

akohlbecker commented 12 years ago

We seem to have sufficient metadata for 57 of the 59 monographs to create a proper folder structure. A script to create the structure is now available. The metadata files are in the work sub folder and have the file extension *.txt

akohlbecker commented 12 years ago

@chris-sleep all folders are harmonized except one, which for some reason is not properly processed by the create-structure.sh script. Here is the relevant snippet from the log file, which shows that the create-structure.awk program prints no output, just "DONE":

processing 27594 ...
  processing as Monograph
  using structure from txt metadata file ./work/christ-farbtafeln-Mn016C962.txt
  running create-structure.awk  ...
DONE

Despite this one problem, we should start the test ingest on UBER.
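
One way to make a silent run like the one for 27594 stand out in the log is to wrap the step in a check for empty output. The following is a sketch under assumptions: `run_expect_output` is a hypothetical helper, and how create-structure.sh actually invokes the awk program is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run a command, pass its stdout through, and warn on
# stderr if the command failed or printed nothing.
run_expect_output() {
    label=$1; shift
    out=$("$@") || { echo "WARNING: $label exited with an error" >&2; return 1; }
    if [ -z "$out" ]; then
        echo "WARNING: $label produced no output" >&2
        return 2
    fi
    printf '%s\n' "$out"
}
```

A call such as `run_expect_output 27594 awk -f create-structure.awk ./work/christ-farbtafeln-Mn016C962.txt` (the awk invocation here is a guess) would then log a warning instead of silently reaching "DONE".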

chris-sleep commented 12 years ago

@akohlbecker thanks - I'll set up for a test ingest shortly