lobajuluwa opened this issue 12 years ago (status: open)
Assess against FSG for the content in de-dilibri/333569.
Metadata: oai_dc
Directory structure: monograph structure, OK
Filenaming: files are named [incremental sequence]_[second incremental sequence].tif
The first sequence looks to be the image sequence; the second appears to be the same sequence plus a constant offset close to the directory number. No page-numbering or page-type data is present.
02_333572.tif
03_333573.tif
05_333575.tif
06_333576.tif
08_333578.tif
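A quick way to check the naming pattern is to parse the sample names and compare the two numbers. The sketch below is a hypothetical helper, not part of the ingest tooling, and its regex is an assumption based only on the five sample names listed above. For these samples the difference between the second and the first number is identical for every file, which supports the reading that the names carry only sequence information and no page number or page type.

```python
import re

# Assumed pattern, inferred only from the sample names: <digits>_<digits>.tif
FILENAME = re.compile(r"^(\d+)_(\d+)\.tif$")


def parse_names(names):
    """Split each filename into (first sequence, second sequence) as integers."""
    parsed = []
    for name in names:
        match = FILENAME.match(name)
        if match is None:
            raise ValueError(f"unexpected filename: {name}")
        parsed.append((int(match.group(1)), int(match.group(2))))
    return parsed


names = ["02_333572.tif", "03_333573.tif", "05_333575.tif",
         "06_333576.tif", "08_333578.tif"]
pairs = parse_names(names)
# A single offset means the second number is derived from the first one.
offsets = {second - first for first, second in pairs}
print(offsets)  # {333570}
```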
There is no reference to this CP (content provider) in the content management wiki page.
@wkollernhm is the metadata good to work with (mostly: which parameters are needed to run the SMT on oai_dc?) @melitabirthaelmer, as only 2 books have been uploaded so far, is there potential to have the filenames updated at source?
oai_dc is not good. Pure DC works perfectly fine, but not when it is wrapped in an OAI-PMH response.
Please ask CP to upload "pure" DC records!
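To illustrate the difference: an OAI-PMH GetRecord response nests the `oai_dc:dc` record inside `OAI-PMH/GetRecord/record/metadata`, whereas a "pure" DC record has the `oai_dc:dc` element as its document root. If the CP could only supply OAI-PMH responses, a minimal Python 3.8+ sketch for unwrapping them might look like this. The namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones; `unwrap_dc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Keep the familiar prefixes when the record is serialized again.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def unwrap_dc(oai_xml: bytes) -> bytes:
    """Return the oai_dc:dc record from an OAI-PMH response as a standalone document."""
    root = ET.fromstring(oai_xml)
    dc = root.find(".//oai:metadata/oai_dc:dc", NS)
    if dc is None:
        raise ValueError("no oai_dc:dc record found in the OAI-PMH response")
    dc.tail = None  # drop whitespace that followed the element inside the wrapper
    # The new root carries its own XML declaration and namespace declarations.
    return ET.tostring(dc, encoding="UTF-8", xml_declaration=True)
```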
The dilibri website lists various metadata formats in its archive and has page type/number data associated with the served content. I have emailed the dilibri technical contact to investigate how this can be uploaded to us.
We can have MODS or MARCXML metadata.
For page-level metadata, we can also have METS with structure blocks holding page number and type data.
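As a sketch of how page number and type could be read from such METS, assuming (this is not confirmed for dilibri) that pages are `mets:div` elements in the `PHYSICAL` structMap carrying `ORDER`, `ORDERLABEL` and `TYPE` attributes. If dilibri records page types in the logical structMap instead, this sketch would not find them. The function name is hypothetical; Python 3.9+.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def physical_pages(mets_xml: bytes) -> list[tuple[int, str, str]]:
    """Return (ORDER, ORDERLABEL, TYPE) for every div with an ORDER in the PHYSICAL structMap."""
    root = ET.fromstring(mets_xml)
    pages = []
    for struct_map in root.findall("mets:structMap", NS):
        if struct_map.get("TYPE") != "PHYSICAL":
            continue  # assumption: page-level data lives in the physical map only
        for div in struct_map.iter(f"{{{METS}}}div"):
            order = div.get("ORDER")
            if order is not None:
                # ORDER = image sequence, ORDERLABEL = printed page number (assumed)
                pages.append((int(order), div.get("ORDERLABEL", ""), div.get("TYPE", "")))
    return sorted(pages)
```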
We need a single format for all levels. Maybe they can provide a sample of the METS data?
Whether MODS or MARCXML is used for the bibliographic information does not matter; both are fine.
Metadata uploaded: a MODS file (e.g. 333569_mods.xml) and a METS file (e.g. 333569_mets.xml) for each monograph.
Testing the PI to validate the MODS on de-dilibri/333569.
@wkollernhm can you take a quick look please? When I invoke the PI, the output reports:
Executing Schema Mapper... /mnt/nfs/dev/jdk1.6.0_24/bin/java -jar /var/www/schema-mapping-tool/cli/dist/smt-cli.jar -m c -cm 6 -if "/mnt/nfs/upload/providers/de-dilibri/333569/333569_mets.xml" -of "/mnt/nfs/upload/providers/de-dilibri/333569/.aip/olef.xml"
Return Code: 1 (Starting conversion of MODS to OLEF...)
However, the XML source should be 333569_mods.xml. Is there any way to pass a name pattern to the schema mapping tool so that it matches a specific .xml source? (using -m c -cm 6 -if
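Until it is clear whether the PI supports a name pattern, one workaround is to pick the `*_mods.xml` file explicitly before calling the SMT. The sketch below rebuilds the command from the log above, copying the `-m c -cm 6 -if ... -of ...` arguments verbatim without interpreting them. `pick_source` and `smt_command` are hypothetical helpers, not PI or SMT options.

```python
from pathlib import Path

# Paths copied from the PI log output above.
JAVA = "/mnt/nfs/dev/jdk1.6.0_24/bin/java"
SMT_JAR = "/var/www/schema-mapping-tool/cli/dist/smt-cli.jar"


def pick_source(candidates, suffix="_mods.xml"):
    """Return the single candidate ending in `suffix`; fail if there are none or several."""
    matches = sorted(Path(c) for c in candidates if str(c).endswith(suffix))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one *{suffix}, found {len(matches)}")
    return matches[0]


def smt_command(content_dir):
    """Build the SMT argument list with the MODS file (not the METS file) as input."""
    content_dir = Path(content_dir)
    source = pick_source(content_dir.glob("*.xml"))
    return [JAVA, "-jar", SMT_JAR, "-m", "c", "-cm", "6",
            "-if", str(source),
            "-of", str(content_dir / ".aip" / "olef.xml")]
```

The resulting list could then be passed to `subprocess.run(...)`; whether this fits into the PI's own invocation is an open question.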
The uploaded MODS and METS are not valid:
- missing <?xml ...?> declaration and encoding data
- missing namespace declarations for the mods/mets, xsi and xlink prefixes
- tag mismatch (for sample 333569)
(Note that the SMT writes OLEF from a manually corrected mods.xml OK.)
This is likely an artifact of the XML having been taken from the OAI-PMH-wrapped source; there, the missing declarations are held in the outer wrapper. There are 118 XML files, so the simplest option will be to rewrite the first couple of lines by script.
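A minimal Python 3.9+ sketch of such a rewrite script, under these assumptions: the files are UTF-8, the root start tag contains no `>` inside attribute values, and the standard namespace URIs for MODS, METS, XML Schema instance and XLink are the intended ones. It replaces any existing declaration with a UTF-8 one and adds `xmlns:` declarations to the root element for any of the four prefixes that are used but not declared. It does not repair tag mismatches, so `is_well_formed` is included to flag files that still need manual fixing. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'

# Standard namespace URIs for the prefixes reported as undeclared.
NAMESPACES = {
    "mods": "http://www.loc.gov/mods/v3",
    "mets": "http://www.loc.gov/METS/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xlink": "http://www.w3.org/1999/xlink",
}

# First start tag that is not a declaration, processing instruction or comment.
ROOT_TAG = re.compile(r"<(?![?!])[^>]*>")


def fix_header(text: str) -> str:
    """Add a UTF-8 declaration and any missing namespace declarations on the root element."""
    body = text.lstrip("\ufeff \t\r\n")
    if body.startswith("<?xml"):
        # Drop the existing declaration; XML_DECL replaces it below.
        body = body[body.index("?>") + 2:].lstrip()
    root = ROOT_TAG.search(body)
    if root is None:
        raise ValueError("no root element found")
    tag = root.group(0)
    missing = [
        prefix for prefix in NAMESPACES
        # Prefix used on an element (<p: or </p:) or attribute ( p:) but not declared on the root.
        if re.search(rf"(?:</?|\s){prefix}:[A-Za-z_]", body)
        and f"xmlns:{prefix}=" not in tag
    ]
    decls = "".join(f' xmlns:{p}="{NAMESPACES[p]}"' for p in missing)
    cut = len(tag) - (2 if tag.endswith("/>") else 1)  # insert before '>' or '/>'
    new_tag = tag[:cut] + decls + tag[cut:]
    return XML_DECL + body[:root.start()] + new_tag + body[root.end():]


def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML; tag mismatches still fail here."""
    try:
        ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def fix_tree(provider_root: str) -> list[Path]:
    """Rewrite every *_mods.xml / *_mets.xml under `provider_root`; return the changed paths."""
    root = Path(provider_root)
    changed = []
    for path in sorted([*root.rglob("*_mods.xml"), *root.rglob("*_mets.xml")]):
        original = path.read_text(encoding="utf-8")
        fixed = fix_header(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `fix_tree("/mnt/nfs/upload/providers/de-dilibri")` would rewrite every `*_mods.xml` and `*_mets.xml` below the provider directory and return the paths it changed; running `is_well_formed` over the results would then list the files with remaining tag mismatches. Because `fix_header` is idempotent, re-running it on already fixed files leaves them unchanged.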
@ZhengLIAtos I've generated a .aip for de-dilibri/333569 for test purposes (I handcrafted the MODS for the SMT step). Can you please run a test ingest on it and verify whether the package is good? @wkollernhm
ingested. (not yet transformed by Access)
@ZhengLIAtos Thanks. Can I ask a question in ignorance: what does "not yet transformed by Access" mean?
(I'm guessing: does this mean that the base document has reached Fedora, but the transformation is still needed to push it to the portal?)
@ZhengLIAtos I've rerun the ingest test (hopefully the OLEF is now good) on bhl-test for the content de-dilibri/333569-test.
This now looks to be in the integration Fedora as bhle:10706-a0wwpzkp, but it is not yet indexed. Can you please take a look and see if all looks good to you? The ingest.log says status=completed, so is it just a matter of giving it time before checking the portal? The Fedora status is also marked active, but I don't yet see the data in Solr.
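One way to check whether the object has reached Solr, independently of the portal, is to query Solr's `select` handler for the PID directly. In the sketch below the Solr base URL and the name of the PID field (`PID`) are hypothetical placeholders; the actual core and schema field may differ. Quoting the PID as a phrase keeps the colon inside it from being read as a field separator.

```python
import json
from urllib.parse import urlencode


def solr_query_url(solr_base: str, pid: str, pid_field: str = "PID") -> str:
    """Build a select URL that counts documents whose (hypothetical) PID field equals `pid`."""
    # Phrase quotes stop the ':' in e.g. 'bhle:...' being parsed as a field separator.
    params = {"q": f'{pid_field}:"{pid}"', "wt": "json", "rows": 0}
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def is_indexed(response_body: str) -> bool:
    """True if a Solr JSON response reports at least one matching document."""
    return json.loads(response_body)["response"]["numFound"] > 0
```

The URL can be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and the body passed to `is_indexed`.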
Task description:
- Align (DILIBRI) upload data/structure with ingest tool needs
- Ingest (DILIBRI) data
Actions to take: