FLVC / offline-ingest

A rubydora application to do digitool migrations, and eventually affiliate-submitted ingests, into floridora

package program discards shorter JPG structMap in favor of longer PDF structMap #20

Closed lydiam closed 3 months ago

lydiam commented 9 years ago

I'm not sure that this is worth modifying in code, but I think it's worth noting.

/ssa/d2i/FIU_FEOL_books_dumpB/FI06050102, when test-loaded, gives the following messages:

[lydiam@tlhlxftp01-prd FI06050102]$ package --test --server fiu7prod /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102
Processing 1 package: /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102
Invalid package in /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102.
0.00 sec, 0.00 MB BookPackage::FI06050102 (no pid) => collection: fiu:feol, palmm:feol, "Fire Careers, Adventures For Your Life!"
Errors:
The Book package FI06050102 is missing the following 1 required file declared in the mets.xml file:
Exception TypeError - can't convert nil into String for Book package FI06050102, backtrace follows:
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:964:in `+'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:964:in `reconcile_file_lists'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:964:in `map'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:964:in `reconcile_file_lists'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:844:in `initialize'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:46:in `new'
/usr/local/islandora/offline-ingest/lib/offin/packages.rb:46:in `new_package'
/usr/local/bin/package:52
/usr/local/bin/package:48:in `each'
/usr/local/bin/package:48

Warnings:
Multiple structMaps found in METS file, discarding the shortest (least number of referenced files).
Note: the table of contents derived from the METS file FI06050102/mets.xml has the following issues:
The METS file does specify an associated image file for page Fire Careers, Adventures For Your Life!.
The METS file does specify an associated image file for page Recto.
The METS file does specify an associated image file for page Verso.
The Book package FI06050102 has the following 5 unexpected files that will not be processed:
 - 1029806_FI06050102_001.jpg
 - 1029807_FI06050102_002.jpg
 - 1029808_file1.pdf
 - 1029809_file2.pdf
 - 1029810_file3.pdf
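
The TypeError in the backtrace above is raised by `+` inside a `map`, directly after the "missing the following 1 required file" message. One plausible reading, and it is only an assumption, is that a missing-file entry with no name (nil) is being concatenated onto a string. The helper and file names below are hypothetical and are not the code at packages.rb:964; this is a minimal sketch of that failure mode and a guard for it.

```ruby
# Hypothetical helper (NOT the real reconcile_file_lists): build the report
# lines for files the METS declares but the package directory lacks.
def missing_file_lines(expected, present)
  missing = expected - present
  # An entry without a usable file name appears here as nil.
  # ' - ' + nil raises TypeError, so substitute a label instead.
  missing.map { |name| ' - ' + (name || '(no file name in METS entry)') }
end

# Made-up file names, for illustration only.
expected = ['page_001.jpg', nil]
present  = ['page_001.jpg']
puts missing_file_lines(expected, present)
```

With a guard like this, the run would report the unnamed entry as an ordinary error line instead of aborting with a backtrace.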

There are 2 structMaps in the mets, one that references 2 JPGs, and one that references 3 PDFs. I believe that the program is discarding the JPG structMap and then erroring out.

Is the solution to delete the PDF structMap?

I tried that and got the following result:

[lydiam@tlhlxftp01-prd FI06050102]$ package --test --server fiu7prod /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102
Processing 1 package: /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102
0.19 sec, 0.00 MB BookPackage::FI06050102 (no pid) => collection: fiu:feol, palmm:feol, "Fire Careers, Adventures For Your Life!"
Warnings:
The Book package FI06050102 has the following 4 unexpected files that will not be processed:

So I believe that the answer is "yes" - delete the longer structMap that refers to PDFs.

At this point I don't believe that it's worth modifying the program to take into consideration the file format referenced in the structMap when evaluating structMaps.

ON HOLD

grf commented 9 years ago

On Wed, Jan 21, 2015 at 10:31 AM, Lydia Motyka notifications@github.com wrote:

> So I believe that the answer is "yes" - delete the longer structMap that refers to PDFs.
>
> At this point I don't believe that it's worth modifying the program to take into consideration the file format referenced in the structMap when evaluating structMaps.

I do run tests on the 'use=' value (reference, index, archive) and on mimetype (image is weighted more heavily than text), but the first criterion is length, that is, the number of files referenced.

So it's easy to re-arrange this, but I'd rather you put it on as a github issue, referencing this example package (if you can attach the METS file, that would be ideal).

-Randy
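
As a rough illustration of the re-arrangement discussed above, the sketch below contrasts choosing a structMap by file count alone with choosing by mimetype weight first and file count second. Everything in it is an assumption made for illustration: the `StructMapCandidate` struct, the weight table, and the method names are hypothetical and are not taken from offline-ingest, and the 'use=' test is left out.

```ruby
# Hypothetical stand-in for one METS structMap: a label plus the MIME types
# of the files it references. Not an offline-ingest class.
StructMapCandidate = Struct.new(:label, :mimetypes)

# Assumed weights by top-level MIME type: images outrank everything else.
MIME_WEIGHTS = { 'image' => 2, 'text' => 1, 'application' => 1 }.freeze

# Average weight of the referenced files (0.0 for an empty structMap).
def mime_score(candidate)
  return 0.0 if candidate.mimetypes.empty?
  total = candidate.mimetypes.sum { |m| MIME_WEIGHTS.fetch(m.split('/').first, 0) }
  total.to_f / candidate.mimetypes.size
end

# Length first, as described in the comment: most referenced files wins.
def pick_by_length(candidates)
  candidates.max_by { |c| c.mimetypes.size }
end

# Re-arranged: MIME score first, file count only breaks ties.
def pick_by_mimetype_then_length(candidates)
  candidates.max_by { |c| [mime_score(c), c.mimetypes.size] }
end

# The package from this issue: 2 JPGs in one structMap, 3 PDFs in the other.
jpgs = StructMapCandidate.new('JPG structMap', ['image/jpeg'] * 2)
pdfs = StructMapCandidate.new('PDF structMap', ['application/pdf'] * 3)

puts pick_by_length([jpgs, pdfs]).label               # PDF structMap
puts pick_by_mimetype_then_length([jpgs, pdfs]).label # JPG structMap
```

Under these assumed weights, the length-first rule keeps the three-PDF structMap, while the mimetype-first rule keeps the two-JPG structMap that a book package needs.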

lydiam commented 9 years ago

I can’t figure out a way to attach files to GitHub. Any suggestions would be appreciated.

lydiam commented 9 years ago

This is a GitHub issue – do you want another one?

grf commented 9 years ago

See the mets.xml file from FIU, /ssa/d2i/FIU_FEOL_books_dumpB/FI06050102/

https://gist.github.com/grf/9f2a0abbb69320a205cc