Center-for-Research-Libraries / crl-serials-validator

Validate bibliographic and holdings data for shared print.
GNU General Public License v3.0

Create LHR sheet from spreadsheets (and MARC without 583s) when the PAPR flag is used #41

Open nflorin opened 2 years ago

nflorin commented 2 years ago

When the -p flag is used, the LHR output sheet is currently printed only if the original input is MARC with 583 lines present. We need it to be printed for spreadsheet inputs and for plain MARC (without 583s) as well.

nflorin commented 2 years ago

This is going to be tricky to work out, for three reasons:

  1. We don't currently pull out a lot of the relevant data from inputs without 583 lines, especially inputs that come in some sort of MARC format.
  2. The current process pulls all holdings of a title together into one string, while making LHRs requires that we keep holdings separated by holding library (or whatever other location code will go into the 852 $b).
  3. Separating holdings by holding location can be difficult, because holding location can be expressed in a lot of different places. Sometimes it is in an 852, sometimes it is part of a call number, sometimes it is an arbitrary subfield in a 9xx line. In some of these instances the location applies to all holdings in a record; in others it applies only to the holdings on a specific line, or only to the holdings lines that come after the location data.

The second and third ones are the bigger issues, though the third is just a matter of implementing a lot of complex logic.
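
To make the three location cases in the third point concrete, here is a minimal sketch in Python. It is not Validator code: the function name, the `(location, holdings)` tuple shape, and the `"UNKNOWN"` fallback are all assumptions made for illustration, and it presumes the location has already been pulled out of whichever field carried it.

```python
from collections import defaultdict


def group_holdings_by_location(lines, record_location=None):
    """Group holdings strings by location code (hypothetical helper).

    `lines` is a list of (location, holdings) tuples in record order;
    either element may be None. `record_location` is a location that
    applies to the whole record unless something overrides it.
    """
    grouped = defaultdict(list)
    current = record_location
    for location, holdings in lines:
        if location and not holdings:
            # A location-only line applies to the holdings lines after it.
            current = location
            continue
        # A location on the same line applies to that line only.
        effective = location or current or "UNKNOWN"
        if holdings:
            grouped[effective].append(holdings)
    return dict(grouped)
```

For example, with `record_location="MAIN"`, a holdings line with no location of its own lands under `MAIN`, while any holdings line that follows a bare `ANNEX` line lands under `ANNEX`. The hard part described above, working out where a given input actually stores the location, is deliberately left out.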

nflorin commented 2 years ago

It seems to me there are three ways around the second problem:

  1. Maintain two streams of holdings through the process: one as it currently is, and a second that is used only when we print out the LHR sheet.
  2. Convert the current holdings method to one where holdings are separated by the future 852$b code, and silently concatenate them in the main Validator process.
  3. Leave the Validator as is and create some sort of external script to build LHRs.

I'm tending towards the third option. I worry about stuffing too much functionality into the Validator, especially things like this that are going to be very specific to CRL.
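
For comparison, the second option could look roughly like the sketch below. The class name, its fields, and the `"; "` separator are assumptions for illustration, not anything the Validator currently defines. Holdings are stored per location code, and a combined string is produced on demand for the parts of the process that expect one string per title.

```python
from dataclasses import dataclass, field


@dataclass
class TitleHoldings:
    """Holdings for one title, keyed by the future 852 $b code (hypothetical)."""

    by_location: dict = field(default_factory=dict)

    def add(self, location, holdings):
        # Keep each holdings statement under its own location code.
        self.by_location.setdefault(location, []).append(holdings)

    @property
    def combined(self):
        # Single string for the main process; locations stay in insertion order.
        return "; ".join(
            h for parts in self.by_location.values() for h in parts
        )
```

The main process would read `combined`, and only the LHR output would read `by_location`. That keeps a single stream of data, at the cost of touching the existing holdings handling.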

nflorin commented 2 years ago

An external script should work like this:

  1. Read the Validator's output spreadsheet to find good records.
  2. Use the Validator's list of inputs for the relevant file to find input data locations.
  3. Run the extraction.
  4. Print the output.

It might be possible to leverage some of the existing Validator functions for the third step (the extraction). I suspect that the Validator isn't sufficiently modular for that, but it might be, or it could perhaps be made so.
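
A rough outline of the four steps, as a sketch only. It assumes the Validator output has been exported to CSV, and the column names (`record_id`, `status`), the `"ok"` value, and every function name here are hypothetical rather than the Validator's real layout. The extraction itself is passed in as a function, since whether existing Validator code can be reused for it is still open.

```python
import csv


def find_good_records(output_csv, id_column="record_id",
                      status_column="status", good_value="ok"):
    """Step 1: read the output sheet (as CSV) and keep IDs of good records."""
    reader = csv.DictReader(output_csv)
    return [
        row[id_column]
        for row in reader
        if row[status_column].strip().lower() == good_value
    ]


def build_lhr_rows(good_ids, input_locations, extract):
    """Steps 2 and 3: look up each record's input data and run the extraction.

    `input_locations` maps record ID to wherever its input data lives.
    `extract` is a caller-supplied function returning {location: holdings}.
    """
    rows = []
    for record_id in good_ids:
        source = input_locations.get(record_id)
        if source is None:
            continue  # no known input for this record; skip it
        for location, holdings in extract(source).items():
            rows.append({
                "record_id": record_id,
                "location": location,
                "holdings": holdings,
            })
    return rows


def write_lhr_sheet(rows, out):
    """Step 4: print one row per record and location."""
    writer = csv.DictWriter(
        out, fieldnames=["record_id", "location", "holdings"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

Keeping `extract` as a parameter means the script could start with its own extraction logic and switch to Validator functions later if they turn out to be modular enough.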