Center-for-Research-Libraries / crl-serials-validator

Validate bibliographic and holdings data for shared print.
GNU General Public License v3.0

Create dummy local MARC from spreadsheet inputs to allow for unified validation routine. #40

Closed by nflorin 2 years ago

nflorin commented 2 years ago

The idea here is to silently convert spreadsheet inputs (xlsx, tsv, csv, txt) into basic dummy MARC, then feed the dummy MARC into the local MARC validation routine. The goal is a single, unified validation routine for everything. This should make the whole process more robust, make the code shorter, and make changes and upgrades easier.

nflorin commented 2 years ago

Here is a proposed dummy MARC record. Fields containing variables would be present only when the spreadsheet has that data. The 86x fields (866, 867, 868) would be repeated once for every line of holdings for the title in the spreadsheet.

=LDR  02283cas a2200625   4500
=001  {$holdings_id}
=004  {$bib_id}
=008  220101nuuuuuuuuxx\uu\p\\\\\\\0uuuu0en\r\
=022  \\$a{$issn}
=110  2\$a{$main_entry}
=245  00$a{$title}
=852  \\$a{$institution_symbol}$b{$holding_library}
=866  \\$a{$holdings}
=867  \\$a{$supplement_holdings}
=868  \\$a{$index_holdings}
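
To make the template's two rules concrete (omit a variable field when its data is missing, and repeat the 86x fields once per holdings line), here is a minimal Python sketch. It is not the validator's code: the function names, the column names (taken from the template's variable names), and the choice to read title-level values from the first row are all assumptions for illustration.

```python
# Fixed lines copied from the proposed template.
LEADER = "=LDR  02283cas a2200625   4500"
FIELD_008 = r"=008  220101nuuuuuuuuxx\uu\p\\\\\\\0uuuu0en\r" + "\\"

# Column names are hypothetical; real spreadsheet headers may differ.
CONTROL_FIELDS = [("001", "holdings_id"), ("004", "bib_id")]
DATA_FIELDS = [
    ("022", r"\\$a", "issn"),
    ("110", r"2\$a", "main_entry"),
    ("245", "00$a", "title"),
]
HOLDINGS_FIELDS = [
    ("866", "holdings"),
    ("867", "supplement_holdings"),
    ("868", "index_holdings"),
]


def cell(row, column):
    """Return a cleaned cell value, or "" if the column is missing or empty."""
    return str(row.get(column) or "").strip()


def build_dummy_mrk(rows):
    """Build one mnemonic-format record from all spreadsheet rows for a title."""
    first = rows[0]  # assumption: title-level data comes from the first row
    lines = [LEADER]
    for tag, column in CONTROL_FIELDS:
        if cell(first, column):
            lines.append(f"={tag}  {cell(first, column)}")
    lines.append(FIELD_008)
    for tag, prefix, column in DATA_FIELDS:
        if cell(first, column):
            lines.append(f"={tag}  {prefix}{cell(first, column)}")
    # 852 carries two subfields; each is added only when present.
    symbol = cell(first, "institution_symbol")
    library = cell(first, "holding_library")
    if symbol or library:
        field = r"=852  \\"
        if symbol:
            field += "$a" + symbol
        if library:
            field += "$b" + library
        lines.append(field)
    # 86x fields repeat once per holdings line in the spreadsheet.
    for tag, column in HOLDINGS_FIELDS:
        for row in rows:
            if cell(row, column):
                lines.append(f"={tag}  " + r"\\$a" + cell(row, column))
    return "\n".join(lines)
```

Called with two rows for the same title, this would emit a single record with two 866 lines and no 110 line if `main_entry` is blank.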

nflorin commented 2 years ago

One problem with this solution is that the current method (read everything from both MARC and spreadsheets into dicts, then process the dicts) lets us handle titles that have multiple records at a single location. This is rare in MARC files but does happen. So my original working idea (single spreadsheet line -> dummy MARC -> processed title) probably doesn't work; instead I'd need to convert the entire spreadsheet on the fly before starting the processing. I think it's still useful, though, because it concentrates the difficult work in one place.
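
As a rough illustration of converting the whole spreadsheet before processing, the sketch below reads every row first and groups rows that belong to the same title at the same location, so each group could then be turned into one dummy record. The function name and the key columns (`institution_symbol`, `holding_library`, `bib_id`) are assumptions, not the project's actual code or headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical key: rows sharing these values are treated as one title
# at one location.
KEY_COLUMNS = ("institution_symbol", "holding_library", "bib_id")


def group_rows_by_title(csv_text, key_columns=KEY_COLUMNS):
    """Read the whole spreadsheet, then group its rows by title and location."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        # Missing or empty cells become "" so the key is always comparable.
        key = tuple((row.get(column) or "").strip() for column in key_columns)
        groups[key].append(row)
    return dict(groups)
```

Grouping before conversion keeps the multiple-records case in one place: a title with several holdings lines arrives as one list of rows rather than as several unrelated single-line records.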

nflorin commented 2 years ago

I played around with some possible solutions and now think this would cause more problems than it would solve. So I'm closing it.