avalonmediasystem / avalon

Avalon Media System – Samvera Application
http://www.avalonmediasystem.org/
Apache License 2.0

Batch Ingest Fails On Valid MARC XML with "Language Not Recognized" Error #2116

Closed joncameron closed 7 years ago

joncameron commented 7 years ago

Description

Batch ingest manifest fails items at the bibimport stage with the message shown in the attached screenshot when a valid MARC value is present. See Julie's original comment below for more information.

Error Email Received

image

Manifest File Triggering the Error

IULMIA_VSS_Migration_Manifest.xlsx

Original Comment from Julie

Checking what the XSL for bib import does: we look first at the value in positions 35-37 of the 008 field in the returned MARCXML. In both of these examples, those positions hold 3 pipes (|||). After that check, the XSL looks in 041, which doesn't exist in either of these records.

There does appear to be a case to use 3 pipes (|||) for language in the 008 when there is no attempt to code the language. Both the concise information and the full information on the MARC site for 008 specify this for language info in position 35-37 (concise: https://www.loc.gov/marc/bibliographic/concise/bd008a.html and full: https://www.loc.gov/marc/bibliographic/bd008a.html).

So it looks like we need a catch in the XSL to drop language if 3 pipes is the value in 008/35-37 or allow that value in MCO as a "no information provided" scenario (equivalent to '###') so it translates to "there is no language info."

Do we want this check added to the XSL or handled elsewhere?
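To make the lookup order described above concrete, here is a minimal Python sketch (not the actual XSL) that reads 008/35-37 and then any 041 $a values from a MARCXML record. The sample record is hypothetical: its 008 is padded with spaces so that only positions 35-37 are meaningful, and the function name `language_candidates` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical 008: 35 filler characters, then "|||" in positions 35-37.
FIELD_008 = "040608s1969" + " " * 24 + "|||" + " d"

SAMPLE_RECORD = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    f'<controlfield tag="008">{FIELD_008}</controlfield>'
    "</record>"
)


def language_candidates(record_xml):
    """Return (code from 008/35-37, list of 041 $a codes) for one record."""
    root = ET.fromstring(record_xml)

    # 008 positions are zero-based, so 35-37 is the slice [35:38].
    f008 = root.find("marc:controlfield[@tag='008']", MARC_NS)
    text = f008.text if f008 is not None and f008.text else ""
    code_008 = text[35:38] if len(text) >= 38 else ""

    # Fallback source: 041 subfield a, which may be absent entirely.
    codes_041 = [
        sf.text
        for sf in root.findall(
            "marc:datafield[@tag='041']/marc:subfield[@code='a']", MARC_NS
        )
    ]
    return code_008, codes_041


code_008, codes_041 = language_candidates(SAMPLE_RECORD)
```

For a record shaped like the failing ones, the 008 lookup yields the three pipes and the 041 lookup yields nothing, so neither source provides a usable language code.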

Related

Original Issue on IU's Production Board: https://bugs.dlib.indiana.edu/browse/VOV-5491

PR: https://github.com/avalonmediasystem/avalon/pull/2117

joncameron commented 7 years ago

MARC XML from an item that triggers this error: `<?xml version="1.0"?>

1.11info:srw/schema/1/marcxml-v1.1xml 01272ngm a2200301Ka 4500 mr|maaadu||r 040608s1969||||xx 111 g ml||| d BR1217 If.. [motion picture] / Memorial Enterprises. [S.l. : s.n.], 1969. 3 film reels (111 min.) : sd., col./b&w ; 16mm. ref. print Michael Medwin (producer); Lindsay Anderson (producer); Lindsay Anderson (director); Miroslav Ondricek (photographer). Malcolm McDowell, David Wood, Richard Warwick, Christine Noonan. Gift to the Lilly Library from the David Bradley film collection. Vinegar syndrome poor; shrinkage .1-.5%; faded. B-ALF: Bradley Estate May 2002 Bradley, David, 1920-1997, former owner. InU Medwin, Michael, 1923- Anderson, Lindsay, 1923-1994 McDowell, Malcolm, 1943- Wood, David. Warwick, Richard, 1945-1997 Noonan, Christine. Memorial Enterprises. 5 B-ALF _ALFBRAD BR1217 Reel 1 of 3 NONCIRC 1 B-ALF _ALFBRAD BR1217 Reel 2 of 3 NONCIRC 1 B-ALF MESSAGE BR1217 Reel 3 of 3 NONCIRC NEVER 1 1`

jlhardes commented 7 years ago

There is a variable created for the language code that replaces | and # with nothing, and that variable is checked for a value before the language code from the 008 is used in the MODS element. I had previously changed that variable to also try to keep 'N/A' from coming through (an old cataloging practice). That change didn't remove the string; it only tested, while creating the variable, that the value didn't equal 'N/A'. That turned the variable into a true/false value, so it never became an empty string when it was supposed to be empty. I switched that 'N/A' check to a replace() function, so now the variable is empty if the language code is '|||', '###', or 'N/A'.

cjcolvar commented 7 years ago

This appears to be fixed on spruce: I can now import 5799342, the record above that was failing before.

jlhardes commented 7 years ago

Just to complete the record of what was actually changed: the replace() function didn't work because the XSL version has to be 1.0 and replace() is only available in 2.0. So the variable is now constructed to translate | and # to an empty string (as before), and each time the language code is needed there is a check that the variable is not 'N/A' and is not an empty string before it is used.
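The final logic described here can be sketched in Python as a rough analogue of the XSL 1.0 approach, not the stylesheet itself. `str.translate` with a deletion table stands in for the XSLT `translate()` call with an empty replacement string, and the function name `usable_language_code` is hypothetical.

```python
# Deletion table: removes "|" and "#", like translate($code, '|#', '') in XSLT 1.0.
_FILL_CHARS = str.maketrans("", "", "|#")


def usable_language_code(raw_code):
    """Return the language code if it carries information, else None."""
    stripped = raw_code.translate(_FILL_CHARS)
    # The checks applied each time the code is needed: not empty and not 'N/A'.
    if stripped == "" or stripped == "N/A":
        return None
    return stripped


results = {code: usable_language_code(code) for code in ("|||", "###", "N/A", "eng")}
```

With this shape, '|||', '###', and 'N/A' all produce no language value, while a real code such as 'eng' passes through unchanged.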