eXtensibleCatalog / Metadata-Services-Toolkit

Tools for processing and aggregating metadata

Normalization: Handling of MARC records where holdings is in the bib #596

Open patrickzurek opened 8 years ago

patrickzurek commented 8 years ago

JIRA issue created by: rcook Originally opened: 2011-07-25 05:49 PM

Issue body: (nt)

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-07-25 05:50 PM

Comment body:

Jennifer and Dave to meet to discuss how to handle this.

patrickzurek commented 8 years ago

JIRA Comment by user: dlindahl JIRA Timestamp: 2011-07-31 12:29 AM

Comment body:

  1. MARC Norm: in addition to adding $5 with org code to new 9xx fields, Service will also add another subfield to each 9XX indicating that XC generated the field. This will make sure that there is no way that a field added by XC can be mistaken for some other 9XX field (e.g. 945 from III)

Note: if needed, we could insert a subfield that indicates which MST service added the 9xx field.

  2. MARC Norm: config file will have additional settings:

     a. If a 945 field is found in a record, does it represent III holdings data? (Y/N)
     b. If an 852 field is found, does it represent holdings data? (Y/N)
     c. We can add more options if we encounter other ILSs with holdings in the bibs.

Note: We DO NOT support setting up one service in which some records use 945 to store III holdings data and other records use 945 for something else. In that case, a user would need to install the MARC Norm service twice with two different config files. The same applies to 852.

For whichever of these settings is N, the service ignores those fields. If 945 (III) is Y, the Norm service maps each instance of 945 data to a new “XC 9XX Holdings” field (tag to be defined). This XC 9XX Holdings field will “stage” all of the holdings data so that the Transformation Service can find it all in one place. It will contain the following data:

a. Org code from the 003
b. All of the holdings data from the 945 (location code, copy number)
c. Call number from wherever in the bib record is relevant [a library may want to be able to set up these mappings and include a hierarchy of fields for this]
d. Subfield designating that XC created the field

If 852 is Y, the service also creates one “XC 9XX Holdings” field for each 852 field, including the same data as above for III 945 data.

  3. MARC Aggregation Service: When it finds instances of the “XC 9XX Holdings” field, it automatically copies these to the output record.
  4. Transformation 2.0: When it encounters instances of the “XC 9XX Holdings” field, it creates new XC Holdings records based on the information in these fields. 852 and 945 fields that lack the $5 org code and the XC-added subfields are NOT mapped to XC Schema data.
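The staging step proposed in the list above could look roughly like the following Python sketch. It is only an illustration, not MST code: the thread leaves the staging tag and its subfield codes "to be defined", so `STAGING_TAG`, `XC_MARKER`, `CALL_NUMBER_CODE`, the `NormConfig` option names, and the default call number hierarchy (090, then 050) are all placeholder assumptions. Subfield code collisions between the copied source data and the added subfields are not handled.

```python
from dataclasses import dataclass

# All of these values are hypothetical placeholders; the thread has not
# yet defined the staging tag or its subfield codes.
STAGING_TAG = "9XX"
XC_MARKER = ("x", "XC-generated")   # assumed "XC created this field" subfield
CALL_NUMBER_CODE = "c"              # assumed subfield for the call number


@dataclass
class Field:
    tag: str
    subfields: list  # list of (code, value) tuples


@dataclass
class NormConfig:
    # Mirrors the proposed Y/N config settings (names are assumptions).
    treat_945_as_iii_holdings: bool = False
    treat_852_as_holdings: bool = False
    # Assumed hierarchy of fields to search for a call number.
    call_number_tags: tuple = ("090", "050")


def find_call_number(fields, tags):
    """Return the call number from the first tag in the hierarchy that is present."""
    for tag in tags:
        for f in fields:
            if f.tag == tag:
                return " ".join(value for _, value in f.subfields)
    return None


def stage_holdings(org_code, fields, config):
    """Create one staging field per 945/852 field enabled in the config."""
    enabled = set()
    if config.treat_945_as_iii_holdings:
        enabled.add("945")
    if config.treat_852_as_holdings:
        enabled.add("852")

    call_number = find_call_number(fields, config.call_number_tags)
    staged = []
    for f in fields:
        if f.tag not in enabled:
            continue  # settings that are N mean the field is ignored
        subfields = [("5", org_code)]    # org code taken from the 003
        subfields.extend(f.subfields)    # all holdings data from the source field
        if call_number is not None:
            subfields.append((CALL_NUMBER_CODE, call_number))
        subfields.append(XC_MARKER)      # marks the field as XC-generated
        staged.append(Field(STAGING_TAG, subfields))
    return staged
```

With only the 945 setting enabled, a record containing one 945 and one 852 yields a single staging field; the 852 is left alone, which matches the "ignore whichever is N" rule.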

Note the following requirements for people who set up an MST service pathway with our services:

1) MARC Aggregation service only works correctly with records from MARC Normalization
2) Transformation 2.x only works correctly with records from MARC Normalization and MARC Aggregation
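The two pathway requirements above can be expressed as a small ordering check. This is a hypothetical sketch: the MST does not necessarily expose pathways as a list of service names, and `REQUIRED_UPSTREAM` and `pathway_problems` are names invented for illustration.

```python
# Hypothetical table of which services must appear earlier in a pathway.
REQUIRED_UPSTREAM = {
    "MARC Aggregation": ["MARC Normalization"],
    "Transformation 2.x": ["MARC Normalization", "MARC Aggregation"],
}


def pathway_problems(pathway):
    """Return a message for each service whose required upstream service is missing."""
    problems = []
    for position, service in enumerate(pathway):
        upstream = pathway[:position]
        for needed in REQUIRED_UPSTREAM.get(service, []):
            if needed not in upstream:
                problems.append(f"{service} requires {needed} earlier in the pathway")
    return problems
```

A pathway of Normalization, then Aggregation, then Transformation 2.x produces no problems; a pathway that starts with Aggregation is flagged.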

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-08-18 02:03 PM

Comment body:

I will start editing the MARC Aggregation Service Merging doc to include this. However, I have some additional thoughts on Dave's comments.

I have applied for an Org Code for XC, which we should have in a couple of weeks. I recommend that we change the services to use this in the 9XX fields instead of an institution's org code. Then it would always be the same. Dave, I think you mentioned this as a possible way to go. This would mean that Norm and Trans don't need incoming records to have 003 fields, which opens things up for IR+ and other non-ILS MARC records to be processed (that's Fogbugz Issue 790).

Config file settings: This is ok, but I question Dave's assumption that we won't process the same field 2 different ways in the same Norm service. Can't we process 945 fields that have the $5 for XC one way, and 945 fields that lack the $5 a different way? That's the whole purpose for having a $5. Otherwise we would need to change the Norm service to not use 945 at all. But then we don't know what other 9XX field might be used for holdings for some other ILS. By design a library or system can define 9XX to be whatever they want, and the $5 is a way that we can ensure that we're ready for anything. I don't understand why we wouldn't be able to support that. OTOH I agree with you for 852 - those fields should either be processed or ignored for all records processed by the service.
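The per-field decision described above, where a 945 carrying the XC $5 is handled one way and a 945 without it another way, might be sketched as follows. This is an assumption-laden illustration: `XC_ORG_CODE` is a placeholder (the actual code had not been assigned at this point in the thread), and the function name and return labels are hypothetical.

```python
XC_ORG_CODE = "XC-ORG"  # placeholder; the real org code was still pending


def classify_945(subfields, treat_plain_945_as_iii, xc_org_code=XC_ORG_CODE):
    """Decide how to treat one 945 field, given its (code, value) subfields."""
    has_xc_marker = any(
        code == "5" and value == xc_org_code for code, value in subfields
    )
    if has_xc_marker:
        return "xc-generated"    # field was added by an XC service
    if treat_plain_945_as_iii:
        return "iii-holdings"    # config says plain 945s hold III holdings
    return "ignore"              # local 945 with some other meaning
```

Under this reading, the $5 check runs first for every field, so a single Norm service instance can see both kinds of 945 in the same record set.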

I think I agree with the rest of Dave's comments.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-08-18 08:34 PM

Comment body:

I've edited the MARC Aggregation Merging document in Docushare (not the wiki) to incorporate needed changes:
http://docushare.lib.rochester.edu/docushare/dsweb/Get/Document-42749

Dave, please have a look. The next step is to document necessary changes for Normalization for the config file. I think the existing transformation service can stay as is since it can map III data directly from a 945 - should work for UNCC for the short term.

patrickzurek commented 8 years ago

JIRA Comment by user: dlindahl JIRA Timestamp: 2011-08-22 06:55 PM

Comment body:

There is a bunch of stuff for me to respond to in here, but I have an initial question for Jennifer - why did you apply for an ORG code for XC? I do not believe that an org code is appropriate for XC, as XC is like a "software vendor" such as Ex Libris or Innovative. Org codes are for organizations. If the ORG code is meant to represent software (say, the MST, or a service in the MST), that will not work since there will be many installed instances of the MST, and therefore we will have records with the same org code coming from different MSTs.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-08-22 08:24 PM

Comment body:

The definition for the org codes simply says that they are "...short alphabetic codes used to represent names of libraries and other kinds of organizations that need to be identified in the bibliographic environment." The XCO is an organization, so we are eligible to have an org code. I applied for one, and then we can decide if/how we want to use it. (BTW, Hathi Trust, Marcive, Ex Libris, Backstage Library Works, and Sirsi all have their own org codes). I am proposing that we use it instead of individual library org codes in the output of our Normalization Service (you had talked about our coming up with some kind of abbreviation to indicate XC, and this could fulfill that function).

Org codes are NOT system specific. For example, if we were to assign an org code to UR Research MARC records, it would be NRU, just like for Voyager. The code is for UR, not for the system (which is a related problem that we need to discuss). Since that is already the case, it would not be inappropriate to use the code in the output of Normalization, although I now think that we should not put it in $5 (which is already defined) but define some other subfield for it.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-08-24 08:47 PM

Comment body:

A couple of new documents to look at now, in addition to the changes to the Aggregation Merging document, that show how Normalization will "stage" the holdings data that was embedded in the bib:
http://docushare.lib.rochester.edu/docushare/dsweb/View/Collection-6820

This folder includes changes to norm service documentation and specs, as well as a spreadsheet that shows mappings from 3 different types of embedded holdings to the new 953 field, and another spreadsheet that lists the new 953 field along with the other local 9XX fields and explains what it is. Dave should review this next.

patrickzurek commented 8 years ago

JIRA Comment by user: dlindahl JIRA Timestamp: 2011-09-13 05:09 PM

Comment body:

Jennifer, I went through the four documents, and I got somewhat confused. I added some comments and ideas. I am sure that there are details that we discussed that are not clicking with me anymore; feel free to remind me. If it is easier, we can meet tomorrow to discuss. I uploaded 2 new document versions to the collection:

http://docushare.lib.rochester.edu/docushare/dsweb/View/Collection-6820

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-09-14 05:03 PM

Comment body:

Added new versions to respond to your comments. I think we're making progress.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-09-15 08:41 PM

Comment body:

Added another version to address your ideas for handling both versions of the service.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-09-16 03:23 PM

Comment body:

The embedded holdings staging document is now on Google Code - I made a few changes to add additional subfields. I also documented changes for Transformation 2.0.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-10-03 04:27 PM

Comment body:

Moving to the release 1.4 queue. This was assigned to Dave, but I think it can be moved to John. Please correct me if it is not ready to go to John.

Note: this issue is for changes to MARC Normalization, to set the stage for other services. And this will help institutions like UNCC that have holdings in the bib.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-10-03 04:29 PM

Comment body:

Also, I think the linked documentation Jennifer included also pertains to adding support for 952 for Koha institutions that have holdings embedded in the bibs.

patrickzurek commented 8 years ago

JIRA Comment by user: Chris Delis (cedelis) JIRA Timestamp: 2011-11-30 04:28 PM

Comment body:

Just thought I'd ask: does anyone have any sample records I could use to help work on this case? Any Koha/III specific records? Any others that are pertinent? Otherwise I'll have to create my own probably, which is never as good as a real life example. Thanks.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-11-30 09:01 PM

Comment body:

Try the Oregon and RIT data in this collection: http://docushare.lib.rochester.edu/docushare/dsweb/View/Collection-5047

I have also sent a note to the Spain folks and will let you know if I hear anything.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-12-01 06:46 PM

Comment body:

All the Spain repos are private, but there may soon be a public site. Will post when I can.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-12-01 08:54 PM

Comment body:

Jennifer, can you add some clarity to this issue and the general thread of work relating to the upcoming Norm service changes?

There is a document http://docushare.lib.rochester.edu/docushare/dsweb/Get/Document-48283/Normalization%20Changes%20Needed%20for%20Aggregation%20jb4.docx

And it references Version 1 and Version 2 of Normalization services, but it also references version 2 of the Transformation service and we are not currently planning for that work.

This document also makes reference to 001 and 003 stuff, which I think is related to MST-330 and you have been making changes to that document lately. (side note, it is in the 1.2 milestone, but 1.2 is done, so does it really belong in 1.3 or somewhere else?).

Also the Docushare link to your spreadsheet on Google Docs for 9xx fields for Norm is broken. Is that not needed anymore or was it moved elsewhere?

Bottom line: Can you help pull out the parts of these docs that Chris is supposed to work on NOW vs some future set of work?

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-12-02 07:32 PM

Comment body:

There is a repo in Spain that we can use. I have been given permission by the Ministry to use this site; I think there are 190K records.

http://pre.mpr.bage.es/cgi-bin/koha/oai.pl

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-12-05 03:11 PM

Comment body:

I’ve gone through these, and they are all OK for Chris to work on EXCEPT 766 and 979, which are the embedded holdings issues. These cannot be done until related changes are made to Transformation, which are kind of significant, and need to be timed carefully since the earlier Norm Service won’t work with the new version of Transformation, and vice versa.

I’d suggest having Chris do all but those two. Then, perhaps have someone get comfortable with making some of the easier changes to Transformation (and I’ll go through and label those that could be done without “breaking” Drupal).

Then, once somebody is comfortable with working on both Norm and Trans, make these changes to both services at the same time for embedded holdings, and then release new versions of the services, called 2.0, that handle these changes.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2011-12-05 03:14 PM

Comment body:

I also fixed the Docushare link to the Google doc about 9XX fields.

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2011-12-08 05:30 PM

Comment body:

John might like a side project to the large Aggregation Service project. I propose that he and Chris work on the two services needed to complete this work (embedded holdings in the bib) and time the release.

Note: This will be of great use to the Spanish team that uses quite a few Koha repos.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2012-04-06 04:19 PM

Comment body:

Just changed the title to refer to Normalization. I'll do the same for 835 to make that refer to Transformation. The two go together.

patrickzurek commented 8 years ago

JIRA Comment by user: jbowen JIRA Timestamp: 2012-06-01 03:58 PM

Comment body:

I have now posted the spec that describes this work to Google Docs: https://docs.google.com/document/d/1ejN_VoQn1HhdRDb4RBWWKFE-695uR2I6xOSPx8O70lY/edit

The edited Norm config file that goes along with this is also available:

Just the pages that have changes (printable, for discussion): https://docs.google.com/document/d/1dQHKq0QqBSZeqjL9W0NutHrdPYnr_Kwz0FLuN5gsuKY/edit

The entire config file (same version as above, but the whole thing including the enabled steps) is in Docushare: http://docushare.lib.rochester.edu/docushare/dsweb/View/Collection-6820

patrickzurek commented 8 years ago

JIRA Comment by user: rcook JIRA Timestamp: 2013-01-28 11:02 AM

Comment body:

[~cdelis] is in the process of releasing the 1.4.1 suite of MST services. He has commented out the incomplete embedded holdings code (http://code.google.com/p/xcmetadataservicestoolkit/source/detail?r=3486). This includes the first release of the MARC Aggregation service.

Jennifer's last post in this issue links to her specs to build the correct functionality for embedded holdings.

[~acarot] [~salva]