RDARegistry / RDA-Vocabularies

http://www.rdaregistry.info

Holdings information in RDA #90

Closed kiegel closed 5 years ago

kiegel commented 8 years ago

RDA guidelines and instructions are designed to support the FRBR user task obtain, which is to acquire or access the resource described (RDA 0.0). The task obtain is related to the FRBR Item class: that is, in the library context an object that a user attempts to obtain is necessarily an Item rather than a Manifestation, an Expression, or a Work.

Current RDA/RDF properties for the Item class fall into two categories: relationships (whether to agents or other WEMI classes) and descriptive elements, which include identification (custodial history, source of acquisition, identifiers, and notes), carrier information, and restrictions on access or use. The great majority of these data elements support the user task identify or select rather than obtain. In fact, there is a serious gap in RDA in support of the task obtain, which may stem from the FRBR model underlying it.

Here is a list of requirements for new Item elements to support catalog and other user displays. Note that this is not an attempt to specify a circulation system, which would need many more data elements.

Location. A location needs three levels of hierarchy: holding institution, holding location, and shelving location: for example, City Library, Main Branch, General Stacks. Note that an item may have multiple locations, e.g. some parts of a serial run may be in storage while others are in open stacks. There may also be multiple copies of an item in one location or in different locations, with the same or different holdings statement.

Shelf Mark. A shelf mark includes a call number and any associated prefixes and suffixes, including copy numbers. Each physical piece of an item should have a unique shelf mark; shelf marks may vary by more than enumeration and chronology, that is, the base call number may be different as well.

Holdings Statement. Holding statements are text strings that give a human-readable summary of holdings information that includes captions, enumeration and chronology. Three types of holdings statements are needed: basic bibliographic unit, supplementary material, and indexes.

Access URL. Access URLs are specific to the institution. This is different from the Uniform Resource Locator (RDA 4.6), which is a Manifestation element in RDA/RDF.

Status. Users need to know the current status of an item in order to know whether and when they may obtain it. Examples include: on shelf, checked out, missing, withdrawn, at bindery. Status should be available for each physical piece.
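To make the five requirements concrete, here is a minimal sketch of how they might sit together as a data structure. All class and field names (Location, Piece, Holding, and so on) are hypothetical and are not RDA properties; the call numbers and the "Storage" shelving location are invented sample data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Example values taken from the Status requirement above.
    ON_SHELF = "on shelf"
    CHECKED_OUT = "checked out"
    MISSING = "missing"
    WITHDRAWN = "withdrawn"
    AT_BINDERY = "at bindery"


@dataclass(frozen=True)
class Location:
    # Three levels of hierarchy, from broadest to narrowest.
    holding_institution: str
    holding_location: str
    shelving_location: str


@dataclass
class Piece:
    # One physical piece: its own shelf mark, location, and status.
    shelf_mark: str
    location: Location
    status: Status


@dataclass
class Holding:
    # Hypothetical grouping of the pieces of one copy plus its summary data.
    pieces: list[Piece] = field(default_factory=list)
    basic_statement: Optional[str] = None       # basic bibliographic unit
    supplement_statement: Optional[str] = None  # supplementary material
    index_statement: Optional[str] = None       # indexes
    access_url: Optional[str] = None            # institution-specific URL

    def locations(self) -> set[Location]:
        """Distinct locations across all pieces (an item may have several)."""
        return {p.location for p in self.pieces}


# Invented sample data: a two-volume run split between storage and open stacks.
stacks = Location("City Library", "Main Branch", "General Stacks")
storage = Location("City Library", "Main Branch", "Storage")
holding = Holding(
    pieces=[
        Piece("QA76 .J68 v.1", storage, Status.ON_SHELF),
        Piece("QA76 .J68 v.2", stacks, Status.CHECKED_OUT),
    ],
    basic_statement="v.1-2",
)
```

Keeping status and shelf mark on the piece, and the three holdings statements on the grouping, is only one possible arrangement; it anticipates the question of hierarchy raised next.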

There are important architectural questions to resolve when designing elements for holdings information. For example, the current Item class is flat, with no levels of hierarchy. Yet in library practice it is common to use two levels: summary holdings statements by location, and items for physical pieces subordinate to them. Some designers may be tempted to forgo summary holdings and construct summaries on the fly from piece-level holdings. For extensive holdings (1,000+), this will fail for performance reasons. Also, in complex situations, for example when there are many missing issues, libraries want to construct holdings statements manually in order to improve comprehension and usefulness.

If two levels of hierarchy are used, should it be done uniformly or only when needed? The great majority of titles (single volume monographs) do not require a summary holdings statement, so effort would be wasted in creating them. On the other hand, if practice is not uniform, the logic of processing holdings information becomes more complex. There are also edge cases that add to complexity: for example, a single volume monograph with an accompanying disc, where the disc is boxed and shelved separately; single volume monographs that have been rebound in multiple volumes; volumes of a multipart item that have been rebound together; and bound-with and filmed-with titles in general.
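For reference, the on-the-fly construction of a summary from piece-level holdings mentioned above might look like the following minimal sketch. It assumes each piece carries a plain integer volume number, and the output punctuation is illustrative only; it is not claimed to follow Z39.71 or any other display standard. The function name summarize is hypothetical.

```python
def summarize(volumes: list[int], caption: str = "v.") -> str:
    """Compress held volume numbers into runs, leaving gaps visible."""
    runs: list[tuple[int, int]] = []
    for n in sorted(set(volumes)):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)  # extend the current run
        else:
            runs.append((n, n))          # a gap starts a new run
    parts = [
        f"{caption}{lo}" if lo == hi else f"{caption}{lo}-{hi}"
        for lo, hi in runs
    ]
    return ", ".join(parts)


print(summarize([1, 2, 3, 5]))  # -> v.1-3, v.5
```

Even this simple version has to visit every piece on every display, and it produces a long, fragmented string when many issues are missing, which is the situation where a manually constructed statement is preferred.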

DianeHillmann commented 8 years ago

Joe:

In my MARBI days I spent a lot of time with the MARC Holdings format and ANSI/NISO Z39.71-2006 (Holdings Statements for Bibliographic Items). A lot of the issues you address are ones that those standards cover. I realize all too well that a lot of people don't like MARC Holdings, but I suspect most of that is lack of experience with it--certainly it looks like a bear in its native form. I personally think that it could easily be the basis of the sort of element set you envision, starting with something like the http://marc21rdf.info that we did a few years ago. For it to be part of the RDA standard, there would need to be a few further steps, but I think a 'proof of concept' might be a good first step.

Make sense?

Diane

kiegel commented 8 years ago

NISO Z39.71 is useful but I don't think it is the place to start.

The first step is to resolve conceptual problems with the FRBR Item class (see more on that below). Different conceptualizations of the class require somewhat different sets of properties to implement them.

Another early step is to get some clarity on use cases. My use case is catalog-like holdings displays, which do not require the depth Z39.71 provides. There may well be use cases that do need this depth: for example, machine processing of ILL requests.

Then we will be in a position to decide whether Z39.71 is needed. There are other contenders, such as Enumeration and Chronology of Periodicals Ontology (http://cklee.github.io/ecpo/ecpo.html). Despite its name, it can be used broadly for serials and multipart monographs.

If Z39.71 is chosen, it still doesn't fully answer my use case. It lacks important data elements, in particular Status. It also doesn't deal with access URLs. The MARC approach to URLs is kind of a mess and not a model we should follow in RDA/RDF.

So to your question of whether a proof-of-concept of Z39.71 is a good first step, I would say no.

FRBR Item class

The original FRBR study, which elaborated the WEMI model, was focused on bibliographic description. It did consider data elements specific to copies of materials in library collections, such as accession numbers and call numbers, but holdings information in general was implicitly out of scope.

A FRBR Item is "a single exemplar of a manifestation". The FRBR study notes that the Item entity may consist of one or more than one physical object. It says that by treating Items as an entity we are able to describe characteristics that pertain to circulation. But this is as far as it goes: there is no more detail, no description of data elements, no examples of circulation information, not even a mention of serials. Today, viewed through the lens of linked data and requirements for holdings information in library and other applications, the conceptual development of the FRBR Item class is inadequate. We need to think about, and perhaps change, the model to support holdings information.

At least three options present themselves.

Single flat item. This is the FRBR model today, where each copy is represented by a single instance of the Item class. This works well for the majority case of single-volume monographs, but even multiple copies in one or more locations is complicated: the Item instance must have multiple call numbers (for each volume), multiple identifiers (barcodes or RFIDs), and multiple statuses (on shelf, checked out, etc.). Things get more complicated when a copy has multiple physical pieces. Some kind of internal structure is needed to keep associated data elements together.

Multiple flat items. The model could be changed to represent each physical piece with an instance of the Item class, all linked to the manifestation they represent. This provides a natural package to keep location, call number, identifiers, and status information together for each piece. The case of a single-volume monograph is a natural extension of more complex holdings situations. A drawback, however, is the difficulty of treating the physical pieces of a copy together as one copy.

Hierarchical items. A hybrid of the first two models has advantages. A top-level instance of the Item class represents a copy as a whole, and below it are instances for each physical piece. This allows a copy to be viewed or acted on in different ways depending on the needs of the application. A drawback is that a single-volume monograph is more complex (two items in a hierarchy) than in other models. It should be noted that many ILSs take this hierarchical approach.
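A minimal sketch of the hierarchical option, to show how one structure can be viewed either as a copy or as its pieces. The Item class, its fields, and the barcodes here are hypothetical. Note one deliberate simplification: an item with no parts stands for its own single piece, so a single-volume monograph needs only one instance rather than the two described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    label: str
    barcode: Optional[str] = None
    status: Optional[str] = None
    parts: list["Item"] = field(default_factory=list)

    def pieces(self) -> list["Item"]:
        """Physical pieces under this item; a childless item is its own piece."""
        if not self.parts:
            return [self]
        found: list["Item"] = []
        for part in self.parts:
            found.extend(part.pieces())
        return found

    def available(self) -> bool:
        """Copy-level view: the copy is available only if every piece is on shelf."""
        return all(p.status == "on shelf" for p in self.pieces())


# A two-volume copy: the top-level item carries no barcode of its own.
copy1 = Item("copy 1", parts=[
    Item("v.1", barcode="B001", status="on shelf"),
    Item("v.2", barcode="B002", status="checked out"),
])

# A single-volume monograph, collapsed to one instance.
single = Item("copy 1", barcode="B100", status="on shelf")
```

Whether to allow the collapsed form is exactly the "uniformly or only when needed" question: it saves effort for the majority case but means processing code must handle both shapes.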

To sum up, it is difficult to proceed to the step of creating new holdings properties for Items without a better idea how the Item class itself is best structured or used in support of holdings information.

DianeHillmann commented 8 years ago

Joe:

I apologize--I wasn't very clear about what I was suggesting, so let me start out again.

I wasn't suggesting that we start off with Z39.71, but there is an important relationship between that and MARC Holdings, probably best expressed in the MARC Holdings documentation: "The ANSI and ISO standards give the form for display of holdings statements, while MARC is a carrier for the data elements that are used to construct the display." There's a correlation table for the NISO (and ISO) data elements with MARC Holdings here: https://www.loc.gov/marc/holdings/hdapndxd.html

What I think we shouldn't ignore is the structural and relationship pieces that MARC Holdings worked out, particularly the relationships between caption and pattern, enumeration and chronology. This was all designed to make it feasible to expand and compress statements (which never worked well, sadly), and prediction of next issue, used extensively in the days when such patterns were essential for serials checkin systems. Although the holdings format was designed for a different world, the thinking behind it and the way it worked is still valid (see: https://www.loc.gov/marc/holdings/hd853878.html). It was possible to use any level of detail to express holdings, from the highly encoded to purely textual.

There are some full record examples that show holdings statements and the various levels that can be used to record holdings from simply institutional to very detailed (https://www.loc.gov/marc/holdings/examples.html). I know it's hard to look at and interpret from these examples, but think of it as just another coding language that you haven't seen much, or ever. There's a model here that I think has legs, and the MARC approach to URLs is pretty irrelevant in this case (that's based on the old NISO Z39.2 standard that's the basis for MARC). But it handles the hierarchical model you describe, and in fact the reason many ILSs use it is because of the MARC Holdings model.

The problem of dealing with all this in an environment where RDF statements can't actually be linked together is a general problem. If you take a look at the way we handled that for the MARC bibliographic format at http://marc21rdf.info, you'll see that there's more than one way to skin that cat.

I hope this makes more sense?

Diane

GordonDunsire commented 8 years ago

I agree that there are gaps in RDA support for the obtain task, and in the elements for the Item entity.

I'm not sure what Location is and how it is related to Shelf Mark.

I did some work on collection-level description a few years ago, based on Michael Heaney's model available at http://www.ukoln.ac.uk/metadata/rslp/model/

I found this covered two main architectures. The first is the poly-hierarchical arrangement of distinct items within collections within collections. An item can belong to more than one collection, and a whole collection can be an item in another collection. The second is the relationship between Collection, Place, and Agent. This is often blurred in library documentation: the term "library" itself can mean all three (the item is part of the library, the item is located in the library, the item is owned by the library).

In my own experience the three levels of Location/Place are all defined physical spaces ("places"), and constitute the library building, the area within the building that houses the collection containing the item, and the shelving location. The area housing the collection may be a set of disconnected places with a common label. All should be capable of cartographic representation, e.g. map of library shelf collections.

As defined (A shelf mark includes a call number and any associated prefixes and suffixes, including copy numbers. Each physical piece of an item should have a unique shelf mark) Shelf Mark is a sub-type of RDA "identifier for the item".

In practice, this would be interpreted as what is on the "spine" of the item. As such, it is rarely unique. Most libraries will omit, for example, a prefix for the library and branch, and only include a shelving location and, in the USA, a Cutter number or other suffix to a category (fiction) or subject (non-fiction). This allows stock to be moved around branches without changing the shelf mark.

In theory, the value of Shelf mark would be an aggregate of the data from all three levels in Location plus a call number, etc.

I'm also not sure about Access URLs. RDA "Uniform resource locator" relates to access to a Manifestation. In FRBRoo, online digital resources are expressions that cannot have manifestations, in the sense of a finite publication, and therefore items - until they are downloaded to a Manifestation Carrier.

Loosely based on a principle of generalization and conceptual abstraction in the original FRBR model, I think Manifestation is a better compromise than having both a Manifestation and an Item element for access to an online file.

In what sense are access URLs specific to an institution? Do they locate an authentication service or embed authentication data? Are access URLs specific to single Items?

I agree that users need to know the current status of an item, but not, from an RDA perspective, to support circulation control of physical stock.

In the context of RDA, this is likely to be required for preservation and provenance data in archives, museums, and rare materials collections.

Generally, I'm not sure about elements that are intended to accommodate dynamic values supplied by automated systems, as they are of no utility for RDA data capture and storage techniques. (Fixed values, say from image capture and OCR, are a different matter.) Should RDA supply elements that are only used in application displays? In a linked data application, RDA would expect the application to use a local link from an instance of RDA Item to a dynamic value generated by another application.

All FRBR/RDA classes are "flat" because there is no reason to sub-type them. All have generic whole/part elements: "contained in" and "container of".

The definition of FRBR Item ("a single exemplar of a manifestation") is ambiguous. Each Item is a single thing, but there can be many exemplars, i.e. Items, of a Manifestation. The cardinality is many-to-one-and-only-one from Item to Manifestation, and one-to-many from Manifestation to Item. A Manifestation persists even if all of its Items are destroyed.

For single-volume monographs, each Item is a single physical copy, with one Location, Shelf-mark, etc:

MonographManifestation1 has exemplar MonographItem1
MonographItem1 has identifier "barcode/shelfmark1"

For a multi-volume monograph, RDA/FRBR expects whole/part relationships to be used:

MonographManifestation1 has exemplar MonographItem1
MonographManifestation1 has exemplar MonographItem2
...
MonographItem1 contains VolumeAItem1
MonographItem1 contains VolumeBItem1
...
VolumeAItem1 has identifier "barcode/shelfmark1"
...

For serials:

SerialTitleManifestation1 contains SerialVolumeManifestation1
SerialTitleManifestation1 contains SerialVolumeManifestation2
...
SerialVolumeManifestation1 contains SerialIssueManifestation1
...
SerialIssueManifestation1 has exemplar SerialIssue1Item1
...
SerialIssue1Item1 has identifier "barcode/shelfmark1"

This can be extended for article-level description, to link data for e-serials and printed serials:

SerialIssueManifestation1 contains SerialIssue1Article1 ...
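As a rough illustration of how an application could walk these whole/part statements, the sketch below stores the multi-volume monograph example as plain (subject, predicate, object) tuples and collects every identifier reachable from a Manifestation. The helper function names are hypothetical, and "barcode/shelfmark2" is an invented value for the second volume, which the example above leaves elided.

```python
# Statements from the multi-volume monograph example, as plain tuples.
triples = [
    ("MonographManifestation1", "has exemplar", "MonographItem1"),
    ("MonographItem1", "contains", "VolumeAItem1"),
    ("MonographItem1", "contains", "VolumeBItem1"),
    ("VolumeAItem1", "has identifier", "barcode/shelfmark1"),
    ("VolumeBItem1", "has identifier", "barcode/shelfmark2"),  # assumed value
]


def objects(subject: str, predicate: str) -> list[str]:
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def identifiers(node: str) -> list[str]:
    """Identifiers of a node and, recursively, of everything it contains."""
    found = objects(node, "has identifier")
    for part in objects(node, "contains"):
        found.extend(identifiers(part))
    return found


def identifiers_for_manifestation(manifestation: str) -> list[str]:
    """Identifiers of every exemplar of a Manifestation, including its parts."""
    found: list[str] = []
    for item in objects(manifestation, "has exemplar"):
        found.extend(identifiers(item))
    return found
```

The same traversal would apply to the serial example by following "contains" between Manifestations before following "has exemplar".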

Holdings is something else - I'll comment separately.

kiegel commented 8 years ago

In response to Diane. (Sorry about the bold in my last post; that was not intentional!)

I like Z39.71 and think the theoretical basis of it is strong. It would be a great addition to RDA and would help to support communication among libraries.

In terms of converting it to RDF, let's use the NISO standard as the basis and not the MARC version. It's best not to bring the MARC tagging forward.

kiegel commented 8 years ago

In response to Gordon.

Location. What I have in mind is physical place. As you said, there are three levels. I think this is important to note because Z39.71 has only two levels.

Shelf Mark/Call Number. Terminology is not standardized so it is difficult to talk about this. I have in mind the spine marking, but you make a good point that libraries may routinely omit parts of this information in user displays. I do not consider an aggregate of location and call number data to be useful.

Access URLs. While in theory a URL corresponds to a manifestation, in practice things are more complex. Many paid resources do have embedded authentication data, so it seems to me that they are "item" information. They correspond to a "copy" that is available only to certain people. For example, http://washington.eblib.com/patron/FullRecord.aspx?p=4322452, which has IP restriction to the University of Washington. It seems better to control such URLs at the Item level rather than at the Manifestation level.

Status. I am thinking about an API call to a circulation system. The purpose of an RDA element would be to record the URI of the call, so that it is associated with the item and can be found through queries. But it could also be handled outside of RDA linked data.

FRBR Item class. A flat class can work, but as you note, we need a property to link volumes to the item:

MonographItem1 contains VolumeAItem1
MonographItem1 contains VolumeBItem1

That's the piece that seems to be missing.

Shelf mark as identifier. Yes, the shelf mark is an identifier and can go in the item identifier property. However, this will cause practical problems because items have multiple identifiers with different uses. Putting all identifiers in a single property forces users to test string values to try to figure out the kind of identifier. We don't want to display barcode numbers in the catalog, we want call numbers because the collection is arranged by call number, not barcode. An analogy is titles: it would be theoretically possible to put all title information in a single property, but practice has shown that it is much more useful to have separate properties. I would argue that some item identifiers need separate properties for the same reason.
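A small sketch of the practical problem. With separate properties the display code simply reads the call number; with a single identifier property it has to guess from the string. The class, the function names, the all-digits heuristic, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ItemIdentifiers:
    # Hypothetical separate properties, one per kind of identifier.
    call_number: str
    barcode: str


def display_typed(ids: ItemIdentifiers) -> str:
    """With separate properties, no guessing is needed."""
    return ids.call_number


def display_untyped(values: list[str]) -> str:
    """With one property, guess: treat all-digit strings as barcodes."""
    for value in values:
        if not value.isdigit():
            return value
    raise ValueError("no value looks like a call number")


typed = ItemIdentifiers(call_number="QA76.9 .D3", barcode="39352012345678")
untyped = ["39352012345678", "QA76.9 .D3"]
```

The guess works for this sample, but it breaks as soon as a call number happens to be all digits (for instance a bare class number such as "005"), which is the kind of string testing a dedicated property avoids.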

kcoyle commented 8 years ago

Joe, isn't part of the problem also in FRBR's manifestation entity? Any description of the library's item holdings should be linked to the physical description of the manifestation. That the manifestation is described as a single volume or multiple volumes should be the starting point for the library's item-level information. The problem that I see is that there is no real connection between the physical description in manifestation and the item information (other than saying that this is an item of that manifestation). Item information should probably be linked directly to certain information in the manifestation. So while the description may be "2 v." how does that link to item information like:

v.1 [barcode]
v.2 [barcode]

It doesn't, because there are no physical volumes coded in the manifestation, only a description of the extent, and individual elements in the manifestation (or any other entity) do not have relationships with elements in other entities.

It would be logical, in a data processing environment, to know that the publication consists of 2 volumes and that if the library has v.1 only, that volume 2 is missing. It would also make sense that the description of the extent link directly to the library's data about the physical volumes, like barcodes. Right now, all of this has been shuttled off to the MARC holdings record, repeating the physical description as coded data, again without a direct link to the physical description in MARC bib.
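The inference described here is simple once the extent is available as data rather than as descriptive text. The sketch below assumes, hypothetically, that the manifestation records the number of volumes as an integer and that item-level records are keyed by volume number; neither is true of the current "2 v." style description. The function name and barcode are invented.

```python
def missing_volumes(published: int, held: dict[int, str]) -> list[int]:
    """Volumes the publication consists of that have no item-level record."""
    return [n for n in range(1, published + 1) if n not in held]


# The publication consists of 2 volumes; the library has v.1 only.
held = {1: "B001"}  # volume number -> barcode (invented value)
print(missing_volumes(2, held))  # -> [2]
```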

I think that FRBR:manifestation and BF:instance both overly reflect legacy cataloging practices. They also both mix bibliographic description (titles, series, editions) and physical description (size, number, physical type). I don't think we'll get far in terms of describing holdings unless we fix this fundamental problem.

kc

