BL-Labs / imagedirectory

Manifests of the public domain images uploaded to Flickr Commons, with descriptive information about the books they were taken from.
The Unlicense

Rules for decoding author & title fields? #4

Open tfmorris opened 10 years ago

tfmorris commented 10 years ago

Apologies for asking questions in the form of an issue, but I'm not sure where else to ask them. The title and author fields appear to have had special markup and/or transformations applied to them. What are the rules for recovering the authors' names and the titles of the works?

Authors

Titles

Presumably there are rules/code for applying the transformations in the first place and hopefully they're reversible so the actual information can be extracted again.

Orangeaurochs commented 10 years ago

The rules for this are AACR2 (Anglo-American Cataloguing Rules, 2nd ed.), now in the process of being superseded by RDA (Resource Description and Access), both of which are sadly behind paywalls. AACR2 bases its punctuation closely on something called ISBD (International Standard Bibliographic Description), a version of which is available here: http://www.ifla.org/files/assets/cataloguing/isbd/isbd-cons_20110321.pdf See pp. 68-69 for some punctuation patterns for titles.
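To make the ISBD idea concrete: in a title statement, " : " introduces other title information (roughly, the subtitle) and " / " introduces the first statement of responsibility. A minimal sketch of splitting a title string on those two separators follows, assuming the field holds a single ISBD-punctuated title statement and that " / " and " : " never occur inside the title text itself (real records will break this now and then). The function and field names are hypothetical, and terminal punctuation is left as-is.

```python
from typing import NamedTuple, Optional


class TitleParts(NamedTuple):
    title_proper: str
    other_title_info: Optional[str]  # text after " : " (subtitle)
    responsibility: Optional[str]    # text after " / " (e.g. "by ...")


def split_isbd_title(raw: str) -> TitleParts:
    """Hypothetical helper: split an ISBD-punctuated title statement.

    Assumes " / " and " : " appear only as ISBD prescribed punctuation.
    """
    text = raw.strip()
    # " / " precedes the first statement of responsibility, so split it off first.
    main, _, responsibility = text.partition(" / ")
    # Within what remains, " : " precedes other title information.
    title_proper, _, other = main.partition(" : ")
    return TitleParts(
        title_proper.strip(),
        other.strip() or None,
        responsibility.strip() or None,
    )
```

For an illustrative (made-up) string, `split_isbd_title("A tour in Wales : with sketches / by a traveller")` gives the title proper "A tour in Wales", the other title information "with sketches", and the responsibility "by a traveller". Parallel titles (" = ") and subsequent statements of responsibility (" ; ") would need further splitting along the same lines.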

AACR2 and RDA records are encoded in MARC21 (MAchine-Readable Cataloging 21), so the publicly available MARC21 manual is a very good source of examples, if not a rigorous statement of principles, although it does address many of them: http://www.loc.gov/marc/bibliographic/. In particular, see the 245 field (title): http://www.loc.gov/marc/bibliographic/bd245.html and the X00 fields (various personal name fields, including the main author): http://www.loc.gov/marc/bibliographic/bdx00.html
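For author fields, the X00 headings store a personal name in inverted order when it is entered under the surname ("Surname, Forenames"), with dates following in a separate subfield (e.g. `$a White, Gilbert, $d 1720-1793.`). The sketch below turns such a heading back into natural order, assuming the subfields have been flattened into one comma-separated string and that any dates come last. The date patterns (plain years, "b.", "d.", "fl.", "ca.") and the function name are assumptions for illustration; titles of nobility, fuller forms of name and other subfields are not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumed trailing date portion, e.g. ", 1720-1793." or ", fl. 1850".
_DATES = re.compile(r",\s*((?:b\.|d\.|fl\.|ca\.)?\s*\d{3,4}[^,]*?)\.?\s*$")


class Name(NamedTuple):
    display: str          # name in natural (direct) order
    dates: Optional[str]  # date portion, if one was found


def invert_personal_name(heading: str) -> Name:
    """Hypothetical helper: 'Surname, Forenames, dates' -> 'Forenames Surname'."""
    text = heading.strip()
    dates = None
    match = _DATES.search(text)
    if match:
        dates = match.group(1).strip()
        text = text[: match.start()]
    # Drop trailing commas/full stops left over from the subfield punctuation.
    text = text.rstrip(" ,.")
    surname, sep, forenames = text.partition(", ")
    # No comma means a single-element name (e.g. a forename-only heading).
    display = f"{forenames.strip()} {surname.strip()}" if sep else surname.strip()
    return Name(display, dates)
```

With this sketch, `invert_personal_name("White, Gilbert, 1720-1793.")` yields the display form "Gilbert White" and the dates "1720-1793", while a single-element heading such as "Homer." comes back as "Homer" with no dates. Since the flattened string no longer carries the X00 first indicator (forename, surname or family name), treating the first comma as the surname boundary is a guess.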

With your examples above, there are a number of complications:

There will be many more such complications I expect, especially if you're dealing with non-book or rare book things. It's a minefield! 8-)