tfmorris opened this issue 10 years ago
The rules for this are AACR2 (Anglo-American Cataloguing Rules, 2nd ed.), which is now in the process of being superseded by RDA (Resource Description and Access). Both are sadly behind paywalls. AACR2 bases its punctuation closely on something called ISBD (International Standard Bibliographic Description), a version of which is available here: http://www.ifla.org/files/assets/cataloguing/isbd/isbd-cons_20110321.pdf See pp. 68-69 for some punctuation patterns for titles.
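As a rough illustration of the kind of punctuation pattern meant here (my own sketch, not taken from the ISBD document): in an AACR2-style title statement, other title information is introduced by " : ", the statement of responsibility by " / ", and the statement typically ends with a full stop. The sketch below reverses only those three conventions. It assumes the separators never occur inside the title text itself, ignores parallel titles (" = ") and subsequent statements (" ; "), and will wrongly drop the final period of a title ending in an abbreviation such as "etc.". The function name `split_isbd_title` and the example title are hypothetical.

```python
from typing import NamedTuple, Optional


class TitleParts(NamedTuple):
    title: str
    other_title_info: Optional[str]
    responsibility: Optional[str]


def split_isbd_title(statement: str) -> TitleParts:
    """Split an ISBD-style title statement (hypothetical helper, simplified rules)."""
    s = statement.strip()
    # Drop the closing full stop (also eats a genuine final "etc." -- known limitation).
    if s.endswith("."):
        s = s[:-1].rstrip()
    # " / " introduces the statement of responsibility.
    main, sep, resp = s.partition(" / ")
    responsibility = resp.strip() if sep else None
    # " : " introduces other title information (e.g. a subtitle).
    title, sep, rest = main.partition(" : ")
    other = rest.strip() if sep else None
    return TitleParts(title.strip(), other, responsibility)
```

For an illustrative statement such as `"Gone with the wind : a novel / by Margaret Mitchell."` this yields the title `"Gone with the wind"`, other title information `"a novel"`, and responsibility `"by Margaret Mitchell"`.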
AACR2 and RDA records are encoded in MARC21 (MAchine-Readable Cataloging 21), so the publicly available MARC21 manual is a very good source of examples. It isn't a rigorous statement of principles, although it does address many of them: http://www.loc.gov/marc/bibliographic/. In particular, see the 245 field (title): http://www.loc.gov/marc/bibliographic/bd245.html and the x00 fields (various personal-name fields): http://www.loc.gov/marc/bibliographic/bdx00.html
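In MARC21 the ISBD punctuation that introduces the next element is recorded at the end of the preceding subfield (e.g. 245 $a "Gone with the wind :" followed by $b "a novel /"), and an x00 field with first indicator 1 holds the name in inverted "Surname, Forenames" form. A minimal sketch of undoing both, assuming subfields are available as a list of `(code, value)` tuples (a hypothetical representation, not any particular MARC library's API). The helpers `clean_title` and `display_name` are hypothetical, and only 245 $a/$b and x00 $a are handled; $c, $d, $q and friends are ignored. The rule for keeping a final period after an initial ("J. R. R.") is a heuristic, not something the MARC manual states.

```python
import re
from typing import List, Optional, Tuple

Subfield = Tuple[str, str]  # hypothetical representation: (code, value)


def _strip_isbd(value: str) -> str:
    v = value.strip()
    v = re.sub(r"\s+[:/;=]$", "", v)  # " :", " /", " ;", " =" introduce the next element
    v = re.sub(r"\.$", "", v)         # field-final full stop (also eats "etc." -- limitation)
    return v.strip()


def clean_title(subfields: List[Subfield]) -> Tuple[str, Optional[str]]:
    """Return (245 $a, 245 $b) with trailing ISBD punctuation removed."""
    values = {c: _strip_isbd(v) for c, v in subfields if c in ("a", "b")}
    return values.get("a", ""), values.get("b")


def _drop_terminal_period(s: str) -> str:
    # Heuristic: keep a period that closes a single-letter initial ("J. R. R."),
    # drop it otherwise ("Margaret." -> "Margaret").
    if s.endswith(".") and not re.search(r"(^|[\s,])\w\.$", s):
        return s[:-1]
    return s


def display_name(ind1: str, subfields: List[Subfield]) -> str:
    """Rebuild a direct-order name from x00 $a (first indicator 1 = surname)."""
    a = next(v for c, v in subfields if c == "a")
    a = _drop_terminal_period(a.strip().rstrip(",").rstrip())
    if ind1 == "1" and ", " in a:
        surname, forenames = a.split(", ", 1)
        return f"{forenames} {surname}"
    return a  # forename-only or family names are left as recorded
```

For example, `display_name("1", [("a", "Mitchell, Margaret,"), ("d", "1900-1949.")])` gives `"Margaret Mitchell"`, while `"Tolkien, J. R. R."` keeps its final initial and becomes `"J. R. R. Tolkien"`.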
With your examples above, there are a number of complications:
There will be many more such complications, I expect, especially if you're dealing with non-book or rare-book material. It's a minefield! 8-)
Apologies for asking questions in the form of an issue, but I'm not sure where else to ask them. The title and author fields appear to have special markup and/or transformations applied to them. What are the rules for recovering the authors' names and the titles of the works?
Authors
Titles
Presumably there are rules/code for applying the transformations in the first place, and hopefully they're reversible so the actual information can be extracted again.