NYPL / catalog_of_copyright_entries_project

NYPL Project to transcribe and parse pages from the US Catalog of Copyright Entries
Creative Commons Zero v1.0 Universal

Corporate authors or no Authors #4

Open seanredmond opened 6 years ago

seanredmond commented 6 years ago

[Screenshot, 2018-03-06 at 12:38 pm]

In the first two entries, should "California" be the author? What about the second two?

seanredmond commented 6 years ago

During the call I think we softly decided that California is the author of the first two and also of the third, and that the publisher is the author of the fourth. For the last two, if I remember right, that meant adding `<authorName>` elements with "California" and "Callaghan & co."

Since the first two entries contain text that needs to be marked up, I think it makes sense to mark up "California" as the author. For the California reporter, though, we'd be inferring the author, and I don't think we should add inferences to the markup at this point; better to add nothing. `author` is already defined as 0-or-more, so we could find the entries with no authors later for further processing. The same goes for Callaghan's Michigan digest.
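To illustrate the "find entries with no authors later" idea, here is a rough sketch of how that pass might look. Only `<authorName>` is taken from this thread; the `<entries>`, `<entry>`, `<author>`, and `<title>` element names, the `id` attributes, and the first entry's title are placeholders I made up for the example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup. Only <authorName> comes from this thread;
# every other element name, the ids, and the first title are assumptions.
SAMPLE = """\
<entries>
  <entry id="e1">
    <author><authorName>California</authorName></author>
    <title>Placeholder title</title>
  </entry>
  <entry id="e2"><title>California reporter</title></entry>
  <entry id="e3"><title>Callaghan's Michigan digest</title></entry>
</entries>
"""


def entries_without_authors(xml_text):
    """Return the <entry> elements that contain no <authorName> descendant."""
    root = ET.fromstring(xml_text)
    return [
        entry
        for entry in root.iter("entry")
        if entry.find(".//authorName") is None
    ]


if __name__ == "__main__":
    for entry in entries_without_authors(SAMPLE):
        # Prints the id and title of each entry left without an author.
        print(entry.get("id"), entry.findtext("title"))
```

Because nothing is inferred at markup time, the author-less entries stay easy to pick out this way and can be reviewed in a separate pass.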

Is there something to be preserved about the spacing of "C a l i f o r n i a" and "P a c i f i c" in the 3rd entry?

seanredmond commented 6 years ago

Also, is there any reason to distinguish between human authors and state or corporate authors? Would there be any way to do it if we wanted to?

seanredmond commented 6 years ago

In accordance with #9, mark up "California" as the author in the first two examples. The last two will have no author element; "California" and "Callaghan" are just part of the title.