SAA-SDT / EAD3

https://www.loc.gov/ead/index.html
Creative Commons Zero v1.0 Universal

Should <origination> in <did> have the same content model as the new *Entry elements in <controlAccess>? #126

Closed: rockivist closed this issue 11 years ago

rockivist commented 11 years ago

I haven't thought this through entirely, but my suspicion is that they should.
That said, it could prove a migration challenge and I'd prefer to avoid another structured/unstructured fork for <origination>.

Where <origination> contains one corpName, persName, or famName, migration to origination/corpNameEntry etc. wouldn't be difficult. But what would we do with origination/text()? And what if it included more than one name? In the absence of a <genericNameEntry>, it's unclear what the migration path would be, aside from a choice between an unstructured <origination> and an <originationStructured>, which seems like overkill.

Something to follow up on. Labeling this "comments" and "migration" and assigning it to me.

MinnesotaFox commented 11 years ago

I'm with you. No more branches. Only the form following the pattern of <persNameEntry> in <origination>, as we have in <controlAccess>. Eliminating multiple methods was a stated goal.

If someone entered multiple names in a single <origination>, they should have to work it out.

rockivist commented 11 years ago

@MinnesotaFox's points are good ones. My current thinking is this: limit <origination> to a choice of <persNameEntry>, <corpNameEntry>, or <famNameEntry>. For the migration, where <origination> contains only text, build a parameter into the migration stylesheet that sets a default NameEntry element into which to move the text. I'd make <persNameEntry> the default, but allow users to change it to <corpNameEntry> or <famNameEntry> if they so choose. Where there is more than one persname, corpname, or famname in the <origination>, create multiple <origination> elements. And when there is a mix of those and text, move the text into a separate <part> within the NameEntry.
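The migration rules above can be sketched in Python. This is a minimal illustration, not the actual migration stylesheet (which would be XSLT); it assumes un-namespaced EAD 2002-style input with lowercase persname/corpname/famname children, and `migrate_origination`, `NAME_TO_ENTRY`, and `default_entry` are hypothetical names. Loose text after a name is attached to that name's entry, and text before the first name is attached to the first name: that is one possible reading of "move the text into a separate <part>".

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from EAD 2002-style name elements to the proposed *Entry elements.
NAME_TO_ENTRY = {
    "persname": "persNameEntry",
    "corpname": "corpNameEntry",
    "famname": "famNameEntry",
}


def _flat_text(elem):
    """Return all text inside elem with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def _tokens(orig):
    """Yield (entry_tag, text) pairs in document order; entry_tag is None for loose text."""
    if orig.text and orig.text.strip():
        yield None, orig.text.strip()
    for child in orig:
        # Non-name children (emph, title, lb, ...) are treated as loose text.
        yield NAME_TO_ENTRY.get(child.tag), _flat_text(child)
        if child.tail and child.tail.strip():
            yield None, child.tail.strip()


def migrate_origination(orig, default_entry="persNameEntry"):
    """Split one unstructured <origination> into a list of structured ones."""
    groups = []   # one [entry_tag, [part texts]] per name element
    leading = []  # loose text seen before the first name element
    for entry, text in _tokens(orig):
        if entry is not None:
            # One new <origination> per name; earlier loose text joins it.
            groups.append([entry, leading + [text]])
            leading = []
        elif not text:
            continue  # skip empty inline elements such as <lb/>
        elif groups:
            # Loose text after a name becomes an extra <part> of that name's entry.
            groups[-1][1].append(text)
        else:
            leading.append(text)
    if not groups and leading:
        # Text only: wrap everything in the default entry as a single <part>.
        groups.append([default_entry, [_flat_text(orig)]])
    result = []
    for entry, parts in groups:
        new_orig = ET.Element("origination")
        entry_el = ET.SubElement(new_orig, entry)
        for part in parts:
            ET.SubElement(entry_el, "part").text = part
        result.append(new_orig)
    return result
```

For example, `<origination><corpname>Commission on the Bicentennial of the United States Constitution</corpname>, 1983-1992</origination>` becomes a single <corpNameEntry> with two <part> children, the second holding ", 1983-1992". Connective text such as "and" between two personal names ends up as a <part> of the first name's entry, which is exactly the kind of case someone would still have to work out by hand.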

Also: @MinnesotaFox: when you input angle brackets in comments on GitHub, you need to escape them with a backtick (the thing under the tilde). Use it as if you were putting the element or attribute name in quotes.

rockivist commented 11 years ago

Might need to include <titleEntry> as an option within <origination> as well.

tcatapano commented 11 years ago

I have doubts about this. I think it is too restrictive for the general schema, but could be required in a "strict" profile. The next version could require it, but I think we have to allow for a looser "transitional" general schema for this revision.

Also, I have no problem with creating an <originationStructured> as well as an <origination>, following the very same logic as with the bifurcation of <physDesc>. I cannot see how this would be more disruptive than requiring such a strict model for <origination>. @rockivist: the migration scenario you describe still sounds like a nightmare :-)

MinnesotaFox commented 11 years ago

The question of what is too restrictive and what is not is highly subjective, of course, and therefore not easily argued one way or the other. What I can point to is the principle of standardization as a way to create greater consistency in markup, to improve interchange, and to simplify adoption and use.

tcatapano commented 11 years ago

I'm not opposed to this feature request. I understand the desire for it and respect the arguments in its favor. But I do have doubts about how it should be implemented and whether it should be in this version.

However, I do believe a proposal about degree of restriction may be discussed. Have we not implicitly done this for many elements? Nobody in this discussion is claiming that it would not be very difficult to implement a migration from an unstructured <origination> to a structured/restrictive <origination>. That is a fairly objective measure.

Also, while we do have a point of emphasis of "Achieving greater conceptual and semantic consistency in the use of EAD" ("Where prudent, the committee will eliminate alternative methods for encoding a given descriptive convention"), we also have one that states we are "Being mindful that a new version will affect current users".

Finally, I do not necessarily concede that it is a principle that standardization improves interchange and simplifies adoption and use, and the opposite may be (and has been) argued, but I'll do that some other time ;-) For the purposes of this discussion, I will concede it, but suggest that making available a finely granular/structured and coarse/unstructured element pair still provides standardization: it is not two ways to do the same thing, but two ways to do two different things. Moreover, offering fine/structured and coarse/unstructured elements is a practice employed by other widely adopted schemas: TEI, for example, does so, as does the NLM/NCBI DTD, now the Journal Archiving Tag Suite (JATS), which is the DTD used in PubMed Central by 2.7 million articles from nearly 4000 journals. I don't see, then, that it can reasonably be argued that this practice in itself impedes either adoption or interchange.

tcatapano commented 11 years ago

Minor correction: JATS = Journal Article Tag Suite.

tcatapano commented 11 years ago

One last comment (for now): why not permit <genericNameEntry>? It would vastly simplify migration. For newly encoded finding aids it would be unlikely that users would use it -- unless they are really unsure about what sort of entity the name refers to, or are just plain lazy. It could be made clear in documentation that it is primarily a "transitional" element to facilitate migration and will be phased out in the next version. Why allow it for <controlAccess> but not <origination>?

MinnesotaFox commented 11 years ago

De gustibus non est disputandum.

One person's elegance is another's annoyance.

The argument for the consistency of encoding corporate, family, and personal names in both <origination> and <controlAccess> is simply this: there often is no semantic difference between the two. Shall we allow less precision in the representation of names in one place and not the other?

Consider the John and Martha Williams papers. Some repositories might choose to employ two separate <origination> elements, one for John Williams and one for Martha Williams. Others might encode John Williams in <origination> and Martha Williams in <controlAccess>. The latter reflects the influence of bibliographic cataloging with its emphasis on a single Main Entry. I suspect that the latter is the more common practice in the U.S.

The same situation applies when the body of records being described, while continuous over time in its content, was created by different corporate agencies with varying names. This often occurs with government bodies.

An entry in <controlAccess>, perhaps with the @role value of "origination", for an earlier form of name for the agency would have the same semantic meaning and the same significance for research as the entry for the most recent name for that agency recorded in <origination>. It seems to me that we should treat them the same; they have the same semantic meaning wherever and however they are encoded.

Or shall we have two options at every point?

Or two options only in <origination>?

Or shall we just have two options only in <controlAccess>: one for names that have the role of some sort of creation/origination, and a separate encoding for those corporate, personal, and family names that are represented as being the subject of the materials being described?

Michael


Michael Fox

tcatapano commented 11 years ago

"Shall we allow less precision in the representation of names in one place and not the other?" -- The issue to me is: should we require more in one than the other? Including <genericNameEntry> in <origination> would align the two and resolve most of the doubts I (at least) have. That still leaves the problem of allowing text alongside the entry elements, which I predict will be desirable to users, if for no other reason than that people often prefer to supply punctuation in the instance. It's not for nothing that NLM-NCBI/JATS created the <x> element; see http://jats.nlm.nih.gov/archiving/tag-library/1.0/n-nmm0.html

rockivist commented 11 years ago

Adding a note from Henny's email so that it is recorded on this thread:

From Henny (email, 4/18/13): "Within the origination element it is not possible to tag the dates of a person, family or corporate body. This should be possible."

rockivist commented 11 years ago

Additional notes from Henny (email, 4/21/13):

Michael, and all others,

The content model of <origination> is mixed: it may contain text and/or the elements "archRef", "corpName", "emph", "famName", "lb", "name", "persName", "quote", "ref", or "title".

These are the possibilities:

  1. no dates: `<origination><corpName>Commission on the Bicentennial of the United States Constitution</corpName></origination>`
  2. dates outside <xxxName>: `<origination><corpName>Commission on the Bicentennial of the United States Constitution</corpName>, 1983-1992</origination>`
  3. dates inside <xxxName>: `<origination><corpName>Commission on the Bicentennial of the United States Constitution, 1983-1992</corpName></origination>`

But it is not possible to tag the dates as a date. If this is what we want, my local solution would be:

  1. dates outside <xxxName> and abuse the <emph> element: `<origination><corpName>Commission on the Bicentennial of the United States Constitution</corpName>, <emph>1983-1992</emph></origination>` or
  2. dates inside <xxxName> and abuse the <emph> element: `<origination><corpName>Commission on the Bicentennial of the United States Constitution, <emph>1983-1992</emph></corpName></origination>`

Not nice, but effective within our local system.

Henny

rockivist commented 11 years ago

My recommendation re: content model for <origination> (email, 4/22/13):

My recommendations:

<origination> should only allow a choice of one of the following: <persNameEntry>, <corpNameEntry>, <famNameEntry>, <genericNameEntry> (replacing <name>), or <titleNameEntry>.
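The recommended content model amounts to a simple rule: exactly one child element, drawn from a fixed set, and no loose text. The Python sketch below expresses that rule as a check; it is not the EAD3 schema itself. `origination_problems` and `ALLOWED` are hypothetical names, the element list is copied from the recommendation above, and the input is assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET

# Element names copied from the recommendation above (hypothetical check, not the EAD3 schema).
ALLOWED = {"persNameEntry", "corpNameEntry", "famNameEntry",
           "genericNameEntry", "titleNameEntry"}


def origination_problems(orig):
    """Return a list of reasons why `orig` violates the proposed content model."""
    problems = []
    children = list(orig)
    # "A choice of one": exactly one child element per <origination>.
    if len(children) != 1:
        problems.append(f"expected exactly one child element, found {len(children)}")
    for child in children:
        if child.tag not in ALLOWED:
            problems.append(f"<{child.tag}> is not allowed in <origination>")
    # Mixed content is no longer permitted, so any non-whitespace text is an error.
    loose = (orig.text or "") + "".join(c.tail or "" for c in children)
    if loose.strip():
        problems.append(f"loose text not allowed: {loose.strip()!r}")
    return problems
```

Under this rule, a legacy `<origination><corpname>...</corpname></origination>` is rejected because of its child element name, and a text-only <origination> is rejected for both missing a child and containing loose text, which is why the migration step has to run first.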

MicheleCombs commented 11 years ago

+1

rockivist commented 11 years ago

Closed. See #232 for final decision re: origination.