SAA-SDT / EAD3

https://www.loc.gov/ead/index.html
Creative Commons Zero v1.0 Universal

Give <part> m.mixed.basic #366

Closed rockivist closed 9 years ago

rockivist commented 10 years ago

As it stands now, <part> accepts only text. This will prove problematic for migration of those elements that now consist of one or more <part>s.

My suggestion: Give <part> the m.mixed.basic mixed content model. This will alleviate most of the migration issues.

kerstarno-zz commented 10 years ago

I am not sure I would go for the full m.mixed.basic (though we don't have anything less basic at the moment). All m.access elements that would get part as a subelement in EAD3 currently have only the m.phrase.bare content model in EAD 2002, that is, emph, lb, ptr, and extptr (the last of which is no longer included in EAD3). The same goes for subarea as a subelement of corpname in EAD 2002.

So I am wondering whether we would really need abbr, expan, and especially ref (which allows all kinds of follow-ups through its subelements) for part, i.e., whether we have a use case for them. The only cases where we'd have a migration issue with these elements would be when transforming a current origination or repository to origination/name/part or repository/corpname/part, respectively, since origination and repository currently allow a whole lot more subelements.

However, I'd personally recommend converting an EAD 2002 origination or repository that uses the various subelements to origination/descriptivenote/p or repository/descriptivenote/p rather than trying to make it fit *name/part.

MicheleCombs commented 10 years ago

@kerstarno : I agree

MicheleCombs commented 10 years ago

@rockivist : Can you give an example?

keikoro commented 10 years ago

@MicheleCombs Looks like you meant to address a different Kerstin. (; /drivebycomment

rockivist commented 10 years ago

@kerstarno I think you make a good point - giving <part> the full m.mixed.basic would be too permissive, complicating things more than they already are.

In EAD 2002, the various children of controlaccess may contain text, emph, extptr, lb, and ptr; title may additionally contain date and num. If we create a new mixed content model in EAD3 just for part, it seems unobtrusive to include emph, lb, and ptr. Num is pretty useless (I'd omit it), but what about date? Should we include it, making it available in all of the controlaccess elements? Or lose both date and num?

kerstarno-zz commented 10 years ago

Even though we don't use it in the Archives Portal Europe, I'd assume there are use cases out there where one would want to specify a date related to a title. So yes, I'd tend to agree that it would be good to keep it for migration purposes.

I could also imagine use cases where one would like to include a date with part of a person's name or of the name of a corporate body. For example, if I had the name "Dr. Peter Smith", I might want to put a date on the part for the academic title, indicating when it was achieved. Or if I had the name "New Year's LLC", which was previously "New Year's Inc.", I might want to indicate when each legal status was in effect. Having date as a subelement of part in the controlaccess elements would then be a kind of substitute for, or equivalent of, the in EAC-CPF ;-)

As for num: I don't really see the point in it, but perhaps others can provide a use case?

tcatapano commented 10 years ago

Wouldn't one be able to use a <part> for a date portion of a controlaccess element?

...<part localtype="date">1892</part>...

tcatapano commented 10 years ago

@rockivist, I don't understand this:

"This will prove problematic for migration of those elements that now consist of one or more <part>s.

My suggestion: Give <part> the m.mixed.basic mixed content model. This will alleviate most of the migration issues."

What elements "now consist of one or more <part>s"? It's a new element.

Also, how does using m.mixed.basic make migration any easier? Of the elements emph, extptr, lb, and ptr, only emph contains text, so all that is lost when converting the mixed content of a controlaccess element to text is the emph encoding. I'm not opposing the suggestion, but I don't see ease of migration as an issue.

rockivist commented 10 years ago

@tcatapano The entire set of controlaccess elements (corpname, famname, name, persname, subject, geogname, genreform, occupation, function, and title) now exclusively contain one or more part elements, which themselves may only contain text.

Migration from EAD 2002 to EAD3 as it currently stands will require that we strip emph, lb, and ptr from all of them, as well as date and num from title.
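To make that loss concrete, here is a minimal sketch, assuming un-namespaced elements and a hypothetical function named `flatten_to_part`: it rewrites an EAD 2002-style access element as the same element holding a single text-only `<part>`, keeps all of the text, and reports which inline elements had their markup stripped. Treating `<lb/>` as a space is an assumption of the sketch, and a real migration might split the text into several parts instead of one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def _plain_text(elem):
    # Concatenate all text under elem; treat <lb/> as a space so that
    # words on either side of a line break do not run together.
    pieces = [elem.text or ""]
    for child in elem:
        pieces.append(" " if child.tag == "lb" else _plain_text(child))
        pieces.append(child.tail or "")
    return "".join(pieces)

def flatten_to_part(elem):
    """Rewrite an EAD 2002-style access element as the same element holding
    one text-only <part>. Returns (new_element, dropped), where `dropped`
    counts the inline elements whose markup was lost. Hypothetical helper;
    assumes no XML namespaces."""
    dropped = Counter(e.tag for e in elem.iter() if e is not elem)
    new = ET.Element(elem.tag, elem.attrib)
    ET.SubElement(new, "part").text = " ".join(_plain_text(elem).split())
    return new, dropped

old = ET.fromstring('<persname>Smith, <emph render="bold">John</emph><lb/>Dr.</persname>')
new, dropped = flatten_to_part(old)
print(ET.tostring(new, encoding="unicode"))  # <persname><part>Smith, John Dr.</part></persname>
print(dict(dropped))                         # {'emph': 1, 'lb': 1}
```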

There are a few questions to answer:

1) Is that loss (or any part of it) acceptable?
2) Are there any other mixed content elements in EAD3 that we would want to see in part?

Personally, I don't think losing all of that encoding is acceptable (I could live with losing num). And for my part, I'd like to see <foreign> added.

Excluding m.mixed.basic.plus.access (which would introduce all of the controlaccess elements themselves as children of part), there are two mixed content models we could draw from:

m.mixed.basic (text, abbr, expan, emph, lb, ref, ptr, foreign)
m.mixed.basic.plus (text, abbr, expan, emph, lb, ref, ptr, foreign, quote, num, footnote, date)

As I think about this more, I am leaning toward giving <part> the m.mixed.basic.plus mixed content model. It will eliminate my migration concerns, and it will introduce foreign, which I think is important for multi-language purposes. My only reservation would be the inclusion of ref, quote, and footnote, and I think I could live with that.
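One quick way to compare the two candidates is to check which child elements of a given part each would reject. A minimal sketch, assuming the element lists exactly as given in this comment (this is not the EAD3 schema itself) and a hypothetical function named `disallowed_children`; it only inspects direct children, not what nests inside ref or foreign.

```python
import xml.etree.ElementTree as ET

# Child elements allowed by each candidate model, as listed in this comment.
# Text is always allowed in mixed content, so it is not listed here.
M_MIXED_BASIC = {"abbr", "expan", "emph", "lb", "ref", "ptr", "foreign"}
M_MIXED_BASIC_PLUS = M_MIXED_BASIC | {"quote", "num", "footnote", "date"}

def disallowed_children(part, model):
    """Return the sorted tags of direct children of `part` not in `model`."""
    return sorted({child.tag for child in part if child.tag not in model})

part = ET.fromstring("<part>Annual report, <date>1892</date><num>3</num></part>")
print(disallowed_children(part, M_MIXED_BASIC))       # ['date', 'num']
print(disallowed_children(part, M_MIXED_BASIC_PLUS))  # []
```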

I'm curious what @billstockting thinks. Bill, can you chime in?

tcatapano commented 10 years ago

I'm just surprised we're still going around on this issue at this point. Wasn't everybody clear that part would contain only text?


tcatapano commented 10 years ago

@rockivist FWIW, here's what the usage is for the access elements:

corpname: emph, lb, subarea
persname: emph, lb, p
famname: emph, lb
function: lb
genreform: emph, genreform, lb, title
name: emph, lb, ptr
occupation: emph, lb
subject: emph, geogname, lb, ptr, title
title: date, em, emph, lb, num, physdesc, title, unitdate
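Set against the m.mixed.basic list that rockivist gave earlier in the thread, this usage data can be tallied to see which observed children a part with that model would still not accept. A small sketch, with the data copied from the list above and a hypothetical helper named `not_covered`; it ignores how often each child occurs and whether a given usage is valid EAD 2002.

```python
# Children observed per access element, copied from the usage list above.
OBSERVED = {
    "corpname": {"emph", "lb", "subarea"},
    "persname": {"emph", "lb", "p"},
    "famname": {"emph", "lb"},
    "function": {"lb"},
    "genreform": {"emph", "genreform", "lb", "title"},
    "name": {"emph", "lb", "ptr"},
    "occupation": {"emph", "lb"},
    "subject": {"emph", "geogname", "lb", "ptr", "title"},
    "title": {"date", "em", "emph", "lb", "num", "physdesc", "title", "unitdate"},
}

# Element children of m.mixed.basic, as listed earlier in the thread.
M_MIXED_BASIC = {"abbr", "expan", "emph", "lb", "ref", "ptr", "foreign"}

def not_covered(observed, model):
    """Map each element to the observed children that `model` would not allow,
    omitting elements whose observed children are all allowed."""
    return {name: sorted(kids - model) for name, kids in observed.items() if kids - model}

for name, kids in not_covered(OBSERVED, M_MIXED_BASIC).items():
    print(f"{name}: {', '.join(kids)}")
```

Under these assumptions, only emph, lb, and ptr are fully covered; what remains is subarea, nested access elements, and date/num-style content, which is the material that would go into additional parts.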

My inclination is to allow m.mixed.basic in <part>, mainly to allow linking through ref and ptr and rendition specifications through emph. Otherwise, I'd recommend using a separate <part> for a nested date, title, etc.

I don't think it is a good idea to allow quote, num, and footnote in <part>.

rockivist commented 10 years ago

@tcatapano Perhaps we were clear on part containing only text, but it didn't make sense to me when I got to it during my element-by-element schema review.

I agree that it should have m.mixed.basic. I'm fine with losing num and date and agree that quote, num, and footnote would be bad ideas.

billstockting commented 10 years ago

@rockivist and @tcatapano My instinct is that we don't need mixed content here. Controlled access elements are on the data side rather than the text side, so I see no need for abbreviations, emphasis, or breaks. I'm probably being dim, but can you remind me of the case for ptr and ref here, as well as for foreign?

tcatapano commented 10 years ago

@billstockting:

Here's a case for <ref>/<ptr>: if one wants to link from any part, one must do so within the part, because ref does not contain part.

<abbr>:

<subject encodinganalog="610"> <part encodinganalog="610$a">``<abbr expan="National Association for the Advancement of Colored People">N.A.A.C.P</abbr>``</part> <part encodinganalog="610$b">Newark Branch</part> </subject>

<expan>:

<subject encodinganalog="610"> <part encodinganalog="610$a">``<expan abbr="N.A.A.C.P">National Association for the Advancement of Colored People</expan>``</part> <part encodinganalog="610$b">Newark Branch</part> </subject>

<emph>:

<unittitle>
  <persname><part>Siegel</part><part>Benjamin <emph render="doublequote">Bugsy</emph></part></persname>
</unittitle>

<foreign>:

<unittitle>Notes for <title><part>Whither <foreign lang="lat" render="italic">Quo Vadis</foreign>?</part><part>Sienkiewicz's novel in film and television</part></title></unittitle>

kerstarno-zz commented 10 years ago

@tcatapano: Terry, could you please tell me where your usage list comes from? Is that what's possible in the DTD? I can't seem to find it in the schema.

Based on what the schema currently allows for the m.access elements (lb, emph, ptr, and extptr for all of them, plus subarea for corpname, which would become part in EAD3, and the date and num mentioned by Mike), I think we would unnecessarily overcomplicate things by allowing even more subelements in part.

My proposal would still be a newly introduced m.basic.access model with just lb, emph, ptr, and date, for migration reasons; that is, if we really intend to change part from text-only to mixed content.

tcatapano commented 10 years ago

@kerstarno could you clarify what is meant by "usage list"? I don't understand.

MicheleCombs commented 10 years ago

I thought we discussed this way back when, and decided that they could link from the parent element – subject or persname or whatever.

Michele


kerstarno-zz commented 10 years ago

@tcatapano : I am referring to your comment above as an answer to Mike that starts with "FWIW, here's what the usage is for the access elements:"

It seems this list is slightly different from what Mike and I have been mentioning.

tcatapano commented 10 years ago

@MicheleCombs: I was addressing the case for linking from a part. Not the entire parent element.

MicheleCombs commented 10 years ago

I can’t think why anyone would want to do that. If the name is John Smith, why would you ever want to link from just “John”? Can you come up with a use case?

tcatapano commented 10 years ago

@kerstarno

That is from a query of the ArchiveGrid corpus, which is all EAD 2002 or pre-EAD 2002, though with some obviously invalid usages.

kerstarno-zz commented 10 years ago

@tcatapano: Ah, ok. Thanks for the clarification.

billstockting commented 10 years ago

@tcatapano, @rockivist:

On the linking: I'm having the same trouble as @MicheleCombs in seeing the need to link to a part and thought we had considered that when agreeing that links would be made from the parent tag.

The formatting examples really illustrate my point. We agreed in principle to minimize mixed content in part to make the data in an instance easier to deal with, and I think it unnecessary and unhelpful to encourage adding this sort of markup to controlaccess terms. Part of the issue here is the tension in using the same tags for names etc. both in controlaccess and in narrative contexts (within paragraphs), but my agreement with the decision to keep one set of tags was based on their allowing only text within a single part or multiple parts. If we allow m.mixed.basic, we will in fact be extending questionable practice by adding abbr, expan, and foreign. So I'm still not convinced we need mixed content here. If we do, I think we need to reconsider whether to reintroduce the structured tag and only allow mixed content on the narrative tag within

.

tcatapano commented 10 years ago

@billstockting

My primary concern is that we finally have a decision. Nevertheless, I fail to see any major negative consequences if we allow the basic mixed content elements abbr/expan/foreign/emph/ref/ptr/lb here. It is the user's choice whether to employ them, and if present, they can easily be ignored in processing. The elements are not absurd in this context (except maybe lb, whose potential for absurdity extends beyond this context), and indeed they have been a feature of EAD for years. What is the potential harm to be avoided? I fear the committee is making more of an aesthetic decision here.
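As a rough illustration of the point that such inline elements can be ignored in processing, here is a sketch assuming un-namespaced elements and a hypothetical helper named `part_texts`: a consumer that only wants the term text can read each part with ElementTree's `itertext()`, which yields the text inside abbr, emph, foreign, and so on without their tags. Note that for abbr this returns the abbreviated form, not the value of its expan attribute.

```python
import xml.etree.ElementTree as ET

# The <abbr> example from earlier in the thread, without namespaces.
subject = ET.fromstring(
    '<subject encodinganalog="610">'
    '<part encodinganalog="610$a"><abbr expan="National Association for the '
    'Advancement of Colored People">N.A.A.C.P</abbr></part>'
    '<part encodinganalog="610$b">Newark Branch</part>'
    '</subject>'
)

def part_texts(access_elem):
    """Return the plain text of each direct <part> child, ignoring inline markup."""
    return ["".join(p.itertext()).strip() for p in access_elem.findall("part")]

print(" -- ".join(part_texts(subject)))  # N.A.A.C.P -- Newark Branch
```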


rockivist commented 10 years ago

Per TS-EAD 2014-01-16 conference call: implement as recommended.