ebeshero / Amadis-in-Translation

a project to apply TEI markup to investigate early modern Spanish editions of Amadis de Gaula and their translations into English and French from the 1500s to the early nineteenth century.
http://amadis.newtfire.org
GNU Affero General Public License v3.0

TEI Decisions, ODD Dev: Rethinking @synch on anchor, and changing the milestone element #37

Closed ebeshero closed 7 years ago

ebeshero commented 8 years ago

@sydb @HelenaSabel @setriplette Syd Bauman has some advice for our project: We're using TEI in some unconventional ways, but the most egregious thing we really had better deal with is our use of the @synch attribute on our <anchor> elements. Syd informs us (and indeed the TEI Guidelines tell us) that @synch is supposed to be used for indications of chronological time. (I think I noticed this back in August when Stacey and I were sitting in a coffee shop figuring out how to code our "stitchery" alignments of Southey to Montalvo, but we just landed on @synch with the idea that this is where Text B "syncs up" with Text A.) We can change this, but we have a decision to make:

1) Use TEI's @corresp instead. But as @sydb explained to me yesterday at the Lyon airport, @corresp was initially defined back in the 1990s for situations of exact mathematical correspondence. That's not what we're doing here. We want to indicate something more like "this is where Text B maps to Text A."

2) Syd thinks we should constrain the TEI to our project by using the (relatively newish) ODD (one-document-does-it-all) schema, to introduce a new attribute that we could call @mapsTo or something like that, JUST for our project.

Our decision to make is this: We could just go with @corresp and explain our distinct usage of this attribute. OR we could try running with an ODD definition. I've not written an ODD before, as I've contented myself with using Schematron to add project-specific constraints to the TEI-ALL Relax-NG schema. On the other hand, as a newly elected member of the TEI Council, I had probably better familiarize myself with the ODD for constraining TEI... Thoughts on this? @HelenaSabel, have you worked with ODD at all?
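For concreteness, here is a rough sketch of what option 2 might look like inside an ODD. It is only an illustration: the namespace URI and the wording of the description are placeholders, and the `dataRef` datatype follows current TEI practice rather than anything settled in this thread.

```xml
<!-- Sketch only: adds a project-specific @mapsTo attribute to <anchor>. -->
<elementSpec ident="anchor" module="linking" mode="change">
  <attList>
    <!-- The ns value is a hypothetical project namespace; a new,
         non-TEI attribute is usually kept out of the TEI namespace. -->
    <attDef ident="mapsTo" mode="add" ns="http://amadis.newtfire.org/ns/1.0">
      <desc>points to the passage or passages in Text A to which this
        point in Text B maps</desc>
      <!-- One or more whitespace-separated pointers -->
      <datatype minOccurs="1" maxOccurs="unbounded">
        <dataRef key="teidata.pointer"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```

With option 1, no customization of this kind is needed, since @corresp is already available on <anchor>.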

setriplette commented 8 years ago

Hi Elisa and Helena,

I’m sure it’s a good idea to conform better to the TEI rules, but I’m not sure how to best do that. I’ll defer to you two!

The ODD sounds difficult—but maybe worth it for your Council experience? At least then you could give them feedback if they’re discussing it.

Hugs, S

— Stacey Triplette Assistant Professor of Spanish and French Humanities Division University of Pittsburgh at Greensburg Faculty Office Building 200 150 Finoli Drive Greensburg, PA 15601

HelenaSabel commented 8 years ago

Hello! I build ODD schemas using Roma (which is a very, very intuitive tool). However, I combine them with Schematron anyway for rules such as "tokenize the values of this attribute on whitespace and then check that every token matches one of the values of this particular ancillary file" (so if you are going to introduce yourself to the ODD world, maybe you wouldn't mind explaining to me later how this particular rule could be built without using Schematron). The @corresp definition is very broad ("points to elements that correspond to the current element in some way") so I think it fits our purposes. I'd rather use @corresp than create a new attribute since its definition doesn't contain any of the restrictions for which it was originally intended.
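A rule of the kind quoted above (tokenize on whitespace, then check each token against an ancillary file) might be sketched in ISO Schematron roughly as follows. The file name `siteIndex.xml`, the choice of @resp as the attribute, and the assumption that the values are `#`-prefixed pointers to xml:id values are all illustrative guesses, not taken from any project file.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Hypothetical ancillary file; the relative path is resolved
       against the location of the schema itself. -->
  <sch:let name="allowed" value="doc('siteIndex.xml')//@xml:id"/>

  <sch:pattern>
    <sch:rule context="tei:*[@resp]">
      <!-- Split the attribute value on whitespace and strip each leading '#' -->
      <sch:let name="tokens"
               value="for $t in tokenize(normalize-space(@resp), '\s+')
                      return substring-after($t, '#')"/>
      <sch:assert test="every $t in $tokens satisfies $t = $allowed">
        Every pointer in @resp must match an xml:id in the index file.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A token without a leading `#` yields an empty string here and so fails the assertion, which is probably the behavior one wants for pointer attributes.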

sydb commented 8 years ago

Couple of quick clarifications.

Luckily, we have a collaborative writing environment right here in GitHub. So my recommendation is that:

  1. Helena make an initial ODD or two (one for Montalvo, one for Southey) using Roma-the-web-interface, and put the results here in the repository. (Alternatively, Helena, since y'all already have data, it would take me < 10 mins to make one using odd-by-example.)
  2. I will make a quick pass at tidying it up and incorporating your current Schematron rules into it.
  3. Elisa will make the change to add the new @mapsTo attribute. We can do that together on Skype, if you prefer, Elisa.

Note: step 2 may take me a little while, here, as I am planning to use this as a test-case for fixes needed to the TEI Schematron-extraction-into-RNG stylesheet, and I don't work as fast as you guys.

Let me know what you think.

ebeshero commented 8 years ago

@sydb @HelenaSabel @setriplette I've used Roma before, actually before I constrained the Mitford project with Schematron, but I wasn't using it to write ODDs back then. Syd, we've got a kickin' Schematron for Amadis already, right here in this repository, so I think one of the things we should do is try to build an ODD around, or through, or on top of (or whatever) the Schematron we've been writing all along. Here's our Schematron: https://github.com/ebeshero/Amadis-in-Translation/blob/master/XML-and-Schematron/Amadis.sch

Syd, should we really just start here and should I and/or Helena try to build an ODD from this?

sydb commented 8 years ago

I don’t think so. Not really sure, as I’ve never tried to build an ODD from Schematron, but my instinct is you do the reverse. Make the ODD first, then tuck the Schematron in. (Which is a bit ass-backwards, of course, because then you process the ODD in order to pull the Schematron back out, but you also get RELAX NG and customized documentation out.)

ebeshero commented 8 years ago

Well, I am game to try! We wrote the Schematron to constrain Southey and Montalvo files all together--a sort of One Schematron Does it All approach, and so my sense is we could make the ODD the same way--to rule all our encoding for the Amadis project. If our project can help Syd as a test-case, we should do this! I will in any case go and look at the TEI Roma tomorrow to see how I'd want to set it up and see how far I get with making a first ODD: @HelenaSabel Let me give this a try first, and if I get lost, I'll ping you, okay?

HelenaSabel commented 8 years ago

Of course! And you can use Roma to add the @mapsTo attribute to <anchor> . At least in my case, the step of adding and removing elements makes me reflect a lot on my mark-up decisions and I end up doing a second document analysis. I bet you'll find some enhancements to apply.

ebeshero commented 8 years ago

After our discussion of https://github.com/ebeshero/Amadis-in-Translation/commit/0e1c146f99cc595e0f177d145b0c5b67e6958a45 and in Issue #38, I'm thinking with @HelenaSabel that @corresp is probably just fine for our purposes in indicating where a passage in Southey aligns with one or more passages in Montalvo. We don't really need a new attribute here, do we? Whatever the meaning of @corresp was back when it was introduced, it now means, literally: "points to elements that correspond to the current element in some way." (see the Guidelines on global linking attributes). I guess I don't see a compelling reason to introduce a new attribute outside the TEI for this.

ebeshero commented 8 years ago

@HelenaSabel @setriplette As we're thinking about the ODD and What We Might Change, I'm not convinced we need something different from @corresp (a standard TEI attribute) for our anchor elements in Southey to point to corresponding passages in Montalvo. However, a more compelling reflection is our designation of self-closing elements themselves. We use two of them right now:

I think <anchor/> makes sense for "tethering" Southey to Montalvo. I don't think there's any compelling reason to change this.

But maybe we should rethink our use of <milestone/>. We're using it to designate speech acts in the text, and we add attributes to indicate the start and end and internal interruptions to speeches, as well as which character is speaking, and whether we're sure or not who is speaking. We chose <milestone/> for this just because it's a self-closing element and it's NOT an anchor, and that's maybe not the best reason to use this element, if we can work with something more intuitive.

With an ODD, I wonder if we can adapt a TEI element that's supposed to indicate speaking: <sp>. We use this in coding plays, and it can also be used in prose texts--but it's not treated as a self-closing element the way we would need to use it. I wonder if it makes sense to change our <milestone/> elements into a weird new and much simplified construct of: <sp who='#id' ana="(start, intStart, intEnd, end)" cert="(high, medium, low, unknown)"/> ?

Syd, can we do that with an ODD--change an element's rules to make it possible to write <sp> as I describe here?

The use of @cert here would replace our problematic use of @type="unclear" which @sydb commented on in our schema. We currently use @type in one way only when we use it at all on a milestone element, and that's only to indicate that it's not totally clear to us that the person we're designating as a speaker is actually the one speaking, so we've made an educated guess here. As Syd rightly points out, that's not a reasonable use of the @type attribute, which is supposed to be for indicating classifications of something (since we'd expect an @type to have multiple values, not just one). Really what we're indicating is a degree of certainty, and that's what @cert is supposed to be used for. See the explanation of the Attributes for Global Responsibility.
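As a small before-and-after illustration of that change (the speaker and the `low` value are chosen only for the example, and the other attributes follow the project's existing <milestone/> pattern):

```xml
<!-- Current practice: @type used only to flag an uncertain speaker -->
<milestone unit="said" resp="#Perion" ana="start" type="unclear"/>

<!-- Proposed: @cert records the degree of certainty instead -->
<milestone unit="said" resp="#Perion" ana="start" cert="low"/>
```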

HelenaSabel commented 8 years ago

I think replacing the <milestone/> elements with a non-self-closing element would create a problematic hierarchy in Montalvo, since we are using the punctuation as a delimiter for the <cl> elements. See for example the following samples from Montalvo's first chapter.

  <cl xml:id="M0_p1_c33">
    <milestone unit="said" resp="#Garinter" ana="start"/>No os maravilléis de
    esso cavallero que assí como en las otras tierras ay buenos cavalleros y
    malos/</cl>
  <cl xml:id="M0_p1_c34">assí los ay en esta/</cl>
  <cl xml:id="M0_p1_c35">y estos que dezís no solamente a muchos han fecho grandes
    males y desaguisados:</cl>
  <cl xml:id="M0_p1_c36">mas aun al mismo rey su señor sin que de ellos justicia
    hazer pudiesse por ser muy emparentados han hecho enormes agravios:</cl>
  <cl xml:id="M0_p1_c37">y también por esta montaña tan espessa donde se
    acogían.<milestone unit="said" ana="end"/>
  </cl>

  <cl xml:id="M0_p1_c88">
    <milestone unit="said" resp="#Perion" ana="start"/>Ay señora<milestone
    unit="said" ana="intStart"/> dixo él<milestone unit="said" ana="intEnd"
    />:</cl>
  <cl xml:id="M0_p1_c89">no será el postrimero:</cl>
  <cl xml:id="M0_p1_c90">mas todo el tiempo de mi vida será empleado en vos
    servir.<milestone unit="said" ana="end"/>
  </cl>

In the first one, the <sp> element would be the parent of all those clauses, but in the next one some of the <sp> would be children of <cl>.

I agree with adding the @cert attributes inside the <milestone/>.

ebeshero commented 8 years ago

@HelenaSabel That's not quite what I meant. Let me explain: We'd have to change the way the TEI <sp> element behaves in order to make it fit our project, and I imagine we'd have to do that within the ODD (if that's what ODD lets us do!) In order for this to work, we would have to convert <sp> into a self-closing element in this form:

<sp who='#id' ana="(start, intStart, intEnd, end)" cert="(high, medium, low, unknown)"/>

So really, that's exactly the same element structure as the <milestone/> element. I don't know if the ODD lets us change an existing TEI element in this way, or if it only permits us to create new elements and attributes. If it doesn't let us alter an existing TEI element, then maybe we could invent something like:

<speech who='#id' ana="(start, intStart, intEnd, end)" cert="(high, medium, low, unknown)"/> to introduce an element that doesn't currently exist in TEI.

I'm about to learn just how far you can push the rules with an ODD, and I'm not sure what I imagine will be allowed. But if it isn't, perhaps we can invent a new element with a clearer name to designate what we're marking. Does that make sense? @sydb @setriplette
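For reference, a new empty element of this kind could probably be declared in an ODD along the following lines. This is a sketch under several assumptions: the namespace URI is a placeholder, the attributes are declared locally rather than inherited from TEI attribute classes, @who is left optional because end markers in the current files carry no speaker, and the `empty` and `dataRef` elements follow current TEI ODD syntax.

```xml
<!-- Sketch only: a project-specific, self-closing <speech> marker. -->
<elementSpec ident="speech" mode="add" ns="http://amadis.newtfire.org/ns/1.0">
  <desc>marks the start, end, or interruption of a speech act</desc>
  <classes>
    <!-- Allows <speech/> wherever <milestone/> and <anchor/> may appear -->
    <memberOf key="model.milestoneLike"/>
  </classes>
  <content>
    <empty/>
  </content>
  <attList>
    <attDef ident="who" usage="opt">
      <desc>points to the character who is speaking</desc>
      <datatype><dataRef key="teidata.pointer"/></datatype>
    </attDef>
    <attDef ident="ana" usage="req">
      <desc>indicates which boundary of the speech is being marked</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="closed">
        <valItem ident="start"/>
        <valItem ident="intStart"/>
        <valItem ident="intEnd"/>
        <valItem ident="end"/>
      </valList>
    </attDef>
    <attDef ident="cert" usage="opt">
      <desc>indicates how certain the identification of the speaker is</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="closed">
        <valItem ident="high"/>
        <valItem ident="medium"/>
        <valItem ident="low"/>
        <valItem ident="unknown"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

Changing an existing element such as <sp> would use `mode="change"` with a replaced content model instead, at the cost of giving a standard TEI name a non-standard meaning.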

HelenaSabel commented 8 years ago

Sorry! I misread your example! Now I see what you meant (and yes, it makes lots of sense). If you aren't able to make those changes using Roma, you can always edit the Relax NG file afterwards and document it (and then you might want to upload it again to Roma so you can use the "sanity checker"). So the question is: is that change worth it considering that the Guidelines mention acts of speech as one of the uses you might want to give to <milestone/> elements? On the one hand, we would avoid the required @unit attribute of the <milestone/> by using <sp/>. On the other hand, it seems inefficient to modify an element when there is another one which is expected to be used for those cases... I think I might be a very "moderate" person when dealing with standard schemas, and that's why I preferred to use @corresp instead of @mapsTo, and now I vote to keep using <milestone/>. In any case, your proposal makes perfect sense to me and I encourage you to try new things ;-)

ebeshero commented 8 years ago

So, is the ODD basically identical to the Relax-NG schema that Roma makes? Or is it a DTD file? Or what? I used to use Roma to feed in a series of attribute values permitted on various TEI attributes before I realized I could do that a WHOLE lot faster by writing my own Schematron, and now I'm just feeling like the whole Roma process is really slow! We have a perfectly good Schematron, so why is it that we need to write an ODD this way? Why would I want to rewrite our rules by filling out a web form? @sydb Sorry about the grumpiness factor of this message, but really, using the Roma web interface feels like taking a step backwards for this project. Is there a better way?

ebeshero commented 8 years ago

Okay, I'm done with grumping, and I see that yes, an ODD is simply a customization of the TEI produced by Roma. So I'm making one that will constrain the attribute values, and I guess I'll try to lighten up the <milestone/> element by removing its @unit, since for our project it's really only ever going to be used for speeches. (We decided a while back that we didn't need it for the Southey paragraphs.) I guess we'll just make this by limiting which attributes we're permitting. I'm not really sure what we're going to do with our existing Schematron rules, but presumably we'll find some way to work them in.

HelenaSabel commented 8 years ago

Indeed, most of the Schematron rules can be replaced by the schema customization using Roma (except, maybe, the rules that point to other files: at least those I would only know how to write in Schematron, so, if you find another way, let me know!). The benefit of using Roma is the documentation it generates about your customized schema (very handy to publish your editorial criteria!)

ebeshero commented 8 years ago

Actually, no--most of the Schematron rules cannot be replaced by Roma, at least not through the web form. Most of our rules are about indicating what attributes are used in combination with other attributes and I don't see a way of writing that in the Roma form, at least not yet. And the output in Relax-NG isn't valid in oXygen--there's something wrong, probably to do with how I've tried to alter an element, <said>, but the error it's generating doesn't make sense to me. I'll save the various files I've generated and post them, but I don't like how long it takes to sort through elements and attribute classes here. Writing Schematron still seems a lot easier to me, and we still need to apply Schematron anyway.

ebeshero commented 8 years ago

Okay--I've got a Relax-NG version of the ODD that is valid in oXygen now by tweaking what I was doing with the <said> element. I used that as my test-case for altering the usage of a TEI element, and I turned <said> into a milestone class element so it self-closes, and constrained the attributes so they fit our project's use of @ana. I'm not happy with the evidently invalid "documentation" XML file that Roma generated when I asked for a TEI ODD, but I'm probably doing something wrong in generating it. Sigh.

ebeshero commented 8 years ago

@HelenaSabel @sydb I've posted some ODD files now in the XML-and-Schematron directory: a Relax-NG schema, and HTML and XML versions of the documentation (which seem very different). The HTML documentation looks user-friendly (and I'm sure I've generated one of these before just to see what it does, though I haven't used it in a project). I'll bet the HTML documentation that Roma generates is what people like about the ODD. But I notice that it will need to be changed, and it doesn't include the nonstandard attribute that I required on the <said> element. The Roma ODD generator for HTML does a nice job of eliminating the stuff you excluded, but apparently you need to change things in it whenever you make an addition or alteration to the way things usually work in TEI. And this is probably Known Territory to those who work with the ODD. I suppose I'll next have to modify the XML or HTML ODD documentation to make it correct and supply project-specific examples.

ebeshero commented 8 years ago

@sydb Major Question: How do I incorporate my schematron rules into the ODD? Here is what I think:

Having posted some surely rudimentary ODD files, I shall await more explanation.

ebeshero commented 8 years ago

@HelenaSabel @setriplette Stacey and Helena: Just a quick note on all the hubbub over our TEI encoding to say: 1) For now, our project Schematron is up-to-date with our coding as we've been doing it so far. 2) The changes we're discussing on this thread haven't been integrated yet into our project, and the "ODD" files I've generated are just an experiment with modifying elements and attribute values, but haven't actually been implemented yet. I want to wait until @sydb can tell me more about how Schematron integrates with ODD, and I sense he's busy for a while, so we can wait until he has time for us again.

When we do have the ODD worked out, and have a good system for working with it in place, we'll implement some small but significant modifications to our code, and I'll do a bulk identity transformation over our project files to change them all at once, so that:

HelenaSabel commented 8 years ago

Hi Elisa! Sounds perfect! Meanwhile, I still need to apply some enhancements to my FS stylesheet to deal with the notes with translated materials. Since you are working on the schema, I have a task for you for which I'll be opening now a new issue ;-)

sydb commented 8 years ago

Sorry I’ve been away so long, but it’s probably a good thing that you know you can’t rely on me to pay attention to your conversations and chime in in a timely fashion. If you want my input, best to send me e-mail or give me a phone call.

Anyway, @HelenaSabel has suggested absolute blasphemy:

> you can always edit the Relax NG file afterwards and document it

No, no, no, NO!

I am the first to say that there is no particularly strong reason a project should try to remain TEI Conformant, especially in their local work. (As opposed to the canonical files they publish or archive; but even there, TEI Conformance is important only in so far as it is useful.) HOWEVER, the TEI rule about not changing derivatives but only changing the source (in this case the source ODD) comes not from a vague notion of what would be a good encoding that may or may not fit your project. It comes from years of CS experience and hard and fast TEI experience that demonstrates that such modifications are doomed to make things much more complex, quite likely impossibly complex, down the road.

So while I encourage everyone to read, use, and play with RELAX NG as a schema language (it is a far better schema language than W3C Schema or DTDs, and perhaps even better than the TEI’s own schema language called “PureODD” and not really published yet), just as it’s a bad idea to “patch” compiled code, it is a bad idea to modify the schemas derived from ODD.


> So, is the ODD basically identical to the Relax-NG schema that Roma makes?

No, the ODD is the source from which the RELAX NG (or W3C Schema or DTD) closed schema, the Schematron open schema, AND the customized documentation (in HTML, PDF, or others) are derived. If you use Roma-the-web-interface the ODD file is what you get when you click on Save customization, which is right next to Sanity checker.


> Roma web interface feels like taking a step backwards for this project. Is there a better way?

Yeah, just write the ODD by hand. I almost never use Roma-the-web-interface myself.
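A hand-written ODD is just a TEI document containing a <schemaSpec>. A minimal skeleton might look something like the sketch below; the `ident` value, the title, and the exact list of modules are assumptions about what this project would need, not a copy of any file in the repository.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Amadis in Translation: TEI customization (sketch)</title>
      </titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Prose documentation of the project's encoding decisions goes here.</p>
      <schemaSpec ident="Amadis" start="TEI">
        <!-- Include only the modules the project actually uses -->
        <moduleRef key="tei"/>            <!-- infrastructure, always needed -->
        <moduleRef key="header"/>         <!-- teiHeader -->
        <moduleRef key="textstructure"/>  <!-- TEI, text, body, div -->
        <moduleRef key="core"/>           <!-- p, milestone, said, sp -->
        <moduleRef key="linking"/>        <!-- anchor -->
        <moduleRef key="analysis"/>       <!-- cl -->
        <!-- elementSpec and classSpec customizations follow here -->
      </schemaSpec>
    </body>
  </text>
</TEI>
```

Starting from an explicit list of modules, rather than from TEI All, keeps the generated schema and documentation limited to what the project uses.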


> We have a perfectly good Schematron, so why is it that we need to write an ODD this way? Why would I want to rewrite our rules by filling out a web form?

Good question. First, to be clear (as stated above), I am not advocating Roma as a way to write your ODD, just that you have an ODD. Why? Two reasons. The main one is for documentation and clarity. ODD provides a vastly better mechanism for documenting your encoding than just writing your own RELAX NG or Schematron does. The second is just practical: it’s a whole lot easier (and faster) to validate a document against a RELAX NG schema than a Schematron schema. So much so that oXygen can do so in advance. If you constrain the values of your @type attribute to start, intStart, intEnd, and end in your Schematron schema, a user who enters a @type attribute will have to type in a value from scratch. If she types in an incorrect value, oXygen will flag the attr as an error quite quickly for a small file (large files take longer). If you write the same constraint the way you’re supposed to using a <valList> in your ODD, oXygen will give the user a drop-down list of only those four values. She would have to work to make a mistake, and if she does it will get flagged nearly instantly.
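The <valList> approach described above might look roughly like this in the ODD. The choice of <milestone> as the element and the wording of the descriptions are illustrative assumptions; the four values are the ones named in the paragraph.

```xml
<!-- Sketch only: a closed value list for @type on <milestone>. -->
<elementSpec ident="milestone" module="core" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <valList type="closed" mode="replace">
        <valItem ident="start">
          <desc>a speech begins here</desc>
        </valItem>
        <valItem ident="intStart">
          <desc>an interruption within a speech begins here</desc>
        </valItem>
        <valItem ident="intEnd">
          <desc>an interruption within a speech ends here</desc>
        </valItem>
        <valItem ident="end">
          <desc>a speech ends here</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

Because the list is closed, the generated RELAX NG permits only these four values, which is what lets an editor like oXygen offer them in a drop-down.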


I’m jumping ahead to those posts that were flagged for me. If I keep on looking at every post, I’ll be at this all day before I send you anything. (That said, I am highly suspicious of this use of <milestone>, which is only intended for navigational features that tessellate the <div>, if not the <text>.)

OK, so I just looked in the XML-and-Schematron/ folder. First, let’s work on file naming. Nothing really wrong with what you have, but probably makes more sense to consider .odd a type of file, rather than a name. Thus:

I’m not touching anything in the master branch, only in the sydb-constraints01 branch, but I’m going to rename files as above there. More importantly I have:

1.

2.

I then generated outputs from the new ODD[1], deleted the old ODD and its outputs, and checked it all in (in several commits), and pushed it all up. Will issue pull request shortly.

Note [1], in the interests of full disclosure: And then I cheated. I did exactly what I said one should never do, above. I modified the Amadis.isosch file by hand. There is a tiny bug in the program that extracts ISO Schematron from ODD. In some cases it generates duplicate namespace specification elements (i.e., <sch:ns> elements). Most processors don’t care (after all, there’s nothing wrong with it). But one or two processors out there (incorrectly, IMHO) choke on it. So I deleted one such duplicate by hand, and added a comment that I did so.

ebeshero commented 8 years ago

@sydb Wow--this is going to take me a while to process--and that's exactly what I hoped. :+1: Thank you. So first of all, we are very much copyleftists here, with CC-BY-SA-4.0, as advertised on our website at http://amadis.newtfire.org, but probably we'd better say that on this GitHub repository too. And second of all, I need to learn to write an ODD myself outside Roma. I imagined I'd have to begin with invoking the TEI All somehow and following the customization reductions in step one of Roma, but I expect there's some good way to do this and maybe I'll pick up on it as I look through your modified files...which I'll do probably in the next day. Thanks for enlightening me about ODD!

ebeshero commented 8 years ago

@sydb @HelenaSabel Aha! I'm reviewing and editing Syd's new ODD file, and I think I'm getting how to edit this, and how to incorporate our Schematron rules (the ones that really need to be written as Schematron that point to other files or constrain how we use our attribute values in sequence). Here's a small example from the ODD, Helena, just to give you an idea.

<elementSpec ident="anchor" module="linking" mode="change">
  <constraintSpec scheme="isoschematron" ident="refer-to-Montalvo">
    <constraint>
      <sch:rule context="tei:anchor[@ana eq 'start'][not(@type eq 'add')]">
        <sch:report test="not(@corresp)">An @corresp must be present on this anchor to point to a corresponding unit in Montalvo!</sch:report>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>

I'm going to see how this works and see if I can incorporate more of our Schematron rules (to point to our site index for our xml:ids, for example).