Open djbpitt opened 2 years ago
I would suggest
<supportDesc material="Mixed">
<support>
<material>Paper is of low quality</material>
<material>Thin parchment. There is almost no distinction between the flesh and hair side ...</material>
...
</support>
</supportDesc>
@djbpitt
@atoboy I've reopened the issue just to clarify two details:
1) Setting @material to "mixed", as you propose, doesn’t tell us what the mixture is. In most cases it will be parchment and paper, but could it be anything else? If we think of the @material attribute as primarily for searching purposes, might it be more informative to use a token list, along the lines of "parchment paper"? If we do that, we can search for supportDesc[contains-token(@material, 'paper')] to find all @material values that include "paper", that is, it will find both "paper" by itself and "paper" when it is mixed with something else. If we specify "mixed", we would need supportDesc[@material = ('mixed', 'paper')], and that works only if "mixed" always includes paper as one of its implicit values. Either approach will work, but a token list is more informative because it names the components, while "mixed" leaves them implicit.
2) material="Paper" was used earlier, but I think we should standardize on lower-case for single words, as we’ve done elsewhere. That is, instead of "Parchment" we would write "parchment", and similarly for all possible values of @material.
Please let me know what you think.
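To make the difference between the two search strategies concrete, here is a minimal Python sketch. It only approximates the two XPath expressions above with plain string handling; the sample records and their n labels are invented for illustration and are not taken from our files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the n values are labels for this sketch only.
SAMPLE = """
<descriptions>
  <supportDesc n="ms1" material="paper"/>
  <supportDesc n="ms2" material="parchment"/>
  <supportDesc n="ms3" material="parchment paper"/>
  <supportDesc n="ms4" material="mixed"/>
</descriptions>
"""

def has_token(value, token):
    """Approximate XPath contains-token(): split on whitespace, match whole tokens."""
    return token in value.split()

root = ET.fromstring(SAMPLE)

# Token-list approach: any record whose @material names "paper".
token_hits = [sd.get("n") for sd in root.iter("supportDesc")
              if has_token(sd.get("material", ""), "paper")]

# "mixed" approach: equality against either value, as in
# supportDesc[@material = ('mixed', 'paper')].
mixed_hits = [sd.get("n") for sd in root.iter("supportDesc")
              if sd.get("material") in ("mixed", "paper")]

print(token_hits)
print(mixed_hits)
```

The token-list query finds ms1 and ms3 because "paper" is named explicitly in both. The equality query finds ms1 and ms4, and ms4 is a correct hit only if "mixed" is guaranteed to include paper.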
@djbpitt Should we change the definition of @material in order to allow more than one word in the attribute value? The current definition of this attribute in the TEI Guidelines is:
attribute material { "paper" | "parch" | "mixed" | teidata.enumerated}?
However, I am not sure which TEI datatype we should use in this case: TEI Guidelines -- Appendix E Datatypes and Other Macros
@atoboy teidata.enumerated requires a single word (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.enumerated.html), and I'm suggesting that we want to require one or more items from a fixed list. The content model might look like:
attribute material {
list { ( "paper" | "parchment" | "wax" | "stone" | "cloth" )+ }
}
I don't mean to suggest that we should include wax or stone or cloth; the point is that we should include all of the materials we actually find or can reasonably expect to find. The list structure in Relax NG allows a token list, so it will allow any combination of those items.
Unfortunately, the list structure also allows repetition, so it would allow a value like:
<supportDesc material="paper paper parchment"> … </supportDesc>
This type of repetition would be a mistake, so we would want to prevent it, and it isn’t possible to do that with Relax NG alone. When I use this sort of schema in other projects, I add a Schematron rule to prevent repetition. With respect to @material, if we agree to use a Relax NG list and if you can update the ODD and the XML documents, I can add the Schematron rule.
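As an illustration of what such a rule would have to check, here is a minimal Python sketch. It is not the Schematron rule itself, only the same logic written out: split the value on whitespace, reject tokens outside the inventory, and reject repeats. The ALLOWED set is an assumption for the example; the real inventory would be whatever we agree on.

```python
# Assumed inventory for this sketch; the real list would be whatever
# materials the project actually agrees on.
ALLOWED = {"paper", "parchment"}

def check_material(value):
    """Return a list of problems found in a @material token list."""
    tokens = value.split()
    problems = []
    if not tokens:
        problems.append("no tokens")
    # Tokens that are not in the fixed inventory (case-sensitive).
    unknown = sorted(set(tokens) - ALLOWED)
    if unknown:
        problems.append("unknown token(s): " + " ".join(unknown))
    # Tokens that occur more than once, which the Relax NG list cannot forbid.
    repeated = sorted({t for t in tokens if tokens.count(t) > 1})
    if repeated:
        problems.append("repeated token(s): " + " ".join(repeated))
    return problems

for value in ("parchment paper", "paper paper parchment", "Paper"):
    print(repr(value), "->", check_material(value) or "ok")
```

Because the comparison is case-sensitive, "Paper" is reported as unknown, which fits the proposal to standardize on lower-case values.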
You may already know that it is possible to integrate Schematron rules into an ODD, instead of using a separate Schematron schema. I use a separate Schematron schema in my other work for two reasons:
One argument in favor of integrating the Schematron into the ODD, though, is that it would also integrate that aspect of the documentation, since the documentation is generated from the ODD.
So: If you agree with the proposal that we use a list here, we should use Schematron to prevent repetition, and we should decide whether to do that in a separate Schematron schema (we already have one, so I would just add another rule to that) or whether we should integrate the Schematron rule into the ODD.
I would recommend that we defer that decision by continuing to use a separate Schematron file for now. Once we're satisfied with our changes to the ODD, we can then revisit the question of whether to integrate our Schematron validation into the ODD.
and there is no further information in <material> than Paper or Parchment, I suggest removing the <material> element entirely. Alternatively, I suggest removing material="paper" and having
This information should be unified somehow, but repeating one and the same data makes no sense.
If, however, we have
then we should write some more information, of course. The same is valid if we would like to have some more data about the paper or parchment. Then we will need:
This situation is more complicated because there are several possible variants, and I agree that the most important thing is for us to be consistent. Your examples above are of three types:
We might want to approach this question by asking how we want to use the values. Here is a proposal (for discussion; I don't mean to suggest that it is necessarily what we should do):
The @material attribute on the <supportDesc> element is for structured search and retrieval. For that reason, it's a token list drawn from a fixed inventory of strings: "paper", "parchment", and whatever else might actually occur (stone? wax? birchbark?). Because it is a token list, we would not use a value like "mixed"; if a manuscript includes both parchment and paper, we would write <supportDesc material="parchment paper">. The order of the values in a token list is not informational, so "parchment paper" and "paper parchment" are equivalent. The attribute is required and the value must include at least one token from the allowed list.
A: I don't quite understand what is wrong with "mixed", but anyway I tried to write <supportDesc material="parchment paper">, and it immediately triggers an error. According to the TEI schema (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-supportDesc.html):
attribute material { "paper" | "parch" | "mixed" | teidata.enumerated }?
The documentation for teidata.enumerated (https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.enumerated.html) says:
Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
That means that only one word is allowed here. So, if we would like to use a value here that is different from paper or parch, we can use only mixed. The alternative is to change the schema for supportDesc. Should we do it?
The <material> child of <support> is for human eyes, that is, it's what we render in the codicological description. It is optional, but whether we use it is governed by the following considerations:
a) Where there is a single material and no supplementary information (e.g., paper), we omit the <material> element. In the codicological description we'll upper-case the value of the attribute. This lets us avoid the duplication. We can validate this with Schematron: if the count of tokens in @material is not equal to 1, there must be at least one <material> element.
b) Where there are multiple materials, the @material attribute contains more than one token and a separate <material> element is required for each type of support. If, for example, the manuscript is on a combination of parchment and paper, there will be two tokens in the @material attribute value and at least two <material> elements. There might be more than two if, for example, there are multiple types of paper.
c) Even when there is one type of material, the <material> element can be used if the description presented to humans should be more detailed than what the attribute allows. For example, the @material value might be just "paper", while the content of the <material> element would read something like "Paper of poor quality".
A: David, you describe the possible situations very well. So I would suggest:
a) If you know that the material is paper or parchment but have no further information, encode this as:
or
With an upper-case letter.
b) You have a mixture of paper and parchment. Here we should decide whether we will change the model of supportDesc, allowing both words as value of elements (it is repeatable), material="Parchment Paper" (upper case), or whether we will stick with the value "mixed". (If we decide to change supportDesc, I don't know which attribute class would allow us to have two words as the attribute value.) What do you think? Then, as you suggested, we will have two
c) You have some more information about paper, parchment, etc. Encode this as:
So, in principle we should decide whether we would like to change the model for supportDesc or leave it as it is?
1) Current possibility:
2) Changing @material:
If we would like to retain both views, the description of the MS as a database and the description of the MS from the user's perspective (reading as text), maybe the second one is better. What do you think?
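For reference while we decide, the relationship between the @material tokens and the <material> elements proposed earlier in this thread (considerations a to c) can be sketched in Python. This is only an illustration of the logic a Schematron rule might express, under two assumptions: the fragments are not in the TEI namespace (real documents would be), and "a separate <material> for each type of support" is read as "at least as many <material> elements as tokens".

```python
import xml.etree.ElementTree as ET

def check_support_desc(xml_text):
    """Return an error message, or None if the fragment passes."""
    support_desc = ET.fromstring(xml_text)
    tokens = support_desc.get("material", "").split()
    materials = support_desc.findall("./support/material")
    if not tokens:
        return "@material is required and needs at least one token"
    # One token: <material> is optional (considerations a and c).
    # Several tokens: one <material> per type of support, possibly more (b).
    if len(tokens) > 1 and len(materials) < len(tokens):
        return "expected at least %d <material> elements, found %d" % (
            len(tokens), len(materials))
    return None

# Hypothetical fragments for illustration only.
single = '<supportDesc material="paper"><support/></supportDesc>'
mixed_short = ('<supportDesc material="parchment paper"><support>'
               '<material>Thin parchment</material></support></supportDesc>')
mixed_ok = ('<supportDesc material="parchment paper"><support>'
            '<material>Thin parchment</material>'
            '<material>Paper of poor quality</material></support></supportDesc>')

for fragment in (single, mixed_short, mixed_ok):
    print(check_support_desc(fragment) or "ok")
```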