TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Add location information to the new `<conversion>` element #1910

Closed tillgrallert closed 3 years ago

tillgrallert commented 5 years ago

I am thrilled by the new <unitDecl> and its children introduced with version 3.6.0 of the Guidelines (the discussion around this element can be found here). I am working extensively with the mark-up of food prices in sources from the late Ottoman Mediterranean in various languages (Arabic, Ottoman, French, English, and German). What strikes me as missing from this wonderful new system of declaring non-SI units is the possibility of adding location information to <conversion>s. Unit names for volumes, weights, etc. were largely the same across the geographic space of interest to me, but the actual measures differed substantially, not just over time (accounted for by <conversion> being a member of att.datable.w3c) but also between cities and regions. Thus, a shunbul (a volume used to measure grain) equalled 2.25 madd in Acre, 3 madd in Aleppo and 72 (sic) madd in Damascus.

Currently, I can only specify the geographic region for a unit at the level of the <unitDecl>, which means I would have to declare three different units for my small example above. If, for instance, <placeName> were allowed as a child of <conversion>, one could simply add further <conversion> elements to an existing <unitDecl>. Ultimately, I imagine a system for places similar to the one att.datable.w3c already provides for dates.

This would remove a lot of overhead from the mark-up and the mark-up process. I have encoded every one of my 20k sources (mostly letters and periodical articles) as an individual TEI file, which allows for easy deduction of the geographic region or place of a source. With the proposed <placeName> child, in addition to the @from and @to attributes, one could then easily look up the relevant conversion/@formula for normalising and comparing measures.

duncdrum commented 5 years ago

@tillgrallert this kind of use case has come up in our discussion of the new elements. I think it would help the discussion if you could provide us with two encoding examples: one showing what you can validly do now, and one showing how your proposed changes would look.

Off the top of my head I have a few questions and suggestions. The bigger question raised by your issue is how the different *Decl elements in the header relate to each other, and how clean (that is, independent from each other) they should be.

ebeshero commented 5 years ago

Council F2F in Graz: Try adding a @where attribute on <conversion> to designate the place to which the conversion is specific. Let's also take a look at Naoki's article on this in JTEI. Check his Example 7 in particular.

ebeshero commented 4 years ago

@tillgrallert Sorry for the long-delayed direct reply here, but you'll see from my brief comment above that TEI Council discussed your proposal, together with a review of Naoki's recent publication. What we're recommending is along the lines of @duncdrum's suggestion: that you define a canonical list of places (as in a <listPlace>, see https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDGEOG), and then in your <conversion> elements you can point to a canonical identifier for a specific place. Would this work? Also, we'd be very grateful if you could provide an example from your work to include in the Guidelines as an example use case. (We might need to edit it a bit if we go with adding an attribute to <conversion>, but what we'd like to do here is pretty close to your suggestion.)

tillgrallert commented 4 years ago

@ebeshero et al. thank you very much for considering my feature request. I apologise for not having replied earlier. This was due to a prolonged parental leave-of-absence. I will look into @duncdrum 's suggestion and provide the examples you asked for by the end of this week.

All the best and a happy new year!

sydb commented 4 years ago

Council VF2F subgroup suggests we close this soon, unless @tillgrallert pops up to say the suggested solution does not work (and why not).

tillgrallert commented 4 years ago

Dear all, I have to apologise once again and thank you for your patience. I would strongly support the introduction of a @where attribute that allows pointing to canonical lists of places.

With the new <unitDecl> and without a potential @where attribute, I would encode my original example as follows:

<unitDef type="volume" xml:id="shunbul_acre">
  <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
  <label xml:lang="ar">شنبل</label>
  <placeName>Acre</placeName>
  <placeName>Tripoli</placeName>
  <conversion fromUnit="#shunbul_acre" toUnit="#kile" formula="$fromUnit * 2.5" when="1873"/>
  <conversion fromUnit="#shunbul_acre" toUnit="#kile" formula="$fromUnit * 2.25" when="1878"/>
  <desc xml:lang="en">The <foreign xml:lang="ar-Latn-x-ijmes">shunbul</foreign> was the main unit to measure grain in the Northern parts of Greater Syria.</desc>
</unitDef>
<unitDef type="volume" xml:id="shunbul_aleppo">
  <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
  <label xml:lang="ar">شنبل</label>
  <placeName>Aleppo</placeName>
  <conversion fromUnit="#shunbul_aleppo" toUnit="#kile" formula="$fromUnit * 3" from="1891" to="1910"/>
  <conversion fromUnit="#shunbul_aleppo" toUnit="#kile_istanbul" formula="$fromUnit * 2.5" when="1860"/>
  <desc xml:lang="en">The <foreign xml:lang="ar-Latn-x-ijmes">shunbul</foreign> was the main unit to measure grain in the Northern parts of Greater Syria.</desc>
</unitDef>
<unitDef type="volume" xml:id="shunbul_damascus">
  <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
  <label xml:lang="ar">شنبل</label>
  <placeName>Damascus</placeName>
  <conversion fromUnit="#shunbul_damascus" toUnit="#madd" formula="$fromUnit * 72" when="1893"/>
</unitDef>
<!-- the following measures are only included for validation purposes -->
<unitDef type="volume" xml:id="kile">
  <label>kile</label>
</unitDef>
<unitDef type="volume" xml:id="kile_istanbul">
  <label>Istanbul kilesi</label>
</unitDef>
<unitDef type="volume" xml:id="madd">
  <label>madd</label>
  <conversion fromUnit="#madd" toUnit="#cift" formula="$fromUnit * 1"/>
  <conversion fromUnit="#madd" toUnit="#kile" formula="$fromUnit * 0.5"/>
</unitDef>

This would require as many <unitDef>s for the volume of shunbul as there are locations in my corpus, and each, in turn, would need its own @xml:id. This would add encoding overhead and would require new units to be defined even when the conversion rates at two places were the same, just to safeguard against future findings that the shunbul did differ between the two places at another point in time or according to new sources.

If there were a @where attribute that allowed pointing to a canonical list of places, this would improve the mark-up significantly in two ways:

  1. It would prevent the creation of multiple @xml:ids for a single unit/measure
  2. It would reduce the lines of code needed to describe this unit and its various conversions

 <unitDef type="volume" xml:id="shunbul">
   <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
   <label xml:lang="ar">شنبل</label>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile" when="1873" where="#acre #tripoli"/>
   <conversion formula="$fromUnit * 2.25" fromUnit="#shunbul" toUnit="#kile" when="1878"  where="#acre #tripoli"/>
   <conversion formula="$fromUnit * 3" from="1891" fromUnit="#shunbul" to="1910" toUnit="#kile" where="#aleppo"/>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile_istanbul" when="1860" where="#aleppo"/>
   <conversion formula="$fromUnit * 72" fromUnit="#shunbul" toUnit="#madd" when="1893" where="#damascus"/>
   <desc xml:lang="en">The <foreign xml:lang="ar-Latn-x-ijmes">shunbul</foreign> was the main unit to measure grain in the Northern parts of Greater Syria.</desc>
</unitDef>
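
The @where pointers in the example above presuppose a canonical list of places that carries the matching @xml:id values. A minimal sketch of such a list follows; the identifiers are taken from the pointers above, while the placement of the list (for instance inside <settingDesc>, or in a separate authority file, in which case the pointers would need a file prefix) is an assumption and not something decided in this ticket.

```xml
<!-- Hypothetical place list: the xml:id values mirror the pointers
     used in @where above; everything else is illustrative. -->
<listPlace>
  <place xml:id="acre">
    <placeName xml:lang="en">Acre</placeName>
  </place>
  <place xml:id="tripoli">
    <placeName xml:lang="en">Tripoli</placeName>
  </place>
  <place xml:id="aleppo">
    <placeName xml:lang="en">Aleppo</placeName>
  </place>
  <place xml:id="damascus">
    <placeName xml:lang="en">Damascus</placeName>
  </place>
</listPlace>
```

Each <place> can later be enriched (further <placeName>s in other languages, identifiers from a gazetteer) without touching any of the <conversion> elements that point to it.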

tillgrallert commented 4 years ago

Dear all,

I would like to add that such a proposed @where attribute would also be needed on <measureGrp> and <measure> referencing the units specified in a <unitDef> in order to establish the correct conversion:

<p>The <measureGrp where="#aleppo">current price of <measure commodity="wheat" unit="#shunbul" quantity="1">a shunbul of wheat</measure> is between <measure commodity="currency" unit="#ops" quantity="90">Ps 90</measure> and <measure commodity="currency" unit="#ops" quantity="117">Ps 117</measure></measureGrp> due to the price hikes.</p>
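
How such a @where on <measureGrp> or <measure> could drive the lookup can be sketched in XSLT 2.0. This is only an illustration under stated assumptions: @where on <measure> and <measureGrp> is the proposal made here and not part of the TEI, the key name and the inserted comment are invented for the sketch, the <unitDef>s are assumed to sit in the same document as the text, and filtering the candidates by date is left out.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Index every conversion by the unit it converts from, e.g. "#shunbul". -->
  <xsl:key name="conversions-by-unit" match="tei:conversion" use="@fromUnit"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="tei:measure[@unit]">
    <!-- Nearest (hypothetical) @where on the measure itself or on an
         ancestor such as measureGrp. -->
    <xsl:variable name="place" select="ancestor-or-self::*[@where][1]/@where"/>
    <!-- Conversions for this unit whose @where lists that place. -->
    <xsl:variable name="candidates"
        select="key('conversions-by-unit', @unit)[tokenize(@where, '\s+') = $place]"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="exists($candidates)">
        <!-- Record the candidate formulas; choosing among them by date
             would be a further step. -->
        <xsl:comment>
          <xsl:text>candidate formulas: </xsl:text>
          <xsl:value-of select="string-join($candidates/@formula, ' | ')"/>
        </xsl:comment>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The sketch inherits the place from the nearest ancestor, so a single @where on <measureGrp> covers all the <measure>s inside it, which is the behaviour the example above relies on.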

ebeshero commented 4 years ago

Most of Council agrees with implementing @where on <conversion>, and we want an attribute class for @where so that it is used as it is on <event>. But putting multiple values on @where seems problematic for keeping this definition consistent with its use on <event>. We are considering an attribute class, att.locatable, and will open a new ticket about this.

sydb commented 4 years ago

If <conversion> were to have a @where attribute that allowed only one value, @tillgrallert’s example above would be encoded with something like the following:

 <unitDef type="volume" xml:id="shunbul">
   <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
   <label xml:lang="ar">شنبل</label>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile" when="1873" where="#acre"/>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile" when="1873" where="#tripoli"/>
   <conversion formula="$fromUnit * 2.25" fromUnit="#shunbul" toUnit="#kile" when="1878"  where="#acre"/>
   <conversion formula="$fromUnit * 2.25" fromUnit="#shunbul" toUnit="#kile" when="1878"  where="#tripoli"/>
   <conversion formula="$fromUnit * 3" from="1891" fromUnit="#shunbul" to="1910" toUnit="#kile" where="#aleppo"/>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile_istanbul" when="1860" where="#aleppo"/>
   <conversion formula="$fromUnit * 72" fromUnit="#shunbul" toUnit="#madd" when="1893" where="#damascus"/>
   <desc xml:lang="en">The <foreign xml:lang="ar-Latn-x-ijmes">shunbul</foreign> was the main unit to measure grain in the Northern parts of Greater Syria.</desc>
</unitDef>

ebeshero commented 3 years ago

I'm reviewing this ticket and recalling that we did not ultimately implement the proposal for att.locatable. We need another mechanism for implementing this, or another attribute name, perhaps. @martinascholger @sydb @hcayless can you help me review, ahem, where we are with @where on <conversion> ?

ebeshero commented 3 years ago

To summarize:

ebeshero commented 3 years ago

Okay. Reviewing #1769: my attempt to implement att.locatable in 2020 was an effort to make a class that would work for both <move> and <event>. We stopped work on that because we settled on a solution that would keep these elements distinct. That was @hcayless's introduction of teidata.authority, specifically to resolve the distinct requirements of move/@where, because this @where isn't necessarily about pointing to canonical lists of places. (See https://github.com/TEIC/TEI/pull/1974 ).

I closed my PR to introduce att.locatable last year (https://github.com/TEIC/TEI/pull/1958) because we agreed we would not attempt a single class to reconcile these two distinct uses of @where on <move> vs. <event>.

I think we should introduce att.locatable again, this time in a much simpler way: This would define the attribute class for <event>, <conversion> (and in future any other participating elements) that would use @where to point to a canonical list of places. Is it fair to say that event/@where and conversion/@where are the same kind of pointing?
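
For discussion, a simpler class along these lines might look roughly like the following ODD fragment, which would sit inside a <schemaSpec>. It is a sketch, not the adopted definition: the <desc> wording is illustrative, and allowing more than one pointer (maxOccurs="unbounded") is an assumption, since whether @where may carry several values was still an open question earlier in this thread.

```xml
<!-- Illustrative only: a class supplying @where as a pointer to
     entries in a canonical list of places. -->
<classSpec ident="att.locatable" type="atts" module="tei" mode="add">
  <desc>provides an attribute for pointing to the place or places
    with which an element is associated.</desc>
  <attList>
    <attDef ident="where" usage="opt">
      <desc>points to one or more entries in a canonical list of places.</desc>
      <!-- Assumption: one or more pointers; use maxOccurs="1" to
           enforce the single-value reading. -->
      <datatype minOccurs="1" maxOccurs="unbounded">
        <dataRef key="teidata.pointer"/>
      </datatype>
    </attDef>
  </attList>
</classSpec>

<!-- Make <conversion> a member of the class. -->
<elementSpec ident="conversion" module="header" mode="change">
  <classes mode="change">
    <memberOf key="att.locatable" mode="add"/>
  </classes>
</elementSpec>
```

Other elements would join in the same way, by adding a <memberOf> to their own <classes>.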

ebeshero commented 3 years ago

@martinascholger Can we put this ticket on the agenda for the July 2021 Council meeting? I think we can move forward on addressing it.

ebeshero commented 3 years ago

@tillgrallert I think I've found a good way to set @where on <conversion>, and would very much like to feature the very clear example you provided in this ticket (which I'll just copy here). Can you provide source information for this example that we can include in the Guidelines' Bibliography?

 <unitDef type="volume" xml:id="shunbul">
   <label xml:lang="ar-Latn-x-ijmes">shunbul</label>
   <label xml:lang="ar">شنبل</label>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile" when="1873" where="#acre #tripoli"/>
   <conversion formula="$fromUnit * 2.25" fromUnit="#shunbul" toUnit="#kile" when="1878"  where="#acre #tripoli"/>
   <conversion formula="$fromUnit * 3" from="1891" fromUnit="#shunbul" to="1910" toUnit="#kile" where="#aleppo"/>
   <conversion formula="$fromUnit * 2.5" fromUnit="#shunbul" toUnit="#kile_istanbul" when="1860" where="#aleppo"/>
   <conversion formula="$fromUnit * 72" fromUnit="#shunbul" toUnit="#madd" when="1893" where="#damascus"/>
   <desc xml:lang="en">The <foreign xml:lang="ar-Latn-x-ijmes">shunbul</foreign> was the main unit to measure grain in the Northern parts of Greater Syria.</desc>
</unitDef>

tillgrallert commented 3 years ago

Dear @ebeshero, thank you so much for seeing this issue through. What form should the needed source information take? Information on the research project, the corpus of sources involved, or something else?

ebeshero commented 3 years ago

@tillgrallert I am hoping to add a bibliographic citation in the Guidelines to the source of this code you provided. Am I correct in thinking that this is your research expressed in the code? If so, you are the author, and the code is your research project. It is such a good example of what we want to do with <unitDef> and <conversion> so I really would like to cite you here! Thank you!

tillgrallert commented 3 years ago

@ebeshero thank you very much. I just published archival releases of the data set and tools (including XSLT) for this research project on Github and Zenodo.

  1. Grallert, Till. ‘Data on Food Riots and Food Prices in the Eastern Mediterranean (Bilād al-Shām) in the 19th and 20th Centuries’. Zenodo, 4 August 2021. https://doi.org/10.5281/zenodo.5159018.
  2. ———. Tools for the Computational Normalisation and Analysis of Food Prices and Food Riots in the Eastern Mediterranean (Bilād al-Shām) in the 19th and 20th Centuries. Zenodo, 2021. https://doi.org/10.5281/zenodo.5159020.

The original mark-up for the normalization of measures is found in the second repository, which I will update to reflect the new possibilities of the TEI.

ebeshero commented 3 years ago

Thanks @tillgrallert ! I've added the bibliography citation pointing to your work for this example.