TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

How to encode measurement #1707

Closed naoki-kokaze closed 5 years ago

naoki-kokaze commented 6 years ago

I'm Naoki, and I gave a poster presentation at TEI 2017 at UVic on how to mark up UNITS that are not based on the metric system. I think it is important for us to discuss measurement in a broad sense, because measurement both reflects and represents cultural diversity. A Wikipedia article may help us understand the importance of discussing cultural uniqueness through measurement.

Some TEI members have already discussed how useful a <unit> element would be. Please see https://github.com/TEIC/TEI/issues/1461

First of all, I would like to share a revised version of my poster, which omits the image of the historical source for licensing reasons. TEI_2017_poster_for_github.pdf

The problem is this: how should the following be marked up?

銅二千五百十六斤十両二分四銖

which means 'Copper whose weight is 2516 Kin, 10 Ryo, 2 Bu and 4 Shu'. It may be hard to see what is happening here, but it is comparable to British measurements such as 12 yd, 2 ft and 10 in, which are also not based on the metric system.

Based on the discussion at the conference, there seem to be at least two solutions.

(i) Using only one <measure> element: <measure commodity="銅" n="2516/10/2/4" unit="斤/両/分/銖" />

In the current scheme, we can't use @quantity to store multiple values. We could, however, use any delimiter in place of the slash.

(ii) Using a <measure> element to nest other <measure> elements. ...Sorry, I have to get on the plane. Please see the poster file for this second possible solution!
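
Without the poster to hand, option (ii) can be pictured roughly as follows. This is only a sketch under assumptions: it is not copied from the poster, and it assumes each inner <measure> carries one quantity/unit pair through the existing @quantity and @unit attributes, with @commodity stated once on the outer element.

```xml
<!-- Hypothetical sketch of option (ii): one outer <measure> for the whole
     expression, one inner <measure> per quantity/unit pair.
     Line breaks are for readability only; the source text has none. -->
<measure type="weight" commodity="銅">銅
  <measure quantity="2516" unit="斤">二千五百十六斤</measure>
  <measure quantity="10" unit="両">十両</measure>
  <measure quantity="2" unit="分">二分</measure>
  <measure quantity="4" unit="銖">四銖</measure>
</measure>
```

Compared with option (i), each number stays a plain numeric @quantity, at the cost of deeper nesting.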

ebeshero commented 6 years ago

We've now prepared the new element <unit> as discussed in #1461 so that will be available with the next release. Notes:

2) We need to create spec files for <unitDef> and <unitDecl>. My question for our group is this: Can we reconsider whether we need <unitDef> to be a member of att.lexicographic now that we've gone with @unit on the <unit> element?

@peterstadler marked this ticket as a "release blocker" because we do want to implement it soon, but I wonder if we want to look at the new <unit> element first and review these questions before proceeding to release?

ebeshero commented 6 years ago

We're also going to need to do some intensive writing for the Guidelines to introduce <unitDef> and <unitDecl>, and I think we should adapt @naoki-kokaze 's use-case from the 19th-century log-book of the Japanese steam ship company. There's lots to write here and a serious multi-cultural perspective to be described--important material for the Guidelines, but we may not have time to get all this in for this July release of the Guidelines (for which contributions are needed within the day). I think what I can do is simply add @factor to <unit> for now, and start working on new material in a branch which I'll ask us to review together.

ebeshero commented 6 years ago

Sorry! As I review the examples, I see that was @duncdrum 's use-case (the logbook from the Japanese steam ship company).

I've just created a measurement branch on this repo to push new material for the specs and Guidelines connected to this ticket.

duncdrum commented 6 years ago

@ebeshero right, since this is @naoki-kokaze 's baby, I'd say he gets to call dibs on which examples should go into the Guidelines. @jamescummings @naoki-kokaze catties 🍺 are collectible at the annual conference in Tokyo

ebeshero commented 6 years ago

@naoki-kokaze @duncdrum @jamescummings @sydb @emylonas I've reset the milestone for this to the release after this one, because we have some writing and testing to do yet (it's too soon for the release of next week).

However, I've also started a measurements branch here on the TEI GitHub repo, where I've set up the specs files for <unitDecl> and <unitDef> and added the @factor to the new <unit> element. I've also added a new section to the Guidelines TEI Header chapter under the encodingDesc for the new <unitDecl> element, and I've added some information there to get us started. Here's what I've done so far: https://github.com/TEIC/TEI/commit/c9efa4928051fdef9b8542ccde0d69473a03ba84

I think it will need more and better writing, and contributions from more of us, but I think that will be good to work on as we're heading to Tokyo in September. I'd like to step aside from this for now and ask others to jump in and work on revising and expanding this--in particular to work up some examples!

sydb commented 6 years ago

I have not looked at the issue carefully, but I doubt that it makes sense to put any of these elements into att.lexicographic. (That doesn’t mean they don’t need @norm, but even if they do, they probably don’t need all the others.)

naoki-kokaze commented 6 years ago

@ebeshero Thank you for the arrangement towards the next release. I agree with @sydb ‘s opinion about att.lexicographic, because the reason why I put the element into att.lexicographic was that I thought we would need @norm.

laurentromary commented 6 years ago

Just for the record, I think we do need a generic mechanism such as @norm for coded segments (to differentiate them from full natural language segments), and I am not completely at ease with having too many ad hoc ones covering more or less the same usage pattern. Time pressure has probably prevented a debate on this (and I am so glad we have <unit>!), but we should re-open the discussion on this attribute at some point.

ebeshero commented 6 years ago

@laurentromary When we reviewed this together in Council, it made sense in the measurement context to work with att.measurement and to use @unit as the normalized value for the <unit> element. As I began work on this ticket, I wondered whether <unitDef> might be a different case: is it here that we might want the lexicographic toolbox?

ebeshero commented 6 years ago

@naoki-kokaze and all: For the Open Council meeting this morning at the Tokyo TEI 2018 conference, we've prepared some slides to introduce TEI Council, and to summarize our work so far on this issue: Take a look here: http://bit.ly/TEI-tc
Naoki, the part about summarizing the ticket work so far is really designed to follow up on anything you'd like to say to introduce the need for better encoding of measurements in the TEI! Council as a whole needs a briefing on all the work we've done so it's easy for us to see what to do next, so that's what I tried to do here with the last three slides.

martindholmes commented 6 years ago

Hi @ebeshero . Lovely slides! The only thing that struck me was "Say you've found something wrong with the TEI", which suggests that all tickets are bug reports; some of course (including @naoki-kokaze 's ticket) are feature requests / enhancements.

ebeshero commented 6 years ago

Thanks @martindholmes ! Yeah, that's an artifact from the old slideset we presented in Vienna...I just changed it with the word "change"! :-)

naoki-kokaze commented 6 years ago

Thank you, Elisa! I have prepared some slides to share the gist of my proposal. https://docs.google.com/presentation/d/1koR84Q0AsHHdYd_kfQNSyqS34kWq7DajvT8p_iqvis8

But there is an omission about att.lexicographic, so let's review that point later!

jamescummings commented 6 years ago

@naoki-kokaze

I believe where you use <place key="#england"/>, you really want to do <placeName ref="#england"/>.

(place is the container for the place-related metadata that you want to point at. It is usually placeName that does the pointing. It could be ptr or something else more general, but placeName makes it clear what you are pointing at.)
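
The distinction can be illustrated with a minimal sketch. The <listPlace> wrapper and the content of the <place> record are assumptions made for illustration; only the xml:id value and the pointing <placeName> come from the discussion above.

```xml
<!-- The metadata record that is pointed at (assumed content). -->
<listPlace>
  <place xml:id="england">
    <placeName>England</placeName>
  </place>
</listPlace>

<!-- The pointer used wherever the place needs to be referenced. -->
<placeName ref="#england"/>
```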

ebeshero commented 6 years ago

Summary of New Actions decided in Council F2F meeting in Tokyo 2018:

naoki-kokaze commented 6 years ago

@ebeshero Thank you for minuting the discussions at the Open Session on 10th Sep! I'd like to offer some example markup based on those discussions. I would appreciate it if you could all check it and share any comments or feedback.

<encodingDesc>
    <unitDecl>
        <unitDef xml:id="keel" type="weight">
            <label>keel</label>
            <placeName ref="#england"/>
            <conversion fromUnit="#chalder" toUnit="#keel" formula="20" from="1421" to="1676"/>
            <conversion fromUnit="#chalder" toUnit="#keel" formula="16" from="1676" to="1824"/>
            <desc>Keel was a unit measuring weight of coal. It had been equal to 20 chalders from 1421 to 1676, and it was made to be equivalent to 16 chalders from 1676 to 1824.</desc>
        </unitDef>
        <unitDef xml:id="chalder" type="weight">
            <label>chalder</label>
            <placeName ref="#england"/>
            <conversion fromUnit="#bushel" toUnit="#chalder" formula="32" from="1421" to="1676"/>
            <conversion fromUnit="#bushel" toUnit="#chalder" formula="36" from="1676" to="1824"/>
            <desc>Chalder was a unit measuring weight of coal. It had been equal to 32 bushels from 1421 to 1676, and it was made to be equivalent to 36 bushels from 1676 to 1824.</desc>
        </unitDef>
        <unitDef xml:id="bushel" type="weight">
            <label>bushel</label>
            <placeName ref="#england"/>
            <desc>Bushel was a unit measuring weight of coal.</desc>
        </unitDef>
    </unitDecl>
</encodingDesc>

martindholmes commented 6 years ago

@naoki-kokaze This looks great. Recalling our previous discussion, in this particular example, you might omit @fromUnit because the context <unitDef> element provides that information, but we may decide that's just confusing.

Also, if the @formula is XPath, the expressions would be "* 32" and "* 36" (with the operator for multiplication). But we need to describe exactly how to implement this; it might be clearer to use conventional variable names like this:

$fromUnit * 32
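
Applied to a <conversion>, that suggestion might look like the sketch below. It assumes that $fromUnit stands for the quantity expressed in the unit named by @fromUnit, which the thread has not yet settled, and it declares the chalder-to-bushel direction (unlike the example above) so that multiplying by 32 matches the description of a chalder as 32 bushels.

```xml
<!-- Sketch only: @formula as an XPath expression over a named variable.
     $fromUnit is assumed to be bound to a quantity in chalders. -->
<conversion fromUnit="#chalder" toUnit="#bushel"
            formula="$fromUnit * 32"
            from="1421" to="1676"/>
```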

ebeshero commented 6 years ago

Council discussion: @sydb : We need to make really clear that the @formula takes a value in @fromUnit and converts it to @toUnit (and not the other way around). (And be really clear to disambiguate @from and @to (which are dates) from @fromUnit and @toUnit.) We should present this as a template for a function, rather than the function itself. Also make clear that values for unit conversion should be drawn from the @quantity on <unit>.
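
A worked case may help keep the two pairs of attributes apart. The sketch below is an illustration under assumptions, not agreed markup: the sentence being encoded is invented, the variable name $fromUnit follows the suggestion earlier in the thread, and the direction (keel to chalder) is chosen so that the formula agrees with the description of a keel as 20 chalders between 1421 and 1676.

```xml
<!-- In the text: @quantity on <unit> supplies the number to convert. -->
<measure type="weight" commodity="coal">
  <num>2</num> <unit unit="keel" quantity="2">keels</unit> of coal
</measure>

<!-- In the header: @fromUnit and @toUnit name units; @from and @to are dates. -->
<conversion fromUnit="#keel" toUnit="#chalder"
            formula="$fromUnit * 20"
            from="1421" to="1676"/>

<!-- Substituting the quantity: 2 * 20 = 40 chalders,
     valid only for a source dated within 1421 to 1676. -->
```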

sydb commented 6 years ago

From Council teleconference of a few minutes ago.

May be a good idea; may be a bad idea. Please comment.

martindholmes commented 6 years ago

@sydb @quantity (meaning the number of fromUnits to be converted) is found on <unit>, not on <formula> (where the formula resides). I think it might be a bit confusing to use @quantity in the formula itself; "$quantity" might be better.

peterstadler commented 6 years ago

I just created a job on our Jenkins server for the measurement branch: https://jenkins-paderborn.tei-c.org/job/TEIP5-branch-measurement/ Currently it's building, but when it's finished you can review the changes (made in the measurement branch) to the Guidelines directly at https://jenkins-paderborn.tei-c.org/job/TEIP5-branch-measurement/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/index.html

martindholmes commented 6 years ago

My Jinks has one too. Its build has been broken since I set it up, though.

ebeshero commented 5 years ago

Note to all involved here: I'm at last returning to work on implementing Council's decisions on this ticket, hopefully in time for our next release in July! I rebased the measurement branch to be sure it's up to date with dev at this moment. TEI-Jenkins is testing the branch and there's a longstanding issue with something generating duplicate files. I'm going to see if I can fix that first of all, and then continue working on @formula and the conversion markup we developed here.

/var/jenkins_home/workspace/TEIP5-branch-measurement/P5/antbuildweb.xml:32: 
Fatal error during transformation using /var/jenkins_home/workspace/TEIP5-branch-measurement/P5/Utilities/guidelines.xsl: 
Cannot write more than one result document to the same URI: file:/var/jenkins_home/workspace/TEIP5-branch-measurement/P5/Guidelines-web/en/html/ref-model.labelLike.html; 
SystemID: file:/var/jenkins_home/jobs/Stylesheets-dev/lastSuccessful/archive/dist/xml/tei/stylesheet/html/html_oddprocessing.xsl; Line#: 140; Column#: 170

ebeshero commented 5 years ago

In the last pair of commits I've been working on implementing the changes Council agreed on last fall and I'm watching Jenkins to see if our branch passes the right tests. Here's what I've done:

I'm going to need some help reviewing all the documentation and the details! And...I'm happy to report that our branch is not breaking the build--it's just behaving the same way the dev branch is doing now! I'm going to issue a pull request and a formal request for review.

ebeshero commented 5 years ago

For all following this ticket: We've been having extensive discussion on our branch's pull request (https://github.com/TEIC/TEI/pull/1892) as we'd like to try to complete the specs for <unitDecl> and <unitDef> and associated elements and attributes (<conversion> and att.formula). I'm going to summarize what we're working on right now so we have it here in the right place.

ebeshero commented 5 years ago

I updated the summary comment above to reflect that we'd probably want to add @unitRef to att.measurement.

martindholmes commented 5 years ago

@sydb @ebeshero This all looks good to me, except that I think we want to see @unitRef available on <measure> as well as <unit>, don't we? It will get that automatically if the att is in att.measurement, of course. There are cases where there's nothing in the text to tag with a <unit> element, but you still want to capture what unit is being used in the measurement expression.
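
Both situations can be sketched as follows. The sentences are invented for illustration, and the sketch assumes @unitRef points at the xml:id of a <unitDef>, as in the keel/chalder example earlier in the thread.

```xml
<!-- The unit is named in the text, so <unit> can carry the pointer. -->
<measure type="weight" commodity="coal">
  <num>16</num> <unit unitRef="#chalder">chalders</unit> of coal
</measure>

<!-- No unit word appears in the text, so the pointer sits on <measure>. -->
<measure type="weight" commodity="coal" quantity="16" unitRef="#chalder">sixteen of coal</measure>
```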

ebeshero commented 5 years ago

In the measurement branch, I believe I have now implemented everything we've been discussing, and it's passing the build tests. Take a look at the measurement branch pull request now: https://github.com/TEIC/TEI/pull/1892

ebeshero commented 5 years ago

@naoki-kokaze can you post your source for the keel/chalder example and the Japanese examples? I can add that to the bibliography page for the Guidelines.

naoki-kokaze commented 5 years ago

@ebeshero Thank you very much for all of your efforts to develop the new tag set!

The source for English measurement is: Zupko, Ronald Edward. 1977. British Weights & Measures: A History from Antiquity to the Seventeenth Century. Madison: University of Wisconsin Press, pp. 141–151. And the one for Japanese measurement is: (In Japanese) 大隅亜希子. 1996. “律令制下における権衡普及の実態: 海産物の貢納単位を中心として.” 史論 49. pp. 22–44. Available from http://id.nii.ac.jp/1632/00015761/. (Translated in English) Osumi, Akiko. 1996. “On the Popularization of Weights under the Ritsuryo Regime: Focusing on the Units for the Aquatic Products as Tributes.” Shiron (Historica) 49. pp. 22–44.

If you need any help, please let me know!

ebeshero commented 5 years ago

@naoki-kokaze I've just added your sources to the BIB and examples. Thank you! Perhaps we're now ready to close this ticket? I'm waiting for someone else on Council to merge the pull request (probably should not be me.)