TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Deprecate oVar and pVar, Revamp oRef and pRef #545

Closed TEITechnicalCouncil closed 7 years ago

TEITechnicalCouncil commented 9 years ago

The TEI dictionary chapter comprises four elements for referring back to forms in dictionary entries (oRef, oVar, pRef, pVar), whose respective usage has never corresponded to clear-cut scenarios, especially because of the lack of a clear set of use cases and examples. This has led to low usage of these elements in most TEI-based dictionary projects, but also to the absence of best practices for all the concrete cases (examples, etymology) where marking forms and associating them with (real or virtual) entries would help formalise lexical content in a systematic way.

The main issue is that the difference between pRef and pVar (resp. oRef/oVar) does not match the logic of tagging form references in a dictionary entry:

It has also been pointed out that there are issues with the unsatisfactory definition of @type and the absence of @notation.

Proposal: we suggest dropping oVar and pVar and extending both the scope and content model of pRef and oRef to offer a simple system for the annotation of forms (orthographic and phonetic) in dictionary entries, with a clear parallel to orth and pron in the description of forms.

The main changes would be:

Original comment by: @laurentromary

TEITechnicalCouncil commented 8 years ago

This issue was originally assigned to SF user: stefaniegehrke Current user is: stefaniegehrke

TEITechnicalCouncil commented 9 years ago

This looks like a good step forward -- deprecating the Var elements while giving the Ref ones more flexibility is a welcome suggestion.

I have a remark and a request, for now:

Original comment by: @bansp

TEITechnicalCouncil commented 9 years ago

To answer Piotr on the last point, here's a possible example of what we have in mind for etymology. It describes a borrowing from English into Japanese. The idea is to mark up forms (and pronunciations) by means of the revamped oRef/pRef so that one can point to another lexical resource. It may be the case that this resource does not exist (yet) or cannot be referenced; @corresp is thus optional, of course. But the underlying semantics, namely that etymons are forms that would potentially deserve a lexical description, seems important to me.


         <entry xml:id="taxi" xml:lang="jpn">
            <form type="lemma">
               <orth type="transliterated" notation="romanji">takushī</orth>
               <orth notation="katakana">タクシー</orth>
               <pron notation="ipa">taku'shi:</pron>
               <gramGrp>
                  <pos>noun</pos>
               </gramGrp>
            </form>
            <sense>
               <cit type="translation">
                  <quote>taxi</quote>
               </cit>
            </sense>
            <etym type="borrowing">
               <lbl>source</lbl>
               <lang>English</lang>
               <cit type="etymon">
                  <oRef xml:lang="eng-US" corresp="http://en.wiktionary.org/wiki/taxi">taxi</oRef>
                  <pRef xml:lang="eng-US">'tæksi</pRef>
               </cit>
            </etym>
         </entry>

Original comment by: @laurentromary

TEITechnicalCouncil commented 9 years ago

Hi Laurent, thanks for this, I'll try to have a closer look by the end of the day. This appears to take the *Ref elements into the new century, but then, that's what they needed.

One remark for now: your @xml:lang is placed too high: on the <entry>, it also incorrectly applies to <sense> and <etym>.
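
For illustration, a minimal sketch of one way to act on this remark, assuming the language tag is simply moved from <entry> down to <form>. This placement is an assumption, not something settled in this ticket; it still leaves <gramGrp> in scope, so a stricter encoding might tag the individual <orth> and <pron> elements instead.

```xml
<!-- Sketch only: xml:lang moved from <entry> to <form>, so that it no
     longer applies to <sense> and <etym>. -->
<entry xml:id="taxi">
   <form type="lemma" xml:lang="jpn">
      <orth type="transliterated" notation="romanji">takushī</orth>
      <orth notation="katakana">タクシー</orth>
      <pron notation="ipa">taku'shi:</pron>
      <gramGrp>
         <pos>noun</pos>
      </gramGrp>
   </form>
   <!-- <sense> and <etym> follow as in the example above, now outside
        the scope of xml:lang="jpn" -->
</entry>
```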

Original comment by: @bansp

TEITechnicalCouncil commented 9 years ago

... and 'tæksi is linguistically strange, as well. I don't know if xml:lang (or rather the relevant RFC or BP) accepts sublabels for phonetic script, but if it does, one should be applied here, I think.
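
For what it is worth, the IANA language subtag registry used by BCP 47 does include a variant subtag for IPA transcription, `fonipa`. A sketch of how it might be applied here, assuming the two-letter primary subtag `en` (rather than `eng`) and assuming @notation is made available on pRef as proposed:

```xml
<!-- Sketch only: the subtag choice and the notation attribute are assumptions -->
<cit type="etymon">
   <oRef xml:lang="en-US" corresp="http://en.wiktionary.org/wiki/taxi">taxi</oRef>
   <!-- "fonipa" marks the content as IPA transcription of US English -->
   <pRef xml:lang="en-US-fonipa" notation="ipa">'tæksi</pRef>
</cit>
```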

Original comment by: @bansp

TEITechnicalCouncil commented 9 years ago

Original comment by: @hcayless

TEITechnicalCouncil commented 9 years ago

Assigning to Stefanie. This looks like a fair amount of work, and I think will need some discussion here and/or on the Council list, and may need to be broken up into smaller chunks. [feature-requests:#544] would be invalidated if this is implemented, so I think they go together.

Original comment by: @hcayless

stefaniegehrke commented 8 years ago

seems like there are existing projects using <oVar>, see the attached sample I recently looked at :

`

Union of European Football Associations
  </form>`
laurentromary commented 8 years ago

Stefanie: your message could not be read. I see two not incompatible strategies: a) deprecate the unwanted elements over a decent period (2 years); b) inform/consult the community concerning a possible transition. I gather there is a small group of concerned users in any case.

stefaniegehrke commented 8 years ago

thank you for the hint, Laurent

so once again : seems like there are existing projects using <oVar>, see the attached sample I recently looked at :

<form type="full"><orth><oVar>U</oVar>nion of <oVar>E</oVar>uropean <oVar>F</oVar>ootball <oVar>A</oVar>ssociations</orth></form>
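
If oRef takes over the content model of oVar as proposed, a project like this one could presumably migrate by a one-for-one rename. A sketch under that assumption (whether oRef is semantically the right replacement for marking initial letters is not settled here):

```xml
<!-- Sketch only: each <oVar> from the sample above renamed to <oRef> -->
<form type="full">
   <orth><oRef>U</oRef>nion of <oRef>E</oRef>uropean <oRef>F</oRef>ootball <oRef>A</oRef>ssociations</orth>
</form>
```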

laurentromary commented 8 years ago

Trying to re-activate this. A first step could be to adapt oRef and pRef so that they bear the same content models as oVar and pVar, and to add @notation to the two by the same token. We could indicate in the documentation of oVar and pVar, as a remark, that there is a preference for using oRef and pRef as the main representation tool.

lb42 commented 8 years ago

So, to summarize: the proposal is
a) deprecate use of oVar and pVar, recommending that oRef or pRef be used in preference
b) make the proposed list of values for @type a semi-closed one
c) add *Ref elements to the new att.notated class
d) change content model of *Ref to match that of corresponding *Var element, but with the option of being empty

All of that seems eminently simple and do-able. The ticket is with @stefaniegehrke to implement at present, but I can do it later this week if she asks me nicely!

laurentromary commented 8 years ago

That would be great!

hcayless commented 8 years ago

@lb42: @stefaniegehrke says go right ahead.

lb42 commented 8 years ago

Implemented as above. The @type valList for oRef is already open and pRef doesn't have one.

martindholmes commented 7 years ago

Reopening this because the deprecation period is now over and the build is failing because of oVars.

martindholmes commented 7 years ago

Syd will do this while I do the Stylesheets problem.

martindholmes commented 7 years ago

Took a quick look at this and there are a lot of examples in other specs that use oVar, specifically in <case>, <mood>, <per> and <tns>. pVar is still being recommended in the Dictionaries chapter.

It rather looks like deprecation was added, but the additional steps of removing examples and references were not done. I see that the original request was for a two-year deprecation period. I think it might be a good idea to extend the deprecation for another six months or a year, and make sure there's at least one release that has fully-implemented deprecation (no examples or references) before actually removing these elements.

bansp commented 7 years ago

We might even consider raising Martin's proposal to the status of a principle.

sydb commented 7 years ago

I’m in favor of @martindholmes ’s proposal, too. I expect to start tackling this today, but he is right, there is quite a bit of work to be done. Unless I hear objections, I will change the @validUntil to mid-summer rather than actually remove <oVar> & <pVar>.
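
For readers unfamiliar with the mechanism: in a TEI ODD specification, the deprecation date is carried by @validUntil on the specification element. A rough sketch of what that looks like for oVar, in which both dates and the wording of the note are placeholders, not the values actually committed, and the <desc type="deprecationInfo"> form follows the convention of the TEI source as I understand it:

```xml
<!-- Sketch only: the dates and the deprecation note are hypothetical -->
<elementSpec ident="oVar" module="dictionaries" validUntil="2017-08-01">
   <desc type="deprecationInfo" versionDate="2017-03-01" xml:lang="en">This element
      is deprecated; use <gi>oRef</gi> instead.</desc>
   <!-- remainder of the specification unchanged -->
</elementSpec>
```

Once the date in @validUntil has passed, the build treats the specification as expired, which is why extending the date (rather than removing the element) lets the build proceed.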

martindholmes commented 7 years ago

Forgive me for jumping in, but I've added a two-month extension to the deprecation of oVar and pVar in commit 1476d8d so that the build will go ahead on Jenkins and we can see if other commits are OK. Feel free to revert or override when a decision is made about how to proceed.

sydb commented 7 years ago

Since I got no response to my question on TEI-LINGUISTICS, I went ahead and made the changes. Pending discussion there, we can change the one example I’m not sure of (in DI, an <egXML> that now has an <oRef> inside an <oRef>).

Pushed at 2df3800.

bansp commented 7 years ago

Thanks, Syd. I missed that message, but I'd hesitate about how to reply because oRef is one of the hot topics of the TEI-Lex0 group now, and I wasn't much involved in the initial suggestions. Anyway, it's great to see things moving, thanks!

sydb commented 7 years ago

As far as I know, this issue is being kept open only because I am not confident about the encoding of the example “take … < was quite ~n with him >” in section DIHW (in chapter 9, Dictionaries). If someone more learned than I could confirm that it is OK, or suggest an improvement, that would be good.

bansp commented 7 years ago

We're meeting this Monday (whole day), and I will make sure to put this on the agenda.

martindholmes commented 7 years ago

The temporary extension on the expiry of oVar and pVar which I implemented in March has now expired, and the build is broken again. Should I extend it further, or are you ready to remove those elements?

laurentromary commented 7 years ago

I would definitely support this, but we should make sure that there is an explicit remark on oRef and pRef that these elements are taking up the semantics of their counterparts.

sydb commented 7 years ago

OK. My plan is to remove these 2 elements this afternoon. Gives us ~2 months to reverse course if we don't like it.

sydb commented 7 years ago

So <pVar> and <oVar> are now gone as of be2ca61. If Mr. Jenkins is happy, will close this ticket.