TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

fDecl doesn't allow att.datcat yet #1081

Open TEITechnicalCouncil opened 11 years ago

TEITechnicalCouncil commented 11 years ago

TEI P5 2.1.0 says at the end of section 18.3

"Whether at the level of feature-system declarations, feature- and feature-value libraries, or individual features, it is possible to align both feature names and their values with standardized external data category repositories such as ISOcat."

However, a feature declaration cannot yet be associated with an ISOcat data category using @dcr:datcat, i.e., <fDecl> [1] doesn't yet include the att.datcat [2] attribute set, which <f> [3] already does.

If <fDecl> allowed @dcr:datcat, it would become possible to declare all relationships with ISOcat data categories in a feature system declaration, instead of repeating them redundantly in each feature instantiation.

[1] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-fDecl.html
[2] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datcat.html
[3] http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-f.html

Original comment by: mawindhouwer

TEITechnicalCouncil commented 8 years ago

This issue was originally assigned to SF user: bansp Current user is: bansp

TEITechnicalCouncil commented 11 years ago

I think this is an oversight. Piotr?

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Definitely an oversight. May I grab this one?

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Have added it to the class: leaving it to you to add some comment / example in the chapter if you think it necessary; otherwise just close the ticket.

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

At the risk of disturbing the anthill let me ask: isn't adding fDecl to att.datcat analogous to adding <equiv> to it (= something that we explicitly rejected) rather than to adding <f> to it?

In other words, what's the semantics of fDecl with dcr:datcat as opposed to the semantics of <f> with @dcr:datcat? Or, in other words, shouldn't this be handled inside the fDecl, by an <equiv>/@uri?

To be sure, I support Menzo's need to provide equivalence/alignment information at the level of fDecl, but I'm beginning to wonder if using dcr:datcat on fDecl directly expresses what we want expressed here.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

I would leave out the issue of <equiv> here, which is a real one. The point of fDecl is clearly to avoid making a link on each f to assert its semantics.

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

Laurent, I'm not happy meddling in this ticket today, but it seems to me that the point of fDecl is to declare a feature.

The use of @datcat on fDecl seems unfortunately to mean: this fDecl acts as such-and-such category. But the fDecl is not meant to act as (= be aligned with) a category: it is the <f> that is being declared that should be aligned, not the <fDecl>.

Once again, I fully concur with the idea of expressing this information in FSD. I'm increasingly uneasy about the technical solution proposed here though. And we don't want to put in stuff that we are going to pull out in half a year, do we. :-/

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Agree. I need to ponder upon this a little....

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

My interpretation is that the <fDecl/> declares which data category the <f/> should be equivalent to, just as it declares the type that the value of <f/> should have. If that role is better suited to <equiv/> in TEI, then that could be the proper construct to use. However, it could lead to inconsistencies between the FSD and FSR mechanisms:

Given the following FSD:

<fsDecl type="WordForm">
    <fDecl name="writtenForm">
        <equiv uri="http://www.isocat.org/datcat/DC-1836"/>
        <vRange><vNot><string/></vNot></vRange>
    </fDecl>
    <fDecl name="grammaticalNumber">
        <equiv uri="http://www.isocat.org/datcat/DC-1298"/>
        <vRange>
            <vAlt>
                <symbol value="singular">
                    <equiv uri="http://www.isocat.org/datcat/DC-1387"/>
                </symbol>
                …
            </vAlt>
        </vRange>
    </fDecl>
</fsDecl>

and FSR

<fs type="WordForm">
    <f name="writtenForm">
        <string>clergyman</string>
    </f>
    <f name="grammaticalNumber">
        <symbol value="singular"/>
    </f>
</fs>

this FSR and FSD combination should be equivalent to

<fs type="WordForm">
    <f name="writtenForm" dcr:datcat="http://www.isocat.org/datcat/DC-1836">
        <string>clergyman</string>
    </f>
    <f name="grammaticalNumber" dcr:datcat="http://www.isocat.org/datcat/DC-1298">
        <symbol value="singular" dcr:datcat="http://www.isocat.org/datcat/DC-1387"/>
    </f>
</fs>

Of course, any tool interpreting the FSD can make this transition from <equiv/> to @dcr:*, but is it really needed? Introducing <equiv/> might touch more than wanted: e.g., in my example I also used <equiv/> to express the data category equivalence for the singular symbol. As FSR and FSD share <symbol/>, <equiv/> would also become part of the FSR. The same is true the other way around: @dcr:* is available for <symbol/> in the FSR, so also in the FSD. In my view it would be strange to use <equiv/> in one part of the FSD and @dcr:* in another part ...

Just my 2 cents ...

Original comment by: mawindhouwer

TEITechnicalCouncil commented 11 years ago

I take the point about <equiv>. On the other hand, I also see an issue concerning the variable interpretation of the dcr: attributes, where, I believe -- and I think this belief is shared by some members of the Council, after some exchanges on the topic in the past and now -- there should be one simple interpretation ("I am aligned with this-and-that").

What we have here is, I daresay, an obvious case where we should not feel ready to commit ourselves to one view -- there's still a few paths left open. I would therefore like to follow a suggestion that I have got, to withdraw Lou's modification before the release, and bring it back after the release, while leaving this ticket open.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Where do we stand with this? I think we need to slightly alter the policy "I am this" when it applies to elements declaring the semantics of others (equiv and fDecl are two similar cases). For such elements, the dcr: attributes would mean: "what I declare here is equivalent to this entry in the DCR".

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

That would mean semantics conditioned upon semantics, a pretty nasty dependency.

How about targeting this squarely and setting up something analogous to @targetLang, with the precise semantics mentioned above?

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Can you elaborate on that? You mean having an additional dcr: attribute?

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

After having pondered on this for a while, I think now that it might be good to make ISO FSD (ISO 24610-2:2011) aware of other ISO standards, as is, well, standard.

This time, it would be nice if ISO FSD was made aware of the existence of ISO DCR and allowed for what we've talked about in this ticket. It seems to me that this is the direction where the effort should be going, otherwise the TEI is pressed into patching what other bodies should have fixed by now.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Well in this case, ISO FSD will just take up what the TEI suggests, because a) this is where most appropriate experts are and b) we should normally expect more reactivity on the side of the TEI. So for me the TEI is the place to experiment/implement. We should thus not procrastinate this too long...

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

I agree that FSD is in serious need of an update. It has not changed substantially for many many years. I'm just not sure who the "other bodies" are who might do this -- the workgroup in TC37 has not so far as I know done much lately. It's a joint ISO/TEI workgroup, but it needs revitalising from both sides.


Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

So, back to the actual issue: how to mark on fDecl the equivalence to an ISOcat entry. Piotr rejects the existing dcr: attributes since he says they do not have the appropriate semantics. Is there a consensus on this? If yes, we need an extra mechanism. Another attribute? Also in the dcr: namespace?

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

Apart from the status of ISO/TEI FSD, one more issue that may need to be considered is that it may be expected that it's the "new" piece of the standards puzzle that should be responsible for defining bindings for the other standards, rather than forcing them to update.

ISO DCR has done a good job by defining and namespacing its two attributes that can now be plugged into other descriptions. Possibly, what Laurent has mentioned above may be a good further step: to expect ISO DCR to update and extend its set of {@datcat, @valueDatcat} with two more attributes that can be used in schemas, notably in ISO FSD: {@targetDatcat, @targetValueDatcat}.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

OK. Then let's define a proposal for these two attributes, liaise with Menzo and Sue-Ellen to fine-tune it, and have it adopted on both sides.

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

BTW, it's not that I had the whim to reject the use of the existing dcr: attributes: I have merely pointed at considerations and decisions made in an analogous case.

To push the point further: there is nothing illogical in placing @dcr:datcat on an fsDecl or fDecl, to make sure that they are interpretable as data containers for declarations of feature structures or of features. That is in full accordance with DCR principles, and, in fact, this point alone should suffice to cut the discussion on the initial suggestion, and redirect it towards looking for a place and a method to define something like @targetDatcat.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Here below, I put a comment from Menzo Windhouwer:

[quote] If I understand correctly there were concerns due to the implicit indirection from a schema declaration to the instances, e.g., if you link a DC with @dcr:datcat to an element declaration in a schema it actually implies that the instance is linked to this DC. It's now proposed to make this explicit by using @dcr:targetDatcat and @dcr:targetValueDatcat. I've added support for this to the DC Reference schema. See DC Reference schema version 1.2 at (on the ISOcat test server, not yet on isocat.org):

http://lux13.mpi.nl/isocat/12620/

So these could be used by fDecl now, e.g.,

tei:vRange tei:vAlt /tei:vAlt /tei:vRange /tei:fDecl

Do notice that as the tei:fDecl schema reuses tei:vRange, the value declarations, e.g., tei:symbol in this case, should support both the dcr:target* attributes and the 'old' dcr:* attributes, as these elements can also appear in an instance.

Please let me know if I misunderstood or other solutions are preferred. If it's all correct I'll include this in the next update of isocat.org. And also update the FSD support of RELISH LMF (see http://tla.mpi.nl/relish/lmf/). [/quote]

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

Here is the fDecl example that went missing in the previous post:

<tei:fDecl name="partOfSpeech" dcr:targetDatcat="http://www.isocat.org/datcat/DC-1345">
    <tei:vRange>
        <tei:vAlt>
            <tei:symbol value="..."/>
            <tei:symbol value="commonNoun" dcr:targetValueDatcat="http://www.isocat.org/datcat/DC-1256"/>
        </tei:vAlt>
    </tei:vRange>
</tei:fDecl>
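
For comparison, here is a minimal sketch of the instance-level markup that such a declaration would stand in for. It assumes one possible reading, in which dcr:targetDatcat on the declaration corresponds to dcr:datcat on the declared <f>, and dcr:targetValueDatcat corresponds to dcr:valueDatcat on the value. The thread does not settle this mapping. The namespace declarations are added only to make the fragment self-contained.

```xml
<!-- Illustrative only: the mapping from dcr:target* attributes in the
     declaration to dcr:* attributes in the instance is an assumption. -->
<tei:f xmlns:tei="http://www.tei-c.org/ns/1.0"
       xmlns:dcr="http://www.isocat.org/ns/dcr"
       name="partOfSpeech"
       dcr:datcat="http://www.isocat.org/datcat/DC-1345">
    <!-- The value is aligned with the category referenced by
         dcr:targetValueDatcat in the declaration. -->
    <tei:symbol value="commonNoun"
                dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
</tei:f>
```

With the declaration in place, each such <f> could omit the dcr: attributes and a processor reading the FSD could supply them.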

Original comment by: mawindhouwer

TEITechnicalCouncil commented 11 years ago

Hi, All,

Any consensus on this? I'm preparing an isocat.org update and would like to know if I can include the changes for TEI in the DC Reference RELAX NG schema.

Thanks in advance,

Menzo

Original comment by: mawindhouwer

TEITechnicalCouncil commented 11 years ago

I think your last proposal did not lead to any criticism. I would say you can move ahead. @others?

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

I'm happy to see this done so swiftly. Thank you, Laurent, for pushing this issue ahead and thank you, Menzo, for the implementation on the DCR side. And may I say that when this goes through, it will be practically immediately usable in the KorAP project, where we use FSR for annotations and our FSDs are being created right this month.

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

So what, concretely, is the proposal for change to the TEI here? Reviewing the discussion, Piotr's original objection to my hasty solution of adding <fsd> to the att.datcat class seems correct. The preferred solution should be to add dcr:targetDatCat to <fsd>

We could do any of the following:

(a) Nothing. Adding attributes from another namespace is always possible and does not perturb the schema. If we choose this option however, we really must make the possibility explicit in the text of the Guidelines, and supply an example.

(b) Modify the att.datcat class so that it also provides dcr:targetDatCat, possibly adding schematron rules to say that you can only inherit it if you are an <fsd> in which case you cannot inherit the other att.datcat attributes. This seems a bit tortuous.

(c) Explicitly add the attribute to <fsd>

(d) Define a new class att.refDatCat to supply the attribute and add <fsd> to it.

If it's not obvious I would recommend either (a) or (c).

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

I would say d), because it could also be useful for elementSpec

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

You wouldn't put only <fsd> in the class att.refDatCat, would you? Values appearing in the FSD, e.g., tei:symbol (see the example some posts earlier), should also use @dcr:targetDatcat or @dcr:targetValueDatcat instead of @dcr:datcat or @dcr:valueDatcat. Or did I understand the whole argument wrong?

Original comment by: mawindhouwer

TEITechnicalCouncil commented 11 years ago

On 06/09/13 11:40, Menzo Windhouwer wrote:

You wouldn't put only to the class att.refDatCat isn't it? Values appearing in the FSD, e.g, (see the example some posts earlier), should also use @dcr:targetDatcat or @dcr:targetValueDatcat instead of @dcr:datcat or @dcr:valueDatcat. Or did I understand the whole argument wrong?

(Menzo : if you meant to say "You wouldn't put only <fsd>..."
remember you have to escape the pointy brackets !)

This is a good argument for my suggestion (d) then, and since I see Laurent also prefers that, I guess that's the right choice.

So we need (a) someone to draft the new class declaration (b) someone to specify the elements which should be members of that class (c) someone to write some appropriate text with examples for the Guidelines.

I can help with (a) and possibly (c), though I'd rather Piotr or Menzo took a first stab at it. Is there a consensus on (b) ?

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Your solution (d) seems the cleanest on the assumption that we want to continue mentioning selected attributes from other namespaces explicitly within the documentation.

Let's talk about point (b) in the action items. We want fsDecl, fDecl, symbol and numeric there, I guess -- what else?

(edit: I feel lost switching between straightforward datatyping and external semantics... string, binary, numeric -- do we want these? maybe at least experimentally? this is primarily a question to Menzo, I think)

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

elementSpec, equiv, valDef, attDef

Original comment by: @laurentromary

TEITechnicalCouncil commented 11 years ago

As in the FS instance @dcr:datcat or @dcr:valueDatcat is allowed on a <binary> and <string> as well, I would also allow @dcr:targetDatcat or @dcr:targetValueDatcat on them in the FSD.

Original comment by: mawindhouwer

TEITechnicalCouncil commented 11 years ago

Laurent says, "elementSpec, equiv, valDef, attDef". This is a bit of a jump from the direct focus here, but I can understand the principle behind it. Except that we should probably divide this into {elementSpec, valDef, attDef} on one side, as the defining elements, and {equiv} on the other, as the referencing element that does a somewhat alternative job to the dcr:target* attributes.

So, again, to reference Lou's item "(b) someone to specify the elements which should be members of that class":

Are we getting there?

Original comment by: @bansp

TEITechnicalCouncil commented 11 years ago

Hi, All,

Version 1.2 of the DC Reference schema is now available in HTML, RNG and RNC at isocat.org. This revision provides the targetDatcat and targetValueDatcat attributes (and elements).

Best,

Menzo

Original comment by: mawindhouwer

TEITechnicalCouncil commented 10 years ago

Original comment by: @bansp

TEITechnicalCouncil commented 10 years ago

Hi Piotr,

What does it mean that the priority is lowered to 1? Will this feature be delayed (forever)? Is there any way I can assist to get this implemented on the TEI side?

Best,

Menzo

Original comment by: mawindhouwer

TEITechnicalCouncil commented 10 years ago

Original comment by: @bansp

TEITechnicalCouncil commented 10 years ago

Resetting the priority back to 5 -- for some reason, when you edit a ticket, the priority magically drops to the minimum unless you manually set it back to "5". I suspect a bug in the ticketing system. Thanks to Menzo for the catch (fortunately, I only modified some three tickets today!)

Original comment by: @bansp

jamescummings commented 7 years ago

Hey @bansp Council is looking at this and is confused about what needs to be done. Can you give us an update?

bansp commented 7 years ago

Yikes, saw this only now. Will make it into a LingSIG project, not to forget.

bansp commented 7 years ago

  1. @dcr:datcat and @dcr:valueDatcat are mechanisms that align annotations with external data categories (for example, with a linguistic ontology that describes the notion of "part of speech" and locates this notion among related concepts, and then also describes notions such as "noun", "verb", "adposition", and locates them among other grammatical concepts).

  2. The choice of examples above was not random: when we say <pos>noun</pos> (and <pos>n</pos>, <pos>subst</pos>, <pos>s</pos>, ... think also of the symbols for "noun" used in languages other than English), we would like to be able to say something about (i) the container and (ii) the content, hence we sometimes need to use two attributes, "datcat" for the container and "valueDatcat" for the content. We therefore do:

    <pos dcr:datcat="http://www.isocat.org/datcat/DC-1345" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1333">noun</pos>

  3. <pos>noun</pos> is a feature-value pair. TEI and ISO together have a joint specification of a system designed to describe complex feature matrices, ISO/TEI FSR, "feature structure representations", where we could say:

    <f name="partOfSpeech" dcr:datcat="http://www.isocat.org/datcat/DC-1345">
    <symbol value="noun" dcr:datcat="http://www.isocat.org/datcat/DC-1333"/>
    </f>

    although, keeping to the modelling distinction between container and content, we could just as well say:

    <f name="partOfSpeech" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1345">
    <symbol value="noun" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1333"/>
    </f>

    On yet another take, we could prioritise the parallelism with the <pos> above and use "datcat" with <f> and "valueDatcat" with <symbol>. The distinction can be considered cosmetic or be given a more fundamental role, but this is outside of the scope of the discussion here (here, the point is: we need mechanisms to align annotations expressed in the TEI with external reference data category repositories).

  4. FSR can be modelled in FSD, which is a language for creating FSR schemas. Think of it as analogous to ODD in relation to TEI schemas (and this is why the <equiv> analogy was invoked above). Menzo has kindly provided an example of FSD:

    <tei:fDecl name="partOfSpeech" dcr:targetDatcat="http://www.isocat.org/datcat/DC-1345">
    <tei:vRange>
        <tei:vAlt>
            <tei:symbol value="..."/>
            <tei:symbol value="commonNoun" dcr:targetValueDatcat="http://www.isocat.org/datcat/DC-1256"/>
        </tei:vAlt>
    </tei:vRange>
    </tei:fDecl>

    What the above fragment does is declare a feature (<f>) with the name "partOfSpeech", and declare a range of possible values that that feature might assume. Menzo uses dcr:targetDatcat and dcr:targetValueDatcat to say that it's the modelled element, i.e., <f> (and one of its potential values) that should be aligned with the appropriate DCR entries, rather than the elements <fDecl> and <symbol>.

Note, in particular, that there is nothing untoward about having an ontology of modelling concepts, in which an "element modelling a feature" and an "element modelling a value" could be defined, and then the <tei:fDecl> above could carry both a "targetDatcat" attribute for the modelled feature, and a "datcat" attribute pointing at the concept "element modelling a feature". Which, incidentally, is a killer argument for setting up the target* attributes.
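
A minimal sketch of such a doubly annotated declaration follows. The dcr:datcat URI is a hypothetical placeholder, since no ontology of modelling concepts is named in this ticket. The second <symbol> is also illustrative, and the namespace declarations are added only to make the fragment self-contained.

```xml
<!-- dcr:datcat describes the declaring element itself (hypothetical URI);
     dcr:targetDatcat describes the <f> that is being declared. -->
<tei:fDecl xmlns:tei="http://www.tei-c.org/ns/1.0"
           xmlns:dcr="http://www.isocat.org/ns/dcr"
           name="partOfSpeech"
           dcr:datcat="http://example.org/modelling/featureDeclaration"
           dcr:targetDatcat="http://www.isocat.org/datcat/DC-1345">
    <tei:vRange>
        <tei:vAlt>
            <!-- Aligns the declared value, not this <symbol> declaration. -->
            <tei:symbol value="commonNoun"
                        dcr:targetValueDatcat="http://www.isocat.org/datcat/DC-1256"/>
            <!-- Illustrative second alternative, left unaligned. -->
            <tei:symbol value="properNoun"/>
        </tei:vAlt>
    </tei:vRange>
</tei:fDecl>
```

Here the two attributes do not compete: one says what the declaration is, the other says what the declared feature means.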

  5. We therefore postulate creating an attribute class "att.targetDatcat", to which the schema-modelling elements fsDecl, fDecl, symbol, string, binary, numeric and also elementSpec, valDef, and attDef should be added. If the Council wishes so, I can take this on as a LingSIG project.

  6. On a nostalgic note, I recall giving a paper in 2010 in Zadar, where I postulated enriching tagUsage with DCR references, in a very specific, dynamic setting of bilingual dictionaries that obeyed a single ODD but could vary in the grammatical descriptions, from language to language. A comment from Sebastian in the audience was that I should perhaps consider modelling this at the level of ODD. I disagreed then and I disagree still, in the specific context that I described in that presentation, but in general, if I were to try this kind of modelling at the level of ODD, I would need for that purpose the attributes postulated in this very ticket.
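
As a starting point for discussion, the attribute class "att.targetDatcat" postulated above might be drafted in ODD roughly as follows. This is a sketch, not an agreed specification: the wording of the descriptions, the choice of teidata.pointer as the datatype, the specGrp identifier, and the use of the namespace http://www.isocat.org/ns/dcr are all assumptions. Only fDecl is shown as a member; the other elements listed would be added in the same way.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="targetDatcatSketch">
    <!-- Hypothetical draft of the new attribute class. -->
    <classSpec ident="att.targetDatcat" type="atts" mode="add">
        <desc>provides attributes that align the construct being declared,
            rather than the declaring element itself, with entries in an
            external data category repository.</desc>
        <attList>
            <attDef ident="targetDatcat" usage="opt"
                    ns="http://www.isocat.org/ns/dcr">
                <desc>points to the data category with which the declared
                    feature or element is to be aligned.</desc>
                <datatype minOccurs="1" maxOccurs="unbounded">
                    <dataRef key="teidata.pointer"/>
                </datatype>
            </attDef>
            <attDef ident="targetValueDatcat" usage="opt"
                    ns="http://www.isocat.org/ns/dcr">
                <desc>points to the data category with which the declared
                    value is to be aligned.</desc>
                <datatype minOccurs="1" maxOccurs="unbounded">
                    <dataRef key="teidata.pointer"/>
                </datatype>
            </attDef>
        </attList>
    </classSpec>
    <!-- Membership for one of the proposed elements. -->
    <elementSpec ident="fDecl" module="iso-fs" mode="change">
        <classes mode="change">
            <memberOf key="att.targetDatcat" mode="add"/>
        </classes>
    </elementSpec>
</specGrp>
```

Keeping the target* attributes in a class of their own, separate from att.datcat, means an element such as <symbol>, which occurs in both FSD and FSR, could belong to both classes without either set of attributes changing its meaning.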