DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Marking and cross-referencing obsolete / deprecated forms #73

Closed phoenix-mossimo closed 4 years ago

phoenix-mossimo commented 4 years ago

Hi all, I have a question concerning deprecated word forms. We (CCL*) have a number of spellings that currently include brackets (i.e. "xx(y)xx") and have to be disambiguated into two distinct variants, 1) "xxxx" and 2) "xxyxx", which keep the properties of the original form (e.g. attributes, sub-elements). For referencing purposes, however, we would like to keep the original rather than eliminate it, marking it as "deprecated", "obsolete" etc., and to add cross-references to both the new and the old forms. Considering that the @type attribute of the current form might already be occupied (e.g. with @type="inflected"), what is the LEX-0-conformant way to fulfil these three tasks?

We were considering the following options (thanks @laurentromary for the feedback):

1) Marking the existing form as obsolete / deprecated with a) an extra label or b) @change/@status attributes:
a) <form type="inflected"> <lbl>obsolete</lbl>
b) <form type="inflected" change="deprecated"> or <form type="inflected" status="deprecated">

2) Pointing to the new forms with a) an extra cross-reference or b) a @corresp attribute:
a) <form type="inflected"><lbl>obsolete</lbl><xr type="substitutedBy"><ref type="form" target="#xml_id1 #xml_id2"/></xr>
b) <form type="inflected" change="deprecated" corresp="#xml_id1 #xml_id2">

3) Pointing to the old form with a) an extra cross-reference or b) a @corresp attribute:
a) <form type="inflected"><xr type="substituting"><ref type="form" target="#xml_id0"/></xr>
b) <form type="inflected" change="new?" corresp="#xml_id0">
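
For readers comparing the variants, the cross-reference options (1a, 2a, 3a) can be written out as one well-formed fragment. This is only a sketch under assumptions: the `<orth>` content reuses the placeholder spellings from the question, the `xml:id` values are the placeholder identifiers used in the options, the `<entry>` wrapper is shown without its usual attributes, and the `@type` values on `<xr>` are the ones proposed in this post. Whether a given schema accepts `<xr>` inside `<form>`, or these `@type` values, would need to be checked by validation.

```xml
<entry>
  <!-- Original bracketed spelling, kept for reference and labelled obsolete (1a) -->
  <form type="inflected" xml:id="xml_id0">
    <orth>xx(y)xx</orth>
    <lbl>obsolete</lbl>
    <!-- Points forward to the two replacement forms (2a) -->
    <xr type="substitutedBy">
      <ref type="form" target="#xml_id1 #xml_id2"/>
    </xr>
  </form>
  <!-- First disambiguated variant, pointing back to the old form (3a) -->
  <form type="inflected" xml:id="xml_id1">
    <orth>xxxx</orth>
    <xr type="substituting">
      <ref type="form" target="#xml_id0"/>
    </xr>
  </form>
  <!-- Second disambiguated variant, pointing back to the old form (3a) -->
  <form type="inflected" xml:id="xml_id2">
    <orth>xxyxx</orth>
    <xr type="substituting">
      <ref type="form" target="#xml_id0"/>
    </xr>
  </form>
</entry>
```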

Thanks for the feedback.

*Comprehensive Coptic Lexicon, hosted at BBAW (Thesaurus Linguae Aegyptiae project). Recent dataset: https://refubium.fu-berlin.de/handle/fub188/24570 Web interface: http://coptic-dictionary.org/

laurentromary commented 4 years ago

My favorite would be <form type="inflected" status="deprecated" corresp="#xml_id1 #xml_id2">, with a customization of form to make it a member of att.docStatus. I have the feeling that <xr> would be overkill for this use case. Note that I am not limiting my reasoning to the TEI Lex framework (we may not integrate the corresponding customization) but am considering the TEI framework at large.
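
The customization mentioned here could look roughly like the ODD fragment below. It is a sketch, not the Lex-0 schema: it assumes the fragment sits inside an existing `<schemaSpec>` and simply adds `<form>` to the att.docStatus class so that `@status` becomes available on it.

```xml
<!-- ODD fragment (assumed to live inside an existing <schemaSpec>):
     make <form> a member of att.docStatus, which supplies @status -->
<elementSpec ident="form" module="dictionaries" mode="change">
  <classes mode="change">
    <memberOf key="att.docStatus" mode="add"/>
  </classes>
</elementSpec>
```

With such a customization in place, the three forms might be encoded as follows. The `<orth>` content and the `xml:id` values are the placeholders from the original question, and the `@corresp` back-pointers on the new forms follow option 3b rather than anything stated in this comment.

```xml
<!-- Old bracketed spelling: kept, marked deprecated, pointing to its replacements -->
<form type="inflected" xml:id="xml_id0" status="deprecated" corresp="#xml_id1 #xml_id2">
  <orth>xx(y)xx</orth>
</form>
<!-- New variants, each pointing back to the deprecated form -->
<form type="inflected" xml:id="xml_id1" corresp="#xml_id0">
  <orth>xxxx</orth>
</form>
<form type="inflected" xml:id="xml_id2" corresp="#xml_id0">
  <orth>xxyxx</orth>
</form>
```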

TomazErjavec commented 4 years ago

From what @phoenix-mossimo writes, you would use @type if it weren't already used for other purposes. So you could use form/@subtype instead.

@status to me sounds a bit tag abusive, as, if I understand correctly (cf. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#TEI.att.docStatus) it is meant for the status of something in terms of the document workflow. But here, again, if I understand correctly, you are talking about obsolete (archaic?) word-forms.

laurentromary commented 4 years ago

There are two aspects here:

TomazErjavec commented 4 years ago

I don't feel at ease when I use two separate dimensions on @type and @subtype

That is a good point (which I haven't taken into consideration in my attempts at converting dictionaries into Lex0, hmm)

the semantic of @status is very close to what we want

Maybe I misunderstood in which way the forms are "deprecated", but if these are forms in a published dictionary, and they used to be spelled "xxyxx" but no longer are (say, because of a spelling reform), then I still think @status is dubious: yes, it contains the value "deprecated", but the other values are "draft", "published" etc., surely a different kettle of fish.

phoenix-mossimo commented 4 years ago
  • I don't feel at ease when I use two separate dimensions on @type and @subtype, and would thus avoid @subtype to express something that has nothing to do with the first typology ("inflected")

Yes, I agree that @type and @subtype should not be semantically heterogeneous.

@status to me sounds a bit tag abusive, as, if I understand correctly (cf. https://dariah-eric.github.io/lexicalresources/pages/TEILex0/TEILex0.html#TEI.att.docStatus) it is meant for the status of something in terms of the document workflow. But here, again, if I understand correctly, you are talking about obsolete (archaic?) word-forms.

I also understood the TEI specification to mean that @change and @status refer to the work process rather than to the word form itself, hence my reservations.

Maybe I misunderstood in which way the forms are "deprecated", but if these are forms in a published dictionary, and they used to be spelled "xxyxx" but no longer are (say, because of a spelling reform), then I still think @status is dubious: yes, it contains the value "deprecated", but the other values are "draft", "published" etc., surely a different kettle of fish.

What is currently published is the form "xx(y)xx", e.g. http://coptic-dictionary.org/entry.cgi?tla=C5355. We want to make two forms out of it: 1) ⲟⲩⲟϭⲉ; 2) ⲟⲩⲟⲟϭⲉ

phoenix-mossimo commented 4 years ago

Ok, so the solution I have chosen is:

@status applies to the "workflow status" of the given form - "approved", "published", "deprecated" etc.

laurentromary commented 4 years ago

That looks good. I just have a problem with @n having the same value twice: does it make sense from your editorial point of view?

phoenix-mossimo commented 4 years ago

It was a copy-paste. I removed it.