Closed by laurentromary 4 years ago.
Following a discussion with Toma, here are three possible ways to encode the entry cantor,a n. m., f.:
<entry>
<!-- cantor,a n. m., f. -->
<form type="lemma">
<orth>cantor</orth>
<gramGrp>
<pos>n</pos>
</gramGrp>
</form>
<form type="inflected">
<orth>cantor</orth>
<gramGrp>
<gen>m</gen>
</gramGrp>
</form>
<form type="inflected">
<orth>cantora</orth>
<gramGrp>
<gen>f</gen>
</gramGrp>
</form>
</entry>
<entry>
<form type="lemma">
<orth>cantor,a</orth>
<gramGrp>
<pos>n.</pos>
<gen>m.</gen>
<pc>,</pc>
<gen>f.</gen>
</gramGrp>
</form>
</entry>
<entry>
<form type="lemma">
<orth>cantor</orth>
<pc>,</pc>
<form type="ending">
<orth>a</orth>
</form>
<gramGrp>
<pos>n.</pos>
<gen>m.</gen>
<pc>,</pc>
<gen>f.</gen>
</gramGrp>
</form>
</entry>
Hopefully we can settle on one single way to encode things (semantically, never visually). Adding gramGrp to the forms, instead of placing it outside them as TEI proposes, looks interesting.
Nevertheless, I would not add a separate inflected form for the masculine. The masculine would be the lemma itself, and there would be only one inflected form (the feminine).
It is clear that, when inflecting, everything should be inherited from the lemma, overriding only whatever is specifically mentioned (in this case, the gender).
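A minimal sketch of the encoding described above, assuming the masculine is carried by the lemma and the single inflected form inherits every feature it does not restate (here, `<pos>`). The inheritance rule is a convention proposed in this thread, not something TEI or TEI Lex-0 enforces, so any software consuming the data would have to apply it explicitly.

```xml
<entry>
  <!-- the lemma carries the shared features; the masculine is the lemma itself -->
  <form type="lemma">
    <orth>cantor</orth>
    <gramGrp>
      <pos>n.</pos>
      <gen>m.</gen>
    </gramGrp>
  </form>
  <!-- only the feminine is listed: by the assumed convention it inherits
       pos from the lemma and overrides gen -->
  <form type="inflected">
    <orth>cantora</orth>
    <gramGrp>
      <gen>f.</gen>
    </gramGrp>
  </form>
</entry>
```

The trade-off compared with listing both genders as inflected forms is that the masculine features are stated once, but a reader of the XML must know the inheritance convention to reconstruct them.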
Hi @ambs, the lexical vs. editorial view is a big challenge for retrodigitized dictionaries. While I would also prefer that we always go for the lexical view, we cannot really enforce it 100% in TEI Lex-0, because some projects will insist on being able to accurately represent the dictionary as it appears in its print edition. Beauty is in the eye of the encoder 😄.
If you can go with the lexical view for the DACL, I'd be all in favor of it. But that would also mean either changing the order in which elements appear, or doing some additional transformations to display things as they were in the original dictionary, and I'm not sure how sustainable that is.
P.S. We haven't met in person yet, but I know you through Ana, and let me just say how delighted I am that you're helping out with the conversion of the DACL. I'm also a big fan of the eXist-based backend you built for the Academy!
Hi @ttasovac. I hope we meet someday. Who knows, maybe at the eLex conference.
Regarding display vs. content, I understand the struggle. I remember learning MathML and finding its presentation encoding awful (and math is so formal that display-oriented markup makes even less sense there than it does for a dictionary).
For the DACL, since we are not aiming to display it exactly as it was printed, I prefer to go for the structural encoding, even if I then need to rework the XML to print it properly. :-)
My dears @laurentromary and @ttasovac, must we use <gram type="pos">, or can we just use <pos>?
<entry type="derivativeWord">
<form type="lemma">
<orth>ensonado</orth>
<gramGrp>
<gram type="pos" norm="NOUN">n.</gram>
<gram type="gen">f.</gram>
</gramGrp>
</form>
<form type="inflected">
<orth>ensonado</orth>
<gramGrp>
<gram type="gen">m.</gram>
</gramGrp>
</form>
<form type="inflected">
<orth>ensonada</orth>
<gramGrp>
<gram type="gen">f.</gram>
</gramGrp>
</form>
</entry>
@ambs Well... since you ask: TEI Lex-0 recommends the <gram> version when transforming data into a single target format (e.g. for the Elexis use case). To my mind this is too disruptive, and I would strongly advocate keeping <pos>. :-}
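For comparison, a sketch of the same grammatical information for a masculine noun in both styles discussed here: the dedicated elements (`<pos>`, `<gen>`) and the generic `<gram>` with a `type` attribute. The `norm` value follows the Universal Dependencies tag convention used elsewhere in this thread; whether a given project adds it at all is an assumption. The two fragments are alternatives, not meant to appear together.

```xml
<!-- Style 1: dedicated TEI elements (the option advocated above) -->
<gramGrp>
  <pos>n.</pos>
  <gen>m.</gen>
</gramGrp>

<!-- Style 2: generic <gram> with @type, the form TEI Lex-0 recommends
     when converting data into a single target format -->
<gramGrp>
  <gram type="pos" norm="NOUN">n.</gram>
  <gram type="gen">m.</gram>
</gramGrp>
```

Both carry the same information; the second is easier to process uniformly across dictionaries, at the cost of replacing the more specific element names.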
I would go for
<gram>
<pos>n.</pos>
<gen>m.</gen>
</gram>
and when there are distinct POS, use
<gramGrp>
<gram>
<pos>n.</pos>
<gen>m.</gen>
</gram>
<gram>
<pos>adj.</pos>
</gram>
</gramGrp>
Looks good?
You can’t have these elements inside
When there are 2 or more POS for an entry, you can do one of two things:
1) put the grammar information inside
<entry>
….
<sense n="1">
<gramGrp>
<pos>n.</pos>
<gen>m.</gen>
</gramGrp>
….
</sense>
<sense n="2">
<gramGrp>
<pos>adj.</pos>
</gramGrp>
….
</sense>
</entry>
2) you can just have multiple
<entry>
...
<gramGrp>
<pos>n.</pos>
<gen>m.</gen>
</gramGrp>
<!-- <lbl> here if you want -->
<gramGrp>
<pos>adj.</pos>
</gramGrp>
….
</entry>
if you want, you could put a
I think I'd prefer the first option, @iljackb
OK, a sequence of gramGrps looks good. As for lbls, I'm staying away from them.
@Ana Salgado, good choice: the first is the most conventional way to do it :-)
I suspect we will have both situations: some where the grammatical information goes on different senses, and others where the same sense needs more than one (in which case we use a sequence of gramGrp elements).
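A sketch of the second situation, assuming a single sense that carries two grammatical readings. The `…` placeholders stand for the actual dictionary content, as in the examples above, and the choice of `<def>` for the definition is an assumption.

```xml
<entry>
  <form type="lemma">
    <orth>…</orth>
  </form>
  <sense n="1">
    <!-- one sense, two grammatical readings: a sequence of sibling gramGrp elements -->
    <gramGrp>
      <pos>n.</pos>
      <gen>m.</gen>
    </gramGrp>
    <gramGrp>
      <pos>adj.</pos>
    </gramGrp>
    <def>…</def>
  </sense>
</entry>
```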
@anacastrosalgado, go for it :)
Hi @iljackb, replying by email seems to be somewhat broken: you're notifying me (@ana) instead of @anacastrosalgado.
I think you can close this issue...
<entry type="derivativeWord" xml:lang="pt" xml:id="antepassado.1" n="1">
<form type="lemma">
<orth>antepassado</orth>
</form>
<form type="inflected">
<orth>antepassado</orth>
<pron>ɐ̃tɨpɐsˈadu</pron>
<gramGrp>
<gram type="gen">m.</gram>
</gramGrp>
</form>
<form type="inflected">
<orth>antepassada</orth>
<pron>ɐ̃tɨpɐsˈadɐ</pron>
<gramGrp>
<gram type="gen">f.</gram>
</gramGrp>
</form>
<gramGrp>
<gram type="pos" norm="ADJ">adj.</gram>
</gramGrp>
</entry>
This example shows that when a specific inflected form is featured in the entry, it should be defined as an independent form, with enough information about its inflectional features (in this case, that it is the feminine form). For the grammatical information, TEI Lex-0 recommends the gramGrp element.
Following an example from Ana Salgado, my suggestion would be to have two <form type="inflected"> elements in conjunction with one lemma form for cantor.
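A sketch of that suggestion applied to cantor, following the pattern of the antepassado entry: one lemma form, two inflected forms each stating its gender, and the part of speech in an entry-level gramGrp. The `norm` value is an illustrative assumption, and pronunciations are omitted.

```xml
<entry>
  <form type="lemma">
    <orth>cantor</orth>
  </form>
  <!-- each inflected form states the feature that distinguishes it -->
  <form type="inflected">
    <orth>cantor</orth>
    <gramGrp>
      <gram type="gen">m.</gram>
    </gramGrp>
  </form>
  <form type="inflected">
    <orth>cantora</orth>
    <gramGrp>
      <gram type="gen">f.</gram>
    </gramGrp>
  </form>
  <!-- part of speech shared by the whole entry -->
  <gramGrp>
    <gram type="pos" norm="NOUN">n.</gram>
  </gramGrp>
</entry>
```

Unlike the inheritance-based variant proposed earlier in the thread, every form here states its gender explicitly, so no convention is needed to recover the masculine features.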