UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

More than one copula. #86

Closed masayu-a closed 1 year ago

masayu-a commented 2 years ago

We want to add more than one copula word. Japanese has a normal copula and an honorific copula.

We also have contracted forms of the copula in speech corpora.

We hope the restriction on copula words can be relaxed for Japanese and other morphologically rich languages. http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl?lcode=ja

mehmetoguzderin commented 2 years ago

I agree with this suggestion. Although these elements can be traced to verbs in older attestations of some languages, where the copular construction could be analyzed with xcomp, modern usage in at least some languages has drifted far from that and does not fit such an analysis well. This is not a hypothetical situation: an existing analysis treats a postposition as a token on its own (see i- and -dur; the annotations give the latter the lemma i-, which is wrong). Languages that are typologically similar to Japanese need a way to consistently express these modern surface forms of copular constructions.

dan-zeman commented 2 years ago

I think that the opposition of normal vs. honorific copula can be understood as a deficient paradigm, which is an exception to the general one-copula-lemma rule. It is described in the guidelines and also in the introductory paragraphs on the specify_auxiliary page. However, for each of the copula lemmas, one must fill out the field "Deficient" and describe which part of the deficient paradigm the lemma covers. I filled in this field for the current copula だ (saying that it is the normal, non-honorific copula); once the field is filled, the system allows adding other copulas.
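The rule described above can be sketched as a small check. This is only an illustration of the logic as stated in the comment, not the actual code behind specify_auxiliary.pl; the function name `can_add_copula`, the dictionary layout, and the placeholder lemma `HONORIFIC_COPULA` are all hypothetical.

```python
def can_add_copula(existing, new_lemma, new_deficient):
    """Decide whether another copula lemma may be registered.

    `existing` maps each registered copula lemma to the text of its
    "Deficient" field ('' when the field was left blank).
    Hypothetical model of the rule, not the real back end.
    """
    if new_lemma in existing:
        return False  # lemma is already registered
    if not existing:
        return True  # the first copula needs no justification
    # A further copula is accepted only if every entry, including the new
    # one, says which part of the deficient paradigm it covers.
    all_described = all(text.strip() for text in existing.values())
    return all_described and bool(new_deficient.strip())


# Blank "Deficient" field on the existing copula: a second lemma is refused.
blocked = can_add_copula({"だ": ""}, "HONORIFIC_COPULA", "honorific copula")

# Field filled in, as was done for だ in this thread: the addition is allowed.
allowed = can_add_copula(
    {"だ": "normal, non-honorific copula"},
    "HONORIFIC_COPULA",
    "honorific copula",
)
```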

masayu-a commented 2 years ago

Could you advise how to deal with contracted forms of the copula in a speech corpus?

dan-zeman commented 2 years ago

I don't know what exactly is the nature of the contracted copula. But if it is a contraction, then I suppose it can be linked to the full, uncontracted form. In that case it should get the lemma of the full form, and no additional entry in the system is needed (because the validator checks the lemma, not the surface form).
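To make the lemma-versus-surface-form point concrete, here is a minimal sketch of such a check over CoNLL-U text. It is not the validator's actual code; the function `check_copula_lemmas` is hypothetical, and the sample uses the English contraction 's (lemma be) purely as an analogy, since the exact Japanese contracted forms are not given in this thread.

```python
def check_copula_lemmas(conllu_text, allowed_lemmas):
    """Return (form, lemma) pairs for `cop` tokens whose lemma is not allowed.

    Only the LEMMA column is compared; the surface FORM is never checked.
    """
    problems = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        form, lemma, deprel = cols[1], cols[2], cols[7]
        # Compare the universal relation only, ignoring any subtype.
        if deprel.split(":")[0] == "cop" and lemma not in allowed_lemmas:
            problems.append((form, lemma))
    return problems


# The contracted form 's carries the lemma of the full form, "be".
SAMPLE = "\n".join([
    "# text = She's happy",
    "1\tShe\tshe\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\t's\tbe\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\thappy\thappy\tADJ\t_\t_\t0\troot\t_\t_",
])

problems = check_copula_lemmas(SAMPLE, {"be"})
```

Because the contracted token is lemmatized to the full form, `problems` stays empty and no extra entry is needed in the list of allowed lemmas. Giving the token its own surface form as lemma would make the same check report it.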

masayu-a commented 2 years ago

Is there any way to remove entries from the aux list via http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl ? We would like to move some auxiliary words to the copula list.

dan-zeman commented 2 years ago

No, the interface does not support removal. Let me know the lemmas that should be removed and I will remove them manually in the back end.

masayu-a commented 2 years ago

OK.

ではない
でもある

should be removed.

http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl?ghu=masayu-a&lcode=ja&lemma=%E3%81%A7%E3%81%AF%E3%81%AA%E3%81%84 http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl?ghu=masayu-a&lcode=ja&lemma=%E3%81%A7%E3%82%82%E3%81%82%E3%82%8B
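The `lemma` parameter in these links is simply the percent-encoded UTF-8 form of the lemma. A short sketch using only the Python standard library shows how to recover it; the helper name `lemma_from_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def lemma_from_url(url):
    """Extract and decode the `lemma` query parameter from a link."""
    query = parse_qs(urlsplit(url).query)  # percent-decodes as UTF-8
    return query["lemma"][0]


BASE = (
    "http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/"
    "specify_auxiliary.pl?ghu=masayu-a&lcode=ja&lemma="
)

first = lemma_from_url(BASE + "%E3%81%A7%E3%81%AF%E3%81%AA%E3%81%84")
second = lemma_from_url(BASE + "%E3%81%A7%E3%82%82%E3%81%82%E3%82%8B")
```

Here `first` and `second` decode to the two lemmas listed above, ではない and でもある.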

dan-zeman commented 2 years ago

Removed.

masayu-a commented 2 years ago

Thanks!

masayu-a commented 2 years ago

@dan-zeman Sorry, could you remove

こともna

from Japanese auxiliaries?

http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl?ghu=masayu-a&lcode=ja&lemma=%E3%81%93%E3%81%A8%E3%82%82na

dan-zeman commented 2 years ago

Done.