UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

Adding new auxiliary verb by specify_auxiliary.pl #75

Closed s10018 closed 1 year ago

s10018 commented 3 years ago

I want to add words to the auxiliary verb list for the Japanese UD treebanks for Modern and Spoken Japanese (#71). I checked the site below, but I cannot find how to add a new auxiliary verb. http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl Do you have any plans to support adding new auxiliary verbs on the site in the future? @masayu-a

dan-zeman commented 3 years ago

The system first checks whether the auxiliaries that had been previously hard-wired in the source code have already been documented. If it finds one or more auxiliaries for which the case has not been made, it asks the user to document them first. If there is no such backlog, the system offers the option of adding a new auxiliary (as can be observed e.g. for Korean).
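The flow described above can be sketched roughly as follows. This only illustrates the described behavior and is not the actual code behind specify_auxiliary.pl; the `Auxiliary` record, its `documented` flag, and the `next_action` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Auxiliary:
    lemma: str
    documented: bool  # hypothetical flag: has the case for this auxiliary been made?


def next_action(registered: List[Auxiliary]) -> Tuple[str, List[str]]:
    """Return what the form would offer for one language (illustrative only)."""
    backlog = [aux.lemma for aux in registered if not aux.documented]
    if backlog:
        # Undocumented auxiliaries must be documented before anything new is added.
        return ("document_existing", backlog)
    # No backlog: the option of adding a new auxiliary is offered.
    return ("add_new", [])
```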

This system of maintaining language-specific rules is quite new and it is possible that not everything works as intended, so let me know if there are any issues. In particular, the list of possible functions does not include all possible tenses, aspects, voices and moods at present.

masayu-a commented 2 years ago

We want to add Japanese auxiliary verbs for the Long Unit Word (LUW) definition. The list includes compound auxiliary words, which are syntactic words in Japanese. If possible, we would also like to add contracted auxiliary words.

ている
てある
のだ
てくる
ではない
のです
てる
てしまう
てくれる
ていく
てもらう
でもある
てみる
かもしれない
のである
てある
てほしい
じゃない
ていただく
つつある
にすぎない

dan-zeman commented 2 years ago

I have removed from the system the 120 Japanese auxiliaries that were still lacking documentation and thus blocking the addition of any new auxiliaries. You can now add the LUW auxiliaries.

On the downside, Japanese GSD and PUD are now invalid because they contain the undocumented auxiliaries (@kanayamah).

masayu-a commented 2 years ago

@dan-zeman Why did you remove them only from UD Japanese? We need more than 500 auxiliaries.

Furthermore, the list of types should be changed for Japanese. Most auxiliaries should be classified as "Other".

Could you resolve the 120 Japanese auxiliaries and the additional 21 auxiliaries?

Copula
Perfect
Past
Future
Passive
Conditional
Necessitative
Potential
Desiderative
Other
Undocumented

dan-zeman commented 2 years ago

Why did you remove them?

Because they were never correctly added, and thus they were the reason why no new auxiliaries could be added to the system. They can now be re-introduced through the form. But I don't know what their function is, not to mention examples (this kind of information was not there).

More than 500 auxiliaries sounds very strange (even more than 100 does), given the number of auxiliaries used in other languages. It raises questions of whether all these auxiliaries are really auxiliaries in the UD sense, i.e., are responsible for grammatical features such as tense, aspect, mood and voice. In fact, one of the purposes of the auxiliary registration/validation system is to ensure that the term "auxiliary" is interpreted in line with the guidelines, and similarly across languages.

Note that the "Other" category in the all-language aux table is just a formatting choice that lumps together several actual functions. However, those functions have names, and there is no "Other" function when an auxiliary is being documented. (That said, some tenses / aspects / moods / voices may still be missing from the menu in the form and can be added if needed. But there will never be an "other" option.)

masayu-a commented 2 years ago

@dan-zeman

Could you add

benefactive
honorific

to the list?

dan-zeman commented 2 years ago

I suppose we could classify the benefactive as a Mood. Or how does it function?

As for the honorific auxiliary, does it correspond to one of the values of the feature Polite, e.g., Form or Elev?

masayu-a commented 2 years ago

Yes. Could you add these items in the validation tool? Therefore, we chose "-----".

image

dan-zeman commented 2 years ago

Yes. Could you add these items in the validation tool?

I have added Mood=Ben. What feature value should I add for the honorific auxiliary?

Therefore, we chose "-----".

Actually, it was a bug in the script that allowed you to get away with "-----". I am very sorry for the inconvenience it caused.

masayu-a commented 2 years ago

We need Polite features for auxiliary as https://universaldependencies.org/u/feat/Polite.html But, the Polite features can appear with the other auxiliary functions.

We also need the auxiliary of negation for Uralic languages.

dan-zeman commented 2 years ago

We need Polite features for auxiliary as https://universaldependencies.org/u/feat/Polite.html But, the Polite features can appear with the other auxiliary functions.

My question was more about which value of the Polite feature would correspond to your auxiliary. In the end I went with Polite=Form. If the other auxiliaries use the Polite morphological feature, that is OK. For example, if you have a formal and an informal version of a past tense auxiliary, it is enough to register the lemma with the past tense function, but the actual word forms in the corpus can (and should) still have both features, i.e., Polite=Infm|Tense=Past and Polite=Form|Tense=Past, respectively. (Perhaps they should even be treated as forms of one lemma.) As I understand it, you are asking for this function to be added to the auxiliary specification form so that you can have an auxiliary that expresses only politeness, without also expressing tense, aspect, or modality. Is that correct?
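To make the point about word-form features concrete, here is a minimal sketch that parses the two FEATS strings from the past tense example. The helper `parse_feats` is a hypothetical name written for this illustration and is not part of the validator.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Polite=Form|Tense=Past' into a dict."""
    if feats == "_":  # '_' marks an empty FEATS column in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


informal = parse_feats("Polite=Infm|Tense=Past")
formal = parse_feats("Polite=Form|Tense=Past")

# Both word forms carry the function registered for the lemma (past tense) ...
assert informal["Tense"] == formal["Tense"] == "Past"
# ... and differ only in the politeness feature annotated on the word form.
assert {key for key in informal if informal[key] != formal[key]} == {"Polite"}
```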

We also need the auxiliary of negation for Uralic languages.

Do not worry about that, the negative auxiliary is already available and it is used in some Uralic languages in UD (Erzya, Moksha, Komi, Sami).

masayu-a commented 2 years ago

Thank you very much.

We need Polite=Infm, Polite=Form, Polite=Elev and Polite=Humb.

"行き[ません]" Polite=Form|Negation, "なさる" Polite=Elev, and "いたす" Polite=Humb are presented in https://universaldependencies.org/u/feat/Polite.html

The example "行か[ない]" on the page is just Negation. But, we have some auxiliaries of Polite=Infm such as "やがる". Therefore, we need the four Polite features for the auxiliary validation.

Note that, "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.

We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl Could you add Negation in the tool?

dan-zeman commented 2 years ago

The example "行か[ない]" on the page is just Negation.

Sorry, I don't understand. I thought that both 行かない and 行きません are negation, that is, Polarity=Neg, but the latter is polite/formal negation, i.e., Polite=Form. Since the former is not formal, it is informal, i.e., Polite=Infm.

Note that, "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.

This follows automatically from the definition on that page, which says that Elev and Humb are subtypes of the formal register (Form). But at most one of the values is put in the morphological features, so if you know that it is e.g. Polite=Elev, you no longer use Polite=Form.
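As a rough sketch of that rule (only the most specific value goes into the features), assuming the four Polite values discussed in this thread; the function name `most_specific_polite` and the error handling are illustrative choices, not validator code:

```python
from typing import Optional, Set

# Elev and Humb are subtypes of the formal register (Form).
FORMAL_SUBTYPES = {"Elev", "Humb"}


def most_specific_polite(applicable: Set[str]) -> Optional[str]:
    """Pick the single Polite value to annotate from all values that apply."""
    formal = applicable & ({"Form"} | FORMAL_SUBTYPES)
    specific = formal & FORMAL_SUBTYPES
    if len(specific) > 1 or ("Infm" in applicable and formal):
        raise ValueError(f"conflicting Polite values: {sorted(applicable)}")
    if specific:
        # A known subtype replaces the more general Form.
        return next(iter(specific))
    if formal:
        return "Form"
    if "Infm" in applicable:
        return "Infm"
    return None
```

Under this sketch, a form such as "なさいます", described above as both formal and elevated, would be annotated with Polite=Elev only.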

I have now added a politeness-informal function to the auxiliary specification form.

We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl Could you add Negation in the tool?

It has been there since the beginning; it is the seventeenth function: "Needed in negative clauses (like English “do”, not like “not”)". Note that if it is more like English not (as opposed to do in negative clauses), then it is not an AUX but a PART (see here), and its relation to the predicate of the clause is advmod.

masayu-a commented 2 years ago

The example "行か[ない]" on the page is just Negation.

Sorry, I don't understand. I thought that both 行かない and 行きません are negation, that is, Polarity=Neg, but the latter is polite/formal negation, i.e., Polite=Form. Since the former is not formal, it is informal, i.e., Polite=Infm.

"行か[ない]" should be Polarity=Neutral.

Note that, "なさいます" on the page is Polite=Form|Polite=Elev, and "いたします" on the page is Polite=Form|Polite=Humb.

This follows automatically from the definition on that page, which says that Elev and Humb are subtypes of the formal register (Form). But at most one of the values is put in the morphological features, so if you know that it is e.g. Polite=Elev, you no longer use Polite=Form.

OK, Thanks.

I have now added a politeness-informal function to the auxiliary specification form.

Thank you very much.

We cannot choose Negation in http://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_auxiliary.pl Could you add Negation in the tool?

It has been there since the beginning; it is the seventeenth function: "Needed in negative clauses (like English “do”, not like “not”)". Note that if it is more like English not (as opposed to do in negative clauses), then it is not an AUX but a PART (see here), and its relation to the predicate of the clause is advmod.

OK, I choose "Needed in negative clauses (like English “do”, not like “not”)". However, the negation auxiliary verbs in the Uralic languages should be auxiliaries in UD: https://benjamins.com/catalog/tsl.108 I would like to hear the opinions of people working on Uralic languages.

mehmetoguzderin commented 2 years ago

Speaking from a Turkic perspective, we would treat a separately tokenized "not"-like negation such as Turkic "ma" as "PART", akin to the question marker token "mu". This choice is codified for Old Turkish in the following paper: https://aclanthology.org/2021.udw-1.11/

dan-zeman commented 1 year ago

It seems to me that there are no remaining open questions for the validation infrastructure in this issue, so I am tentatively closing it. Feel free to reopen if another action is needed.