delph-in / matrix

The Grammar Matrix
https://matrix.ling.washington.edu/index.html

enforcing subjunctive mood in clausal complements #610

Closed: leoalenc closed this issue 3 years ago.

leoalenc commented 3 years ago

In Romance languages like Portuguese and French, the embedded verb of a clausal complement introduced by the complementizer que 'that' must be in either the indicative or the subjunctive mood, depending on the main verb. Verbs of saying and verbs of perception, for example, require the embedded verb to be in the indicative, see (1), while verbs of volition require the subjunctive, see (2):

(1) O gato percebeu que o cachorro dormia.
    the cat notice:IND;PST;PFV;3SG that the dog sleep:IND;PST;IPFV;3SG
    'the cat noticed that the dog was asleep'

(2) O gato queria que o cachorro dormisse.
    the cat want:IND;PST;IPFV;3SG that the dog sleep:SBJV;PST;IPFV;3SG
    'the cat wanted the dog to sleep'

Other verb types, such as mental verbs, are compatible with both moods; the choice depends on whether the speaker is certain or uncertain about the proposition expressed by the dependent clause. Note that, in contrast to English, the subjunctive is mandatory in cases like (2): changing the mood of the embedded verb in (2) to the indicative makes the sentence ungrammatical:

(3) *O gato queria que o cachorro dormia.
    the cat want:IND;PST;IPFV;3SG that the dog sleep:IND;PST;IPFV;3SG

In PorGram, the HPSG grammar of Portuguese I'm developing with the LinGO Grammar Matrix in collaboration with @arademaker, I couldn't enforce this mood distinction with the customization questionnaire, as @arademaker pointed out in our talk at the DELPH-IN Virtual 2021 Summit. In the questionnaire, one can specify the mood of the embedded verb as part of a complementation strategy on the Clausal Complements page. In this way, I successfully constrained the mood in embedded question clauses introduced by the complementizer se 'if'. However, implementing two different strategies with the complementizer que, one associated with the indicative and the other with the subjunctive, didn't prevent the grammar from overgenerating: sentences like (3) and (4) were wrongly accepted as grammatical:

(4) *O gato percebeu que o cachorro dormisse.
    the cat notice:IND;PST;PFV;3SG that the dog sleep:SBJV;PST;IPFV;3SG

In our talk, @arademaker and I suggested that this problem could be solved by adding the option "The embedded verb" to the drop-down menu under "Specified on:" on the Lexicon page:

Features:
    Name: mood
    Value: subjunctive
    Specified on: The embedded verb

I tried the two existing options in that drop-down menu that seem relevant to the case at hand: "On the object" and "On the verb". However, in both scenarios, the grammar failed to parse the examples above. Apparently, "object" in this menu refers not to the object complement clause but to an NP object (which, by the way, the questionnaire doesn't seem to support in addition to a clausal object, as in I told him that she's nice; see the 2019 paper first-authored by @olzama). "Verb" in this menu, on the other hand, seems to refer to the main verb, not the embedded verb. To solve the problem, I first looked at the feature structures generated by the grammar for the dependent clauses of the examples. After figuring out how the mood of the dependent clause is specified in these structures, I created the types subj-cl-verb-lex and ind-cl-verb-lex (for verbs selecting subjunctive and indicative mood, respectively) with the questionnaire and modified them by hand as follows:

subj-cl-verb-lex := fin-cl-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.INDEX.E.MOOD subjunctive ] > ].

ind-cl-verb-lex := que-cl-verb-lex &
  [ SYNSEM.LOCAL.CAT.VAL.COMPS < [ LOCAL.CONT.HOOK.INDEX.E.MOOD indicative ] > ].

Thanks to the audience of our summit talk, I became aware of mistakes in the first version of these definitions, which happened to work by accident. The corrected definitions above have been tested successfully. PorGram is currently available only in a private repository, but I can grant access to anyone interested in testing the grammar. It will soon be openly accessible.
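To see why these two COMPS constraints rule out (3) and (4), here is a toy Python model of the check. It is a minimal sketch, assuming a two-way mood hierarchy (indicative and subjunctive as incompatible subtypes of mood) and unification over plain nested dicts; it is not how the LKB or ACE implement unification, and all variable names are hypothetical. Only the feature path is taken from the TDL above.

```python
# Toy model (not the LKB/ACE unifier) of the MOOD constraint that
# subj-cl-verb-lex and ind-cl-verb-lex put on their clausal complement.
# Feature structures are nested dicts; atomic values are type names.

# Assumed mini type hierarchy: each type maps to itself plus its supertypes.
SUPERTYPES = {
    "mood": {"mood"},
    "indicative": {"indicative", "mood"},
    "subjunctive": {"subjunctive", "mood"},
}


def unify_types(a, b):
    """Return the more specific of two compatible types, or None on a clash."""
    if a in SUPERTYPES[b]:
        return b
    if b in SUPERTYPES[a]:
        return a
    return None


def unify(fs1, fs2):
    """Unify two feature structures; return None if they are incompatible."""
    if isinstance(fs1, str) and isinstance(fs2, str):
        return unify_types(fs1, fs2)
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None
                result[feat] = sub
            else:
                result[feat] = val
        return result
    return None  # atom vs. dict: simplified to a clash in this toy


def from_path(path, value):
    """Turn a dotted TDL-style path into nested dicts ending in value."""
    fs = value
    for feat in reversed(path.split(".")):
        fs = {feat: fs}
    return fs


MOOD_PATH = "LOCAL.CONT.HOOK.INDEX.E.MOOD"

# Constraints on the single COMPS element of each verb type.
SUBJ_CL_COMP = from_path(MOOD_PATH, "subjunctive")  # e.g. queria
IND_CL_COMP = from_path(MOOD_PATH, "indicative")    # e.g. percebeu

# Embedded clauses, reduced to their MOOD value.
DORMISSE = from_path(MOOD_PATH, "subjunctive")
DORMIA = from_path(MOOD_PATH, "indicative")


def accepts(comp_constraint, clause):
    """True if the clause is compatible with the verb's COMPS constraint."""
    return unify(comp_constraint, clause) is not None


for label, verb, clause in [
    ("(1) percebeu ... dormia", IND_CL_COMP, DORMIA),
    ("(2) queria ... dormisse", SUBJ_CL_COMP, DORMISSE),
    ("(3) queria ... dormia", SUBJ_CL_COMP, DORMIA),
    ("(4) percebeu ... dormisse", IND_CL_COMP, DORMISSE),
]:
    print(label, accepts(verb, clause))
```

Run as-is, (1) and (2) print True and (3) and (4) print False. A clause whose MOOD is left as the underspecified type mood would be accepted by both verb types, which mirrors why the constraint has to sit on the selecting verb.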

olzama commented 3 years ago

Hi @leoalenc ,

Thanks for such a detailed issue!

Glad you were able to get the se sentences to work. As for the contrast between (2) and (3), you may be right that the questionnaire doesn't fully support this yet, although I will look into it in more detail on Monday. As I said in my talk earlier this week (see slide 12), modeling the distinctions between clause-embedding verbs (e.g. verbs like think and wonder) remains a bit difficult, not only at the level of the questionnaire but even at the level of individual grammars. I think the distinction between want and notice should be easier, or at least better established. But it is not in the questionnaire (yet)! :) That being said, Dan has a solution even for think vs. wonder in the ERG, other or similar solutions can be implemented for other grammars, and maybe one day we will have a cross-linguistic solution for the Matrix. I will try to look into your specific example in more detail next week to make sure I understand the issue correctly and am not confusing it with something else.

With that being said, here's a potentially related thread.

olzama commented 3 years ago

Hi @leoalenc ,

Actually, I didn't have a problem modeling the data you give above. I have (1) and (2) parsing (with a single parse each) and (3) and (4) ruled out. I also added a FORM (finite/nonfinite) distinction in addition to MOOD (indicative/subjunctive), but I did everything via the questionnaire. I attach the choices. Let me know if that helps or if there is some issue I didn't notice or understand correctly.

choices.txt

olzama commented 3 years ago

I tried without FORM, and I think it works as well. Here are the choices without FORM, only MOOD: choices.txt

olzama commented 3 years ago

So, I think it sounds like you and I filled out the questionnaire differently somehow. See if you can spot the difference? (It's very easy to fill out the questionnaire in a way that was not intended, unfortunately! We try to document the proper use but it can be hard.)

leoalenc commented 3 years ago


Hi @olzama, thanks a lot for your effort in implementing the two Portuguese minigrammars! I tried both grammars, and the results coincided with yours. However, it seems that in neither of the choices files you attached is the feature form=finite associated with the complementizer que 'that'; or perhaps I'm overlooking something? In my Portuguese grammar, some strategies for clausal complements require the verb to be in the infinitive. So I modified your minigrammar in the following way for both strategies with que 'that' (see choices03.txt):

You can put a FORM feature on the obligatory complementizer if you want to constrain the clausal verb in terms of which complementizers it can go with. (Note that all complementizers here are still assumed to be semantically empty.)

Form Value: finite

As a result, the grammar overgenerates:

True negatives
  6  o gato perguntou que o cachorro dormia     0  27
  7  o gato perguntou se o cachorro dormisse    0  25
  8  o gato perguntou que o cachorro dormisse   0  27
True positives
  1  o gato percebeu que o cachorro dormia      1  36
  2  o gato queria que o cachorro dormisse      1  36
  5  o gato perguntou se o cachorro dormia      1  35
False positives
  3  o gato queria que o cachorro dormia        1  36
  4  *o gato percebeu que o cachorro dormisse   1  36

As you can see, I also implemented embedded questions (I also made some additional small changes that are not relevant in the present context):

o gato perguntou se o cachorro dormia
    the cat ask:IND;PST;PFV;3SG if the dog sleep:IND;PST;IPFV;3SG
    'the cat asked whether the dog was asleep'

If I remove the form features from the specification of the complementizers, I get the correct results (see choices04.txt):

True negatives
  3  o gato queria que o cachorro dormia        0  27
  4  o gato percebeu que o cachorro dormisse    0  27
  6  o gato perguntou que o cachorro dormia     0  27
  7  o gato perguntou se o cachorro dormisse    0  25
  8  *o gato perguntou que o cachorro dormisse  0  27
True positives
  1  o gato percebeu que o cachorro dormia      1  36
  2  o gato queria que o cachorro dormisse      1  36
  5  o gato perguntou se o cachorro dormia      1  35
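The comparison between the choices03 and choices04 runs boils down to checking each item's reading count against its expected grammaticality. A small Python sketch of that bookkeeping, using the reading counts reported in the two tables (the first number after each sentence; the second column is left out) and treating items 1, 2, and 5 as the only grammatical ones, as the tables do. The variable and function names are hypothetical.

```python
from collections import Counter

# Test items from the two runs (sentence text only, asterisks dropped).
ITEMS = {
    1: "o gato percebeu que o cachorro dormia",
    2: "o gato queria que o cachorro dormisse",
    3: "o gato queria que o cachorro dormia",
    4: "o gato percebeu que o cachorro dormisse",
    5: "o gato perguntou se o cachorro dormia",
    6: "o gato perguntou que o cachorro dormia",
    7: "o gato perguntou se o cachorro dormisse",
    8: "o gato perguntou que o cachorro dormisse",
}

# Items counted as grammatical (the true positives in both tables).
GRAMMATICAL = {1, 2, 5}

# Number of readings per item, as reported for each run.
READINGS_CHOICES03 = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0, 8: 0}
READINGS_CHOICES04 = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0, 8: 0}


def classify(item_id, readings):
    """Label one item by comparing 'parsed?' with 'grammatical?'."""
    parsed = readings[item_id] > 0
    expected = item_id in GRAMMATICAL
    if parsed and expected:
        return "true positive"
    if parsed:
        return "false positive"   # overgeneration
    if expected:
        return "false negative"   # undergeneration
    return "true negative"


def summarize(readings):
    """Count how many items fall into each category."""
    return Counter(classify(i, readings) for i in ITEMS)


def overgenerated(readings):
    """Ids of ungrammatical items that nevertheless received a parse."""
    return [i for i in ITEMS if classify(i, readings) == "false positive"]


for name, run in [("choices03", READINGS_CHOICES03),
                  ("choices04", READINGS_CHOICES04)]:
    print(name, dict(summarize(run)),
          "overgenerated:", [ITEMS[i] for i in overgenerated(run)])
```

For choices03 this flags items 3 and 4 as overgeneration; for choices04 the list is empty and all five ungrammatical items are true negatives.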

The conclusion seems to be that it is indeed possible to handle mood selection in clausal complements solely with the questionnaire. Thanks a lot for making me aware of that! The catch, however, is that one cannot associate a form feature with different variants of the same complementizer. I'll test this solution with my larger Portuguese grammar, in which some clausal complementation strategies with both the uninflected and the inflected infinitive are defined. I'll report the results here soon.

leoalenc commented 3 years ago


@olzama, thanks for this link. The discussion there will be very relevant for my research.

olzama commented 3 years ago

Ah! The FORM feature on the complementizer is for a different purpose! Suppose you have a verb that only takes clausal complements headed by the complementizer que. You create a special FORM value, say you just call it "que", and your complementizer spelled que is then assigned this FORM value. Another complementizer, say se, gets a different FORM value (e.g. "se"; the names are not important so long as they are distinct). Then, if a particular verb never takes se, you can model that using FORM. But it is not about the finiteness of the embedded clause; you handle that by putting the appropriate FORM values on the embedded verb. If there is an obligatory complementizer, the customization system knows what to do and constrains the FORM of the complementizer's complement accordingly (inspect my first grammar, the actual portuguese.tdl, and you'll see). (I am not sure at the moment what happens if the complementizer is optional, though. If there is no complementizer, then the clause-embedding verb is what stipulates the constraint.)
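To make this division of labour concrete, here is a toy Python model: the complementizer's own FORM identifies which complementizer it is, while finiteness is a separate requirement on the embedded verb. It is a sketch under stated assumptions: the class and field names are hypothetical, MOOD is ignored, and it does not reflect how the customization system actually encodes these constraints in TDL. The verb-complementizer pairings are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complementizer:
    spelling: str
    form: str             # FORM of the complementizer itself: identifies it
    complement_form: str  # FORM required of the embedded verb, e.g. "finite"


@dataclass(frozen=True)
class ClauseEmbeddingVerb:
    lemma: str
    comp_forms: frozenset  # complementizer FORM values the verb accepts


def licensed(verb, comp, embedded_verb_form):
    """Both checks must succeed: verb vs. complementizer, complementizer vs. embedded verb."""
    return (comp.form in verb.comp_forms
            and embedded_verb_form == comp.complement_form)


# Intended use: each complementizer gets its own, distinct FORM value.
QUE = Complementizer("que", form="que", complement_form="finite")
SE = Complementizer("se", form="se", complement_form="finite")
QUERER = ClauseEmbeddingVerb("querer", frozenset({"que"}))       # assumed: que only
PERGUNTAR = ClauseEmbeddingVerb("perguntar", frozenset({"se"}))  # assumed: se only

print(licensed(QUERER, QUE, "finite"))     # True
print(licensed(QUERER, SE, "finite"))      # False: querer never takes se
print(licensed(PERGUNTAR, QUE, "finite"))  # False: perguntar never takes que
print(licensed(QUERER, QUE, "nonfinite"))  # False: que needs a finite verb

# Misuse: putting "finite" as the complementizers' own FORM.
QUE_FINITE = Complementizer("que", form="finite", complement_form="finite")
SE_FINITE = Complementizer("se", form="finite", complement_form="finite")
QUERER_FINITE = ClauseEmbeddingVerb("querer", frozenset({"finite"}))

# FORM can no longer tell que from se, so the verb accepts both.
print(licensed(QUERER_FINITE, QUE_FINITE, "finite"))  # True
print(licensed(QUERER_FINITE, SE_FINITE, "finite"))   # True
```

In the misuse case the verb's selection by complementizer FORM becomes vacuous, because both complementizers carry the same value; finiteness is still enforced, but through the complementizer's requirement on the embedded verb, not through its own FORM.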

leoalenc commented 3 years ago


@olzama, thanks for the explanation. I'm happy that the questionnaire can handle this syntactic phenomenon in Portuguese, which is also pervasive in French! At this early stage in the development of my Portuguese grammar, it seems better to explore all the possibilities the questionnaire has to offer before hand-coding the TDL! When I was filling out the questionnaire, I was misled by my experience with BrGram and FrGram, my LFG grammars for Portuguese and French (the latter joint work with Christoph Schwarze), developed with the Xerox Linguistic Environment (XLE). In LFG, we have different FORM attributes, e.g., CFORM (or COMP-FORM) for complementizer form, PFORM for preposition form, VFORM for verb form, and so on.
Maybe you should consider rewording the questionnaire to prevent people from making the same mistake I did. The FORM drop-down menu on the Clausal Complements page offers "finite", "nonfinite", and "form" as possible values. This led me to choose "finite" for que and se, which makes the grammar overgenerate. My suggestions:

1) changing FORM to CFORM (or COMP-FORM) or 2) adding an example to the explanation below:

You can put a FORM feature on the obligatory complementizer if you want to constrain the clausal verb in terms of which complementizers it can go with.

In either case, the user should be reminded that this FORM feature (better: CFORM or COMP-FORM) must be defined on the "Other features" page.

olzama commented 3 years ago

OK, for now I have tried to clarify the description by adding a couple of notes. As for using a different feature, I think it may well be required in the future (especially if other approaches do this), but there would need to be some other evidence for it, not just readability/clarity. In other words, there would need to be a situation where we cannot use FORM for this purpose and need another feature (which we could then call CFORM). For now, FORM seems to work (it does not clash with anything). We can reopen this issue if a clash occurs!

As for how to fill out the questionnaire correctly: posting such questions on Discourse may be beneficial, because they then become more discoverable by others (along with the correct answer). If you like, you could even still post this question there, e.g. "What is the complementizer FORM for?", and just answer it yourself and mark the answer as correct :) Or I can do it. Use the category "Grammar Matrix". https://delphinqa.ling.washington.edu/

leoalenc commented 3 years ago


@olzama, I've seen the changes you made, they will be very helpful.


Thanks, @olzama. I'll follow your suggestions.