Closed lgessler closed 2 years ago
Removed 71 instances with NONSNACS labels. More updates on this can be found in #31. Closing this if nothing else.
@nschneid did you want anything in particular to be done with these? I think I remember you saying you ideally wanted them to be annotated somehow instead of just blanked out.
edit: wait, so if I read the other issue right Nitin, did the 71 NONSNACS labels get replaced with SNACS labels?
No. They are blanked out. This resolves some of the "Invalid supersense(s) in lexical entry" issues with the validator.
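A minimal sketch of the blanking step described above. It assumes each annotated token is a dict with hypothetical `ss` and `ss2` keys and that any label outside a known inventory counts as NONSNACS. The inventory here is a small illustrative subset, and none of these names are taken from the actual cleanup script.

```python
# Illustrative subset of SNACS labels; the real inventory is larger.
SNACS_LABELS = {"Characteristic", "Identity", "Possessor", "Locus", "Recipient"}

def blank_non_snacs(tokens):
    """Blank the supersense fields of tokens whose labels are not in SNACS_LABELS.

    Returns the number of tokens that were blanked.
    """
    blanked = 0
    for tok in tokens:
        labels = [tok.get("ss"), tok.get("ss2")]
        # A token is affected if any non-empty label is outside the inventory.
        if any(lab and lab not in SNACS_LABELS for lab in labels):
            tok["ss"] = None
            tok["ss2"] = None
            blanked += 1
    return blanked
```

Calling this on a list of token dicts modifies them in place, so the returned count can be compared against the 71 instances reported above.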
OK, so i did a quick analysis, have attached an excel sheet with some reasons why we decided to mark these NONSNACS. If any of these reasons merits a new SNACS label, do let us know. The analysis is in the 'analysis' tab.
@aryamanarora , there are a couple of cases that I think may not be NONSNACS, we will need to revisit: 1) lp_hi_10_74 - 'ke layak' 2) lp_hi_2_40 - 'to' - possibly FOCUS. 3) 'vaalaa' in 'teach-ER' like constructions. In the LPP this is mostly 'batti vaalaa' or 'lamp-light-ER'. There is a definition that translates to "a suffix denoting an agent, doer, owner, possessor, keeper or inhabitant". It's just that i'm not sure if the vaalaa is present in a governor-object construction here, it may just be marking the lamp as something being 'kept' by the batti vaalaa and denoting someone responsible for the lamps. If we think the predicate itself (to keep? own? light? ) can be implied in this 'batti vaala' construction, then it probably needs a SNACS label? Otherwise, i don't know.
If you could also take a look at the sheet to see if my analysis makes sense :).
Propose Characteristic ~ Identity for such cases of vaalaa.
Are you sure it is a postposition and not some other kind of suffixal morpheme?
It's not a suffixal morpheme, it's a token in its own right. But i'm not sure if it qualifies to be a postposition in the SNACS sense. The sense i got from the English guidelines is that a postposition is between a governor (usually a predicate) and an object ( argument of predicate).
The 'batti vaalaa' is a bit more complicated, because it literally translates to 'lamp vaalaa' . But a speaker of Hindi would automatically imply that a) the term 'batti vaala' is referring to a person in the world, and b) That person is associated with the lamp in some capacity. The capacity can be implicit , as in the example 'batti vaala' or it can be explicit, as in 'batti jalaane vaala' literally [lamp light.INF vaalaa] where light is the predicate 'to light'. Maybe in english, this is: "The vaala who lights the lamp".
I guess my confidence that it's not a suffix (as Chaturvedi claims) is that it can come after both noun and verb tokens and end up meaning the same thing [like 'batti vaala' == 'batti jalaane vaala']. Both terms refer to the same person in the world (and maybe, have the same part-of-speech?)
@nschneid in case it's relevant, it might be helpful to note that this vālā morph also uses standard adjective inflections. vālā covers masculine singular nominative, but it would be e.g. vāle for masculine singular oblique:
battī vāl -e ko batāo ...
lamp VALA-M.SG.OBL DAT tell ...
'tell the lampman ...'
My 2c: I don't have a strong opinion at the moment but it's worth considering whether only some uses of vālā should be considered postpositional and/or SNACS-y, as I think you already have.
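As a rough illustration of the agreement pattern in the example above, here is a sketch that assumes vālā follows the regular paradigm of Hindi adjectives ending in -ā. The function name and the feature labels are made up for this example, and it covers only adjective-style agreement; nominalized uses may take other endings.

```python
def inflect_vala(gender, number, case):
    """Return the agreeing form of vālā.

    gender: "M" or "F"; number: "SG" or "PL"; case: "DIR" or "OBL".
    Assumes the regular pattern for Hindi adjectives ending in -ā.
    """
    if gender == "F":
        return "vālī"  # feminine is invariant in this pattern
    if number == "SG" and case == "DIR":
        return "vālā"  # masculine singular direct (nominative)
    return "vāle"      # masculine singular oblique and masculine plural
```

In the example above, the postposition ko puts battī vālā in the oblique, which corresponds to `inflect_vala("M", "SG", "OBL")`.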
@nitinvwaran do you have a link to the Chaturvedi paper you're referring to?
It's not a paper, it's a dictionary entry written by Chaturvedi...
I did find some Phrase Structure Annotation Guidelines which mention the following:
@nschneid in case it's relevant, it might be helpful to note that this vālā morph also uses standard adjective inflections. vālā covers masculine singular nominative, but it would be e.g. vāle for masculine singular oblique:
This is interesting because i guess it's considering the 'batti vaalaa' as a noun and then converting to oblique form before introducing the post-position ko. This kind of happens with pronouns too (main [nom] -> mujh [obl] before ko, ending with mujhko; there is a nice table in Kachru's grammar book explaining this), and nouns too have an 'oblique' case as opposed to a 'direct' case (some references from Koul's grammar book here). But i don't know now how to reconcile the author's claim that vaalaa is tagged ADP with the inflection on this token...!
I think there are three senses of vālā: one as a derivational nominal suffix (which is not in SNACS scope, should be UD NOUN or not tokenized separately), one as a verbal auxiliary indicating the prospective aspect ("about to V", also not in SNACS scope, should be AUX or not tokenized separately) and one as a postposition. We should annotate the last one with something construed as Characteristic.
The best evidence for treating this type of vālā as a postposition is that it has similar properties to the genitive kā and the comparison words jaisā, sā. All of these decline in gender/number (like Adjs) to agree with the governor of the PP they project, but they also indicate semantic relations like a PP and obligatorily take a nominal argument like other Ps. Also, semantically vālā is like the inverse of the genitive kā, e.g. "hat vālā man" (the man with a hat) ~ "man kā hat" (the hat of the man).
OK, the Phrase structure guidelines for the Hindi treebank seem to also think vaalaa has an aspectual meaning (TAM) in some cases (case III in the treebank guidelines). The good news is that the conllulex file seems to have these tagged AUX indeed and these passed validation.
And the other cases (case I and case II) of vaalaa in the conllulex file are tagged ADP. The most common case - batti vaalaa - is tagged batti / NOUN vaalaa / ADP. So I'm not sure if we can ignore the nominal 'suffix' as not adpositional. I would label all the remaining non AUX cases as Characteristic ~ Identity.
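The tag distribution described above could be checked with something like the following sketch. It assumes tab-separated token lines whose first four columns are ID, FORM, LEMMA and UPOS, as in CoNLL-U; the lemma spellings in `VALA_LEMMAS` are guesses and would need to match whatever the corpus actually uses.

```python
from collections import Counter

# Assumed spellings of the lemma; adjust to the corpus.
VALA_LEMMAS = {"वाला", "vaalaa", "vālā"}

def count_vala_upos(lines):
    """Count UPOS tags of vālā tokens in CoNLL-U-style lines.

    Assumes tab-separated columns beginning with ID, FORM, LEMMA, UPOS.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a token line
        lemma, upos = cols[2], cols[3]
        if lemma in VALA_LEMMAS:
            counts[upos] += 1
    return counts
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.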
An aside that might be relevant here: how is about as in She was about to go analyzed in English? Here we also have something that prima facie looks like an adposition but semantically is clearly an aspectual auxiliary, just like the verbal aspectual sense of vālā. Whatever the answer, we should be consistent with the aspectual use of vālā.
In STREUSLE we say about_to has a lexcat of AUX. Definitely not an adposition.
@nitinvwaran Why are you thinking Characteristic~Identity? In "hat vālā man" (the man with a hat), is it presenting some sort of equivalence between the hat and the man?
Hmm. It's more like vaalaa is describing not just any Characteristic, but the Chief Characteristic of the man. Out of all the descriptors i could pick for the man (the short man, the fat man, the skinny man, rich man) I instead use "hat vaalaa man" and then continue to refer to the man in subsequent conversation as the "hat vaalaa man". Maybe this could be translated as "the man wearing a hat", which is something of a restrictive relative clause? Without the "wearing a hat", he's just another man. Is it fair to say it's a possible [Identity] because it could be construed as a restrictive relative clause? I don't think it's just 'the man with a hat' - referring here to McGregor's dictionary entry, टोपीवाला लड़का, m. the boy wearing a hat.
For cases where vaalaa is like a 'suffix' , these are pretty much teach-er constructions. A 'chaai vaalaa' literally means 'tea vaalaa' or maybe, 'tea man' but on an Indian railway train the term is ubiquitously understood as someone (a professional) who serves tea to passengers on the train. Possibly outside the train too...in this case the chaai vaalaa is understood as a street vendor who serves tea. batti vaalaa is also 'lampman' , maybe understood in a 'professional' sense.
Is it like if you said "the hat man"? That would be plain Characteristic I think. Restrictiveness/referential uniqueness are not part of the SNACS criteria.
'chaai vaalaa': I don't see how this usage would be an adposition as defined in SNACS, where there is (usually) a semantic relation between two elements. This seems like some sort of complex/derived nominal. Maybe the treebank used P for miscellaneous elements that attach to nouns, but we don't have to adhere to that for lexcat.
OK.
A follow up question to this is what to do with the second case below, in the Hindi LPP corpus:
vaalaa in the second case immediately follows the verb predicate, but the two terms 'batti vaalaa' and 'batti jalaane vaala' may be equivalent. I think both could be the complex/derived nominal. But would you say in the second case, vaalaa is mediating between 'batti' and 'jalaane'?
Could 'batti jalaane' be a compound? Like "lamp-lighting man".
Interesting Nitin, so your parse of battī jalāne vālā is [battī [jalāne vālā]]? (I gather this is what you mean when you say "mediating between 'batti' and 'jalaane'".)
I'd always assumed something more like what Nathan's saying, that it's a compound: [[battī jalāne] vālā]. I can't think of any conclusive ways to distinguish off the top of my head, but intuitively it'd be unusual if the verb did not first combine with its argument.
Hmm, i hadn't thought deeply about how the phrases internally combine :) - But I did look at some UD Hindi treebanks. I found this example below interesting because the vaala follows another verb predicate 'lagnaa' but i don't think the verb is in a compound construction. I think its argument is actually implied / missing. I highlighted the verb predicate + vaalaa in the text:
This one is where the argument is explicit (skirt) of the predicate (pehenna) followed by vaalaa. Seems to be obj relation, not compound.
Right, I think you're right that the verbal clause marked by vālā has considerable freedom in how/whether it expresses its arguments and adjuncts. I think Nathan was talking about the way the whole VP vālā expression is joined to its head when he said "compound" though, instead of relations within the VP vālā expression?
Hmm, in that case, the whole VP expression (including the vaalaa token) seems to be amod to the noun that it is modifying (sampling some more examples from the treebank). The vaalaa token itself seems to be always annotated as 'mark'. Maybe, it is mediating instead between the whole VP phrase and the noun being modified? I'm not sure how to analyze this type of vaalaa from a SNACS perspective.
Right, syntactically, this would be my instinct. I think the UD analysis there is basically right as far as attachments go, but I think what's in question is whether the deprel for the verbal clause (e.g. पहनने in the latter sentence) ought to be compound or amod.
(A UD aside: actually, amod seems very wrong--"shirt pahanne vālā" seems clausal, but amod is only for non-clausal modifiers, so it should be acl at least, I think.)
I thought you might find these interesting, i've attached snippets from the Hindi Grammar books. The annotation guidelines for the Hindi treebank also say something similar:
This one from Omkar Koul's book -> vaalaa as forming adjectives:
This one from Yamuna Kachru's book:
"adjectives such as bəndərvala 'one with a pet monkey'...": What makes this an adjective rather than a noun?
Well, this is an example from the Internet as Corpus. The Hindi caption loosely translates to: 'Tyrolean hat and bəndərvala boy'. The vaalaa is converting the noun 'monkey' to an adjective; the phrase annotation guidelines mention this, as does Koul's book.
And this is the English translation:
The earlier example too, टोपीवाला लड़का, m. the boy wearing a hat. - topivala modifies the boy as adjective.
"adjectives such as bəndərvala 'one with a pet monkey'...": What makes this an adjective rather than a noun?
I think Kachru is being loose with terminology here, and also revealing that she conceives of uses of vālā like in bandar vālā as AdjPs.
We've already covered some reasons why regular adjectives and some vālā phrases are alike (both inflect and can modify nominal heads). Another reason to think so is that regular adjectives in Hindi can serve as what classical grammarians would call substantive adjectives, where an adjective is the head of a constituent where you would normally expect an NP, e.g. English The rich/JJ lack little, or Hindi (example from the internet):
daulat kī is duniyā mɛ̃, g̣arīb hī mār khātā hɛ̃
wealth GEN this.OBL world in poor FOC beating eat.IPFV are
'In this world of plenty, only the poor are trampled'
Note how an adjective (g̣arīb) is serving as the subject of the second clause. This is probably one reason why Kachru is analogizing expressions like bandar vālā to adjectives like g̣arīb: both can be syntactically "promoted" to serve as the sole exponents of NPs.
I think it'd be helpful to give some examples highlighting the different _vālā_s we're talking about, differentiated by syntactic criteria:
(1) Aspectual
ṭren jāne vālī hɛ
train go.INF VALA is
'The train is about to leave'
(2a) Internal noun phrase, modifies nominal head
battī vālā ādmī chalā gayā
lamp VALA man walk went
'The lamp-one man went away'
(2b) Internal noun phrase, serves as nominal phrase
battī vālā chalā gayā
lamp VALA walk went
'The lamp-one went away'
(3a) Internal verb phrase, modifies nominal head
battī jalāne vālā ādmī chalā gayā
lamp burn.INF VALA man walk went
'The lamp-burning-one man went away'
(3b) Internal verb phrase, serves as nominal phrase
battī jalāne vālā chalā gayā
lamp burn.INF VALA walk went
'The lamp-burning-one went away'
Is this exhaustive wrt. the syntactic criteria we've been discussing? (And are all of these examples idiomatic, native Hindi speakers?)
I'm not a fully native Hindi speaker, but it looks exhaustive to me. I'm not certain that 3.b is idiomatic, but maybe Aryaman can comment further.
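The five-way split in the examples above can be written down as a small decision function. This is only a sketch of the criteria as listed: the argument names are invented for illustration, and the `is_predicate_with_copula` flag is a simplification introduced here to pick out the aspectual pattern, not a test proposed in this thread.

```python
def classify_vala(complement, has_nominal_head, is_predicate_with_copula=False):
    """Map a vālā phrase to the labels (1), (2a), (2b), (3a), (3b) used above.

    complement: "noun" or "verb" (category of the phrase vālā attaches to).
    has_nominal_head: True if the vālā phrase modifies an overt noun.
    is_predicate_with_copula: True for the 'about to V' pattern.
    """
    if complement == "verb" and is_predicate_with_copula:
        return "1"  # aspectual, e.g. ṭren jāne vālī hɛ
    if complement == "noun":
        return "2a" if has_nominal_head else "2b"
    if complement == "verb":
        return "3a" if has_nominal_head else "3b"
    raise ValueError(f"unhandled complement type: {complement!r}")
```

Any complement type other than "noun" or "verb" raises an error rather than being silently assigned to one of the five labels.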
@lgessler no, you can also have an internal AdjP (nīlā vālā (ādmī) "(the man) who is blue") or an internal PP or intransitive AdvP ((mere ghar ke) pīche vālī (saṛak) "(the road) that's behind (my house)"). Of course, you could argue that these project NPs and serve as some kind of fused-head, like in English "the poor" etc.
Also, as much as I love Kachru's grammar, the existing descriptive grammars of Hindi are generally very poor at devising a modern syntactic analysis of these kinds of complex phenomena. These vālā phrases are absolutely not AdjPs, they take complements in a far too permissive way to be called that. The only formal similarity with AdjPs is the gender/number agreement but other PPs do that too. Many other Ps in Hindi also behave subordinator-like and can take whole VPs as complements.
Hmm, i hadn't thought deeply about how the phrases internally combine :) - But I did look at some UD Hindi treebanks. I found this example below interesting because the vaala follows another verb predicate 'lagnaa' but i don't think the verb is in a compound construction. I think its argument is actually implied / missing. I highlighted the verb predicate + vaalaa in the text:
Here it is [[[यहाँ लगने]VP वाला]PP [[तीन दिन]NP का]PP इज़्तिमा]NP "the three-day congregation that will be held here". Takes the VP as a complement, and like a relative clause it is a gapped subject.
no, you can also have an internal AdjP (nīlā vālā (ādmī) "(the man) who is blue") or an internal PP or intransitive AdvP ((mere ghar ke) pīche vālī (saṛak) "(the road) that's behind (my house)").
Ah that's right, hmm... Maybe we should instead distinguish the vālā's complement as clausal or non-clausal, if it's an important one to make.
To the matter at hand though, is this the consensus that's emerging?
2. vālā phrase serving as NP: non-SNACS, with the reason being that this is derivational morphology
vaalaa is pretty much mostly tagged ADP in the treebank. Can derivational morphology get a POS tag? Also, I'm guessing this applies only to 2.b, but what then semantically differentiates 2.b from 2.a, which as a modifier of NP does get a label? To me, both 'batti vaalaa' and 'batti vaalaa aadmi' mean the same thing.
- vālā phrase serving as modifier of an NP: SNACS, with the reason being that vālā is mediating a relation between its object (the body of the vālā phrase) and its governor (the noun it is modifying)
Agreed, for both the 'internal noun' and 'internal verb' types.
Note how an adjective (gharīb) is serving as the subject of the second clause. This is probably one reason why Kachru is analogizing expressions like bandar vālā to adjectives like gharīb: both can be syntactically "promoted" to serve as the sole exponents of NPs.
Actually, bringing this into the 2.a versus 2.b debate, if we say batti vaalaa can replace (substantively) batti vaalaa aadmi, that would probably resolve my concern about them being the same thing. But then, how would this substantive analysis affect the SNACS analysis?
I'm not intimately familiar with SNACS but I think what is and isn't a target is mostly defined by formal, rather than semantic, criteria, right? So regardless of how similar battī vālā ādmī and battī vālā are in meaning, the fact that they're distinct constructions means we need to consider them both separately.
So I think our last thing to settle is whether (2b,3b) type vālā phrases are in scope for SNACS, right? @nschneid have there been situations in other languages where an adposition lacks a (overt?) governor? I can only think of situations where it lacks an object. Here are 2b and 3b again:
(2b) Internal noun phrase, serves as nominal phrase
battī vālā chalā gayā
lamp VALA walk went
'The lamp-one went away'
(3b) Internal verb phrase, serves as nominal phrase
battī jalāne vālā chalā gayā
lamp burn.INF VALA walk went
'The lamp-burning-one went away'
I'm not intimately familiar with SNACS but I think what is and isn't a target is mostly defined by formal, rather than semantic, criteria, right? So regardless of how similar battī vālā ādmī and battī vālā are in meaning, the fact that they're distinct constructions means we need to consider them both separately.
Or, we could consider them together under the substantive adjective analysis that you proposed, which I agree with. We could also consider whether a multi-word substantive adjective (batti vaalaa) that contains an ADP token (vaalaa) could get a SNACS label for the ADP token.
@nschneid have there been situations in other languages where an adposition lacks a (overt?) governor?
With the substantive adjective analysis, the governor is present but probably implied. Where the governor in 'batti vaalaa aadmi' is 'aadmi', the adjective 'batti vaalaa' now steps in for the full noun phrase. So 'aadmi' could be thought of as the (absent) governor of the 'vaalaa' token in the substantive adjective.
have there been situations in other languages where an adposition lacks a (overt?) governor?
None that I can think of. We have things like Approximators in English which actually sort of act like modifiers, but I wouldn't say they lack a governor.
Maybe I'm just not familiar enough with Hindi to get the right intuition but a marker that derives an adjective (substantive or otherwise) from a noun doesn't seem like an adposition to me. The English analogues I can think of are suffixes like -ful (plenty → plentiful). If Hindi has just one word that can be either a postposition or a noun-to-adj deriver, I don't see a problem with assigning a different lexcat to the latter.
If that's the thinking - not considering vaalaa as adpositional - then my thinking is that this needs to apply to both 2.a and 2.b, and 3.a and 3.b, or in other words we shouldn't be considering vaalaa as adpositional in a SNACS sense in the entire corpus. 2.a and 3.a would also derive adjectives and amod phrases (the latter evidenced in the Hindi UD treebanks) from nominals and verb phrases respectively, which may not be in the spirit of being adpositional.
From a validation perspective because vaalaa is (mostly) tagged ADP in the entire corpus, I would need to create a specific exception for this, or maybe a specific lexcat (P.VAALAA)?
Or call it PART?
Maybe I'm just not familiar enough with Hindi to get the right intuition but a marker that derives an adjective (substantive or otherwise) from a noun doesn't seem like an adposition to me.
Would be curious for @aryamanarora's opinion here, given that at least in a narrow sense he thinks it's a bad analysis to call vālā phrases "adjectives"
PART works!
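One way the exception discussed above might look, as a sketch only: a default UPOS-to-lexcat table plus per-token overrides. Both tables, the sentence ID `lp_hi_example`, and the function name are hypothetical and do not reflect the validator's real data structures.

```python
# Hypothetical default mapping from UPOS to lexcat; the real table may differ.
DEFAULT_LEXCAT = {"ADP": "P", "AUX": "AUX", "NOUN": "N", "PART": "PART"}

# Hypothetical overrides keyed by (sentence id, token id): a vaalaa token
# tagged ADP in the treebank that should not be treated as adpositional.
LEXCAT_OVERRIDES = {("lp_hi_example", 2): "PART"}

def expected_lexcat(sent_id, tok_id, upos):
    """Return the lexcat to validate against, honoring per-token overrides."""
    override = LEXCAT_OVERRIDES.get((sent_id, tok_id))
    if override is not None:
        return override
    return DEFAULT_LEXCAT.get(upos)  # None if the UPOS is not in the table
```

Keeping the overrides in a separate table leaves the treebank's ADP tag untouched while still letting the lexcat differ for specific tokens.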
I don't have any more questions here, thanks everyone. My conclusion is to treat all cases of vaalaa as non-SNACS. I don't think a differential treatment for certain cases with an explicit governor is consistent. If no objections....
@aryamanarora feels strongly that (2a) is postpositional.
IMO if there's a nominal governing the vālā-phrase it's a SNACS target. There's nothing else it can be analysed as in that context but a postposition. PART feels like an (unnecessary) cop-out there; for the noun suffix thing perhaps that's okay.
I'm OK either way (postposition or no), but i think the treatment should be consistent between 2.a and 2.b (and 3.a and 3.b) - either both cases are postpositional, or not. To my mind both cases are deriving adjectives (or perhaps, there's a compound relationship between the batti and the vaalaa?) and so are similar syntactically, and they can both refer to the same entity in the world. So treating them differently would be an inconsistency i think.
For what my non-native opinion's worth, I'm inclined to agree with Nitin that consistency within (2) and (3) seems right.
Would it be too difficult to try to set aside a half-hour to talk this out on Zoom in the next couple weeks?
My thoughts on this:
(1) Aspectual
ṭren jāne vālī hɛ
train go.INF VALA is
'The train is about to leave'
Not adposition
(2a) Internal noun phrase, modifies nominal head
battī vālā ādmī chalā gayā
lamp VALA man walk went
'The lamp-one man went away'
Adposition
(2b) Internal noun phrase, serves as nominal phrase
battī vālā chalā gayā
lamp VALA walk went
'The lamp-one went away'
Adposition, but the governor is implied.
(3a) Internal verb phrase, modifies nominal head
battī jalāne vālā ādmī chalā gayā
lamp burn.INF VALA man walk went
'The lamp-burning-one man went away'
Adposition
(3b) Internal verb phrase, serves as nominal phrase
battī jalāne vālā chalā gayā
lamp burn.INF VALA walk went
'The lamp-burning-one went away'
Adposition, but the governor is implied.
When you say "governor is implied" I assume you mean there is an implicit semantic governor, and syntactically, the PP behaves like an NP (some sort of coercion)? Can "battī vālā" be pluralized or case-marked?
I guess we technically have something similar in English with named entities that take the form of PPs ("Of Mice and Men is a novel"). I don't think English has a preposition with a well-established sense for deriving NPs, though.
Matrix-licensed complements (carmls/snacs-guidelines#148) are perhaps broadly similar in being another case of syntax-semantics mismatch.
Can "battī vālā" be pluralized or case-marked?
Yes to both. Luke had given some examples of case earlier in the thread, where the 'battī vālā' receives oblique case when marked by a post-position 'ko'.
If it's post-positional, should we go ahead with 'Characteristic'?
@nschneid in case it's relevant, it might be helpful to note that this vālā morph also uses standard adjective inflections. vālā covers masculine singular nominative, but it would be e.g. vāle for masculine singular oblique:
battī vāl -e ko batāo ...
lamp VALA-M.SG.OBL DAT tell ...
'tell the lampman ...'
OK, so for those advocating treating (2b) and (3b) as adpositions, would the same analysis hold above? Would it be the sole adposition that can be suffixed with -e (M.SG.OBL)?
From Aryaman above:
The best evidence for treating this type of vālā as a postposition is that it has similar properties to the genitive kā and the comparison words jaisā, sā. All of these decline in gender/number (like Adjs) to agree with the governor of the PP they project, but they also indicate semantic relations like a PP and obligatorily take a nominal argument like other Ps. Also, semantically vālā is like the inverse of the genitive kā, e.g. "hat vālā man" (the man with a hat) ~ "man kā hat" (the hat of the man).
The validator expects the supersense columns to be blank in this case.
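One possible reading of that expectation, sketched under stated assumptions: "_" or an empty string marks a blank column, and only entries with lexcat "P" may carry supersenses. The function name and error messages are invented here and are not the validator's own.

```python
def check_supersense_columns(lexcat, ss, ss2):
    """Return a list of error strings for one lexical entry.

    Assumes "_", "" or None marks a blank column and that only lexcat "P"
    carries supersenses (a simplification for this sketch).
    """
    blank = {"", "_", None}
    errors = []
    if lexcat == "P":
        if ss in blank or ss2 in blank:
            errors.append("missing supersense for adpositional entry")
    elif ss not in blank or ss2 not in blank:
        errors.append("supersense present on non-adpositional entry")
    return errors
```

Under this reading, a vaalaa token with a non-adpositional lexcat passes only when both supersense columns are blank.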