UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Head of TIME + AM/PM #107

Open amir-zeldes opened 3 years ago

amir-zeldes commented 3 years ago

EWT has e.g. "2 pm" as:

nummod(pm,2)

GUM has the time as head and AM/PM as nmod:tmod:

nmod:tmod(2,pm)

Syntactically I think you can drop "pm", but not "2", and semantically it makes more sense to me that pm is an expansion on what kind of "2" this is. Is there a good argument for making "pm" the head? If not, should EWT be changed?
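To make the contrast concrete, here is a minimal sketch of the two analyses as head/dependent records. The `Token` class, the helper names, and the external `obl` attachment (as in "at 2 pm") are assumptions for illustration only; neither treebank stores data this way. The `is_leaf` check mirrors the omission argument: a word can be dropped without orphaning anything only if nothing depends on it.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int     # 1-based position inside the phrase
    form: str
    head: int    # idx of the head; 0 means the head lies outside the phrase
    deprel: str

# EWT-style analysis: nummod(pm, 2). The outer "obl" label is an assumption.
ewt = [Token(1, "2", 2, "nummod"), Token(2, "pm", 0, "obl")]
# GUM-style analysis: nmod:tmod(2, pm). The outer "obl" label is an assumption.
gum = [Token(1, "2", 0, "obl"), Token(2, "pm", 1, "nmod:tmod")]

def phrase_head(tokens):
    """Return the form of the single token attached outside the phrase."""
    external = [t.form for t in tokens if t.head == 0]
    if len(external) != 1:
        raise ValueError("expected exactly one phrase head")
    return external[0]

def is_leaf(tokens, form):
    """True if no token in the phrase depends on the given form."""
    target = next(t for t in tokens if t.form == form)
    return all(t.head != target.idx for t in tokens)

for name, analysis in (("EWT", ewt), ("GUM", gum)):
    print(name, "head:", phrase_head(analysis),
          "| pm is a leaf:", is_leaf(analysis, "pm"))
```

Only under the GUM-style analysis is "pm" a leaf, which matches the intuition that it is the omissible element.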

amir-zeldes commented 3 years ago

Examples:

http://match.grew.fr/?corpus=UD_English-EWT@2.7&custom=5fb53897d8095&eud=yes

nschneid commented 3 years ago

Agreed, am/pm (as well as time zones) would be better treated as dependents of the hour.

In the domain of value expressions I don't think there are any great options for the deprel. Not sure whether nmod:tmod (and obl:tmod) are intended to be restricted to times as a semantic role, or whether dependents within time expressions also deserve :tmod.

Another option would be flat, which the guidelines specify for month/date/year relations.

amir-zeldes commented 3 years ago

Forgot to answer this - this is actually kind of a big deal, since if we concede this case has the number as the head, then it's a chip in the widespread EWT tendency to treat numbers as nummod. If this achieves consensus, it opens the door for me to say again, why aren't dates headed by day numbers? I know I've ranted about this a lot, but bear with me, since I think I can make a morphosyntactic argument, rather than my usual semantic one:

Now let's try this for February 2nd, 2 February, etc. First we check the distribution of months vs. days:

Now the controversial cases:

If nummod is not automatically understood as the correct label for CD in a complex expression, then I think the reasoning for the handling of AM/PM should extend to dates, and we should accept that days are generally the heads of date expressions.

nschneid commented 3 years ago

I still think the number as head is a stronger case for am/pm because of optionality: you can simply omit am/pm without adding an article, unlike the ordinal interpreted as a monthless date. "on" selects semantically for dates, not months—so while "2nd" is arguably the semantic head of "February 2nd", it still seems plausible that MONTH+DATE is a syntactically headless construction (which we are forced to assign a formal head in UD, hence flat).

That said, I don't particularly like such a broad use of flat to cover personal names, which are completely fixed, as well as slots in a templatic date expression which may have limited internal structure (February 2nd and 3rd). I am thinking of proposing either a new universal relation or subtypes to deal with these miscellaneous nominal constructions which are not classic cases of nmod, nummod, compound, appos, or flat.

amir-zeldes commented 3 years ago

That sounds interesting, I'd also be interested in something like subtypes to deal with NP syntax too, but I'd want to ensure we don't diverge from treebanks that won't do such expansions, and keep target labels for parsers manageable too. Maybe feat or misc expansions could do the trick? I'm happy to brainstorm on proposals.

For the dates, if you go with flat you get the situation where "September 4" and "4 September" have different heads, so on top of being very different from names, that could lead to some surprising results in tools that care about heads. But TBH I can think of several arguments why the day should be head (explains preposition choice "on", semantically it is a "day", it allows both words to head their own respective phrases rather than "double duty" for "September", it's consistent with constructions in which the day is unambiguously the head like "the 4th of September" ...), and I don't really understand what would be a motivation not to make it the head? Just because you can't comfortably omit "September" in "September 4" doesn't mean the other word is not the head (otherwise "sandwich" is not the head of its NP in "I ate a sandwich", because we can't say "*I ate sandwich"). Is there any strong reason to prefer "September" as the head that I'm not seeing? And if so, is it strong enough to outweigh the considerations above?
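The point about word order can be sketched in a few lines. Under the UD guidelines a flat structure is headed by its first word, so the head follows the order of the tokens. The two functions below are hypothetical helpers, and the digit test for spotting the day is a simplifying assumption that only covers forms like "4" or "4th".

```python
def flat_head(words):
    """Head under a flat analysis: the first word of the expression."""
    return words[0]

def day_head(words):
    """Head under a day-headed analysis (assumes the day starts with a digit)."""
    days = [w for w in words if w[0].isdigit()]
    if len(days) != 1:
        raise ValueError("expected exactly one day token")
    return days[0]

for phrase in ("September 4", "4 September"):
    words = phrase.split()
    print(f"{phrase!r}: flat head = {flat_head(words)}, "
          f"day head = {day_head(words)}")
```

With flat, the head is "September" in one order and "4" in the other; the day-headed analysis returns "4" for both.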

nschneid commented 3 years ago

My point is that there are many syntactic formulations of dates, and while they are semantically equivalent that's not sufficient to assume they are syntactically equivalent. in/on select for the semantics of the NP, so it follows that if "4" is the semantic head, "September 4" requires "on" rather than "in"; I don't think this tells us anything about syntactic headedness, and of course there are other constructions where the syntactic head is not the semantic head ("a lot of cookies" comes to mind).

The relations we have are best equipped for "normal" NP syntax, where if the NP has multiple words the head is a noun, and if it is a singular count common noun it must have a determiner and so forth. With "the 4th of September" it's easy enough to say this evolved from eliding "day", so we promote "4th" to be the head, and "of September" is a normal PP modifier. With "4 September" and "September 4(th)" and "September the 4th" our usual understanding of NPs breaks down, it seems to me, so we need to do something special one way or another. (And likewise for minor or archaic constructions like "Richard the Lionheart".)

Let me try to formulate a syntactic argument that "4th" in "September 4th" is not a head, or at least not a "normal" head: If "September" was a modifier, shouldn't it be expected to license its own modifiers in between "September" and "4th"? I don't think those are natural ways to express a date:

  1. the 4th of [September of last year]
  2. *[September of last year] 4th
  3. September next (old-fashioned alternative to "next September")
  4. [September 4th] next
  5. the 4th of [September next]
  6. *[September next] 4th

It's a bit tricky because semantically a temporal modifier that could apply to the month could usually apply to the date as well, since dates are contained within months. We can try adjectives modifying the month specifically, e.g.

  1. the 4th of fair September
  2. *[fair September] 4th ("fair September 4th" has to mean the day is fair, not the month)
  3. *4 fair/next September

Maybe this tells us only that the month-date construction doesn't allow internal modification of the month. In any case, it sure doesn't feel to me that "September" is modifying "4th" syntactically. I suppose others feel the same way which is why the guidelines currently prescribe flat. But if others want to go with that analysis for semantic convenience I could live with it.

amir-zeldes commented 3 years ago

That's interesting - it actually does feel like modification to me (I read "September 5" as something like an elliptical possessive "September('s) 5(th day)" and "5 September" as "5(th of) September"). That must be the source of our disagreement... In fact, "5 September" said out loud is ungrammatical for me, I'd prob. read it "September fifth", like I read "$500" as "five hundred dollars". I think the possessive reading of Sep. 5 and problems omitting Sep. are similar to normal possessives: the NP in "John's dog barked" is headed by "dog", despite the fact we can't say "*dog barked".

I also think the "on" government argument is a morphosyntactic one and not a semantic one, since comparable semantics does not guarantee comparable government ("at sea" but "in the ocean", "wait for someone" but "await someone").

As for the modifiers for September, I think they are possible, but due to heavy phonological weight of the phrase, they're postponed (in transformational terms, "*September of last year 4th" ->"September t 4th of last year"), and in any case as you point out, hard to distinguish from the entire "September 4th" getting that modifier. In fact, we can use phrase syntax for another argument for days as heads: if months are the head, then they project both a month and a day phrase, which seems odd to me. Compare:

(September (fourth))
((September) fourth)

The day analysis seems easier to reconcile with compositional phrases (even if compositionality is a semantic fact, I think syntax should map onto semantics if possible). In the second analysis, we can explain compositionality by "fourth" projecting a phrase corresponding to a day, and leave "September" to project a subordinate "month" phrase, which is semantically consistent.

Finally I don't know about the guidelines, but EWT in practice doesn't use flat, it has nummod from month to day (in either order), and this has bothered me for a while. If AM/PM are accepted to mean "3 PM" is a modified type of 3, I think treating day+month the same is more consistent: there are two kinds of 3 (the PM and the AM, distinguished by those modifiers, which just stand next to the hour number), and there are 12 kinds of 4th day of the month, again modified by the appropriate month, with the same syntax (just standing next to the day number). In any case, I'm only talking about the guidelines for English, as it's very possible there are languages with other distributions that clearly favor some other analysis (I definitely don't think dates are flat cross-linguistically).

nschneid commented 3 years ago

Re: possessives - I think you're saying that September in September 4 has a determinative function (acts as a specifier in lieu of a determiner). Currently determinatives are covered by det, nmod:poss, and quantity nummod. ("You" in "you guys" is arguably one as well but that is a very specialized construction: amir-zeldes/gum#71.) Perhaps you're right that "September" is best viewed as determinative, but I'm not sure there's a good way to express that in UD without creating a new deprel. It feels wrong to call it compound, for instance, even though those are usually right-headed.

OTOH positing a new determinative construction absent clearer syntactic evidence of headedness may be forcing it if we could simply say the month-date construction is special and lacks a normal head.

> I also think the "on" government argument is a morphosyntactic one and not a semantic one, since comparable semantics does not guarantee comparable government ("at sea" but "in the ocean", "wait for someone" but "await someone").

You can also say "in the sea". "At sea/*at ocean" is a multiword expression phenomenon (determinerless PP); so is "wait for" (prepositional verb). Yes there are many many idiosyncrasies in prepositions within multiword expressions, but "in"-month vs. "on"-day is quite productive based on semantics (we say not only "on September 4", but also "on Tuesday", "on Yom Kippur", and "on the third day of the week"). If we were abbreviating the date we would write "on 9/4" but "in 9/2020". This can all be explained by saying that the distribution of prepositions is sensitive to the semantic type of the temporal expression; whether the semantic head bearing that type lexically is also the syntactic head is another question.

Overall, at this point I think we should hear what other people think.

nschneid commented 3 years ago

Continuing on the determinative idea: FWIW, looking at the potential for "the", demonstratives, and quantifiers, it appears months pattern more like quantity modifiers than possessives, and match the distribution of month entities:

| the | this/these | each | all |
|---|---|---|---|
| *the September 4 (but: the September 4th that we met) | this September 4 | each September 4 | |
| *the September (but: the September that we met) | this September | each September | |
| the 1 car; the 5 cars | this 1 car; these 5 cars | each 1 car | all 5 cars |
| *the my car | *this my car | *each my car | all my cars |
| *the you guys | *these you guys | | all you guys |

I don't think this makes a convincing case one way or another about headedness, but it calls into question whether "September" should be considered determinative.

manning commented 3 years ago

What to do with dates is a really difficult issue, with seemingly conflicting facts in different languages. There were thoughts of having a multilingual UD study of this issue. I won't weigh in here.

But what you suggest for "2 am" does seem reasonable, and so we could change EWT to do things like you do in GUM!
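For anyone curious what such a change amounts to mechanically, below is a rough sketch of re-heading a nummod(am/pm, number) pair to nmod:tmod(number, am/pm). It is not the script used for EWT or GUM: the dict-based token format, the `AMPM` word list, and the function name are all assumptions, and it only handles the simple case of one number per am/pm token.

```python
AMPM = {"am", "pm", "a.m.", "p.m."}  # assumed spellings, not an exhaustive list

def rehead_ampm(sentence):
    """Re-head nummod(am/pm, N) as nmod:tmod(N, am/pm), in place.

    `sentence` is a list of dicts with keys id, form, head, deprel,
    where ids are 1-based ints and head 0 marks the root.
    """
    by_id = {tok["id"]: tok for tok in sentence}
    for tok in sentence:
        parent = by_id.get(tok["head"])
        if (tok["deprel"] == "nummod" and parent is not None
                and parent["form"].lower() in AMPM):
            # The number takes over the attachment that am/pm had.
            tok["head"], tok["deprel"] = parent["head"], parent["deprel"]
            # Remaining dependents of am/pm (e.g. a preposition) move to the number.
            for other in sentence:
                if other["head"] == parent["id"]:
                    other["head"] = tok["id"]
            # am/pm becomes a dependent of the number.
            parent["head"], parent["deprel"] = tok["id"], "nmod:tmod"
    return sentence

# Hypothetical example: "meet at 2 pm" with the EWT-style attachment.
example = [
    {"id": 1, "form": "meet", "head": 0, "deprel": "root"},
    {"id": 2, "form": "at", "head": 4, "deprel": "case"},
    {"id": 3, "form": "2", "head": 4, "deprel": "nummod"},
    {"id": 4, "form": "pm", "head": 1, "deprel": "obl"},
]
converted = rehead_ampm(example)
for tok in converted:
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

Moving the preposition along with the head keeps "at" attached to whichever word heads the time expression; whether that is the desired outcome in every case would need checking against the real data.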

amir-zeldes commented 3 years ago

@manning - thanks, this is now consistently nmod:tmod in GUM, so at least these two corpora would be identical in this respect.

nschneid commented 2 years ago

Related: UniversalDependencies/docs#893