Closed AngledLuffa closed 10 months ago
This is an error in GUM, right? I've always understood the English articles to be restricted to "a(n)" and "the", and that's how it is in EWT and the PronType guidelines.
Cunningham's law strikes again! That possibility was why I tagged Amir, at least
Well, if the guidelines say so then we have to either change GUM or the guidelines... I'd prefer it to have a PronType because it's really just a fusion of the same "an" we tag as having that feature, and the adjective other.
Since we tag and deprel it DT/det, and not amod, I would expect it's supposed to match the behavior of the "an" component, but if others see it differently, I'm willing to copy the EWT behavior.
Historically it is "an"+"other", but "another" as a whole functions differently. (For example, it can take "yet" as an advmod, which articles cannot.)
While we're at it I see GUM has PronType=Art for "both", "no", "(n)either", and "yonder" (query). I would change those as well. The guidelines suggest PronType=Tot for "both" and PronType=Neg for "no".
OK, so Neg for "no", Tot for "both", and nothing for the rest? Maybe also Neg for "neither" and Dem for "yonder"?
Yeah, Dem for "yonder" in its det usage makes sense to me. (If we wanted to decouple the det function from UPOS, like we do for some other deprels, arguably "yonder" is an ADV and maybe we'd want to drop the PronType. But that would be a separate discussion; let's keep DET for now.)
In principle there could be values that cover {"either", "neither"} and "another". It doesn't seem we have those at present (but see UniversalDependencies/docs#732), so I'm fine with Neg for "neither" and blank for "either" and "another".
Tagging @dan-zeman in case he wants to weigh in.
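For anyone who wants to find the remaining cases locally, here is a minimal sketch of the kind of check discussed above: it lists tokens carrying PronType=Art whose lemma is not "a", "an", or "the". It assumes standard tab-separated CoNLL-U with ten columns; the function names are made up for illustration and are not part of any treebank tooling.

```python
ARTICLES = {"a", "an", "the"}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict ('_' means no features)."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def suspicious_articles(conllu_text):
    """Yield (lemma, feats) for PronType=Art tokens whose lemma is not an article."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        lemma, feats = cols[2], cols[5]
        if parse_feats(feats).get("PronType") == "Art" and lemma.lower() not in ARTICLES:
            yield lemma, feats
```

Running it over a treebank file and collecting the lemmas into a set gives a quick list of words to review.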
I do like the idea of them having some kind of feature on them, so if there isn't currently an appropriate feature for "another", perhaps we could add one
If you want to make that happen I think the way would be to open an issue on the docs repo, and include a table of all determiners with their proposed features (along the lines of https://universaldependencies.org/en/pos/PRON.html).
But that will take some discussion—in the meantime we can just use the features we have.
and blank for "either" and "another"
I would use PronType=Ind for these two. Indefinite is sometimes used as a 'catch-the-rest' category.
I had posted an issue which could be used for building a standard
https://github.com/UniversalDependencies/docs/issues/971
Any thoughts on words such as "either" or "another", e.g. @dan-zeman's suggestion of PronType=Ind? There are others which might fit that, such as "any" or "every".
Here's what we converged on in the other thread: https://universaldependencies.org/en/pos/DET.html
@AngledLuffa PRs to implement this welcome!
Thanks for documenting – added udver 2
@AngledLuffa any interest in implementing this? Would be great to have for the UD 2.13 release (deadline Nov. 1).
You have no idea how much of a PITA it's been trying to get Ssurgeon to support empty nodes :/
but I'm almost to the point where simple edits to node features are possible, I think
CoreNLP didn't support empty nodes at all in the graph objects used for SemanticGraph
Stanza couldn't read or write those nodes either, it just always discarded them
Both of those are now fixed. CoreNLP still can't read or write empty nodes, but I'm just skipping that for now... still need to make it so that Ssurgeon can understand two graphs at once
I realized I should add these checks to my validation script and went ahead and added the features with some regex replacements.
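A rough sketch of what a regex-based feature update could look like, in case it helps someone doing the same for another treebank. This is not the actual script: the mapping covers only the values named in this thread (the full table is at en/pos/DET.html), it skips "half" because that word needs several features at once, and it assumes tab-separated CoNLL-U. All names here are hypothetical.

```python
import re

# Only the PronType values mentioned in this thread, keyed by lemma.
PRONTYPE_BY_LEMMA = {
    "no": "Neg",
    "neither": "Neg",
    "both": "Tot",
    "all": "Tot",
    "yonder": "Dem",
    "either": "Ind",
    "another": "Ind",
}

# Groups: (ID, FORM, LEMMA, "DET", XPOS, tab) with LEMMA captured, then FEATS, then the rest.
DET_LINE = re.compile(
    r"^(\d+\t[^\t\n]+\t([^\t\n]+)\tDET\t[^\t\n]+\t)([^\t\n]+)(\t.*)$",
    re.MULTILINE,
)


def set_prontype(conllu_text):
    """Set PronType on DET tokens whose lemma is in the mapping; leave the rest alone."""

    def fix(match):
        prefix, lemma, feats, rest = match.groups()
        value = PRONTYPE_BY_LEMMA.get(lemma.lower())
        if value is None:
            return match.group(0)
        pairs = {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))
        pairs["PronType"] = value
        # UD expects features sorted alphabetically, ignoring case.
        ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
        return prefix + "|".join(f"{k}={v}" for k, v in ordered) + rest

    return DET_LINE.sub(fix, conllu_text)
```

Because the pattern requires DET in the UPOS column, tokens such as "all" tagged ADV are not touched.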
LGTM, thanks. @amir-zeldes something similar for GUM etc? I'll take a look at PUD and the Pronouns datasets
Yes, it's on my list to implement the feature proposal from the table before the upcoming release, not done yet though.
In PUD, there are a few lines of "that" which do not match the new table:
19 that that DET WDT PronType=Rel 22 obj 18:ref _
25 that that DET WDT PronType=Rel 27 obj 24:ref _
16 that that DET WDT PronType=Rel 20 obj 15:ref _
A larger context looks like this:
16 the the DET DT Definite=Def|PronType=Art 18 det 18:det _
17 last last ADJ JJ Degree=Pos 18 amod 18:amod _
18 thing thing NOUN NN Number=Sing 2 parataxis 2:parataxis|22:obl _
19 that that DET WDT PronType=Rel 22 obj 18:ref _
20 the the DET DT Definite=Def|PronType=Art 21 det 21:det _
21 Government government NOUN NN Number=Sing 22 nsubj 22:nsubj _
22 wants want VERB VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 18 acl:relcl 18:acl:relcl SpaceAfter=No
Is it still Number=Sing if it's in a WDT context instead of a DT context?
Similarly, should "half a million" get the updated "half" features?
-11 half half DET PDT _ 13 compound 13:compound _
+11 half half DET PDT NumForm=Word|NumType=Frac|PronType=Ind 13 compound 13:compound _
In PUD, there are a few lines of "that" which are not as the new table:
If that is relative it should be PRON not DET.
Similarly, should "half a million" get the updated "half" features?
Yes, that's half as PDT/DET.
If that is relative it should be PRON not DET.
So these "that" tokens should be PRON and not DET?
16 the the DET DT Definite=Def|PronType=Art 18 det 18:det _
17 last last ADJ JJ Degree=Pos 18 amod 18:amod _
18 thing thing NOUN NN Number=Sing 2 parataxis 2:parataxis|22:obl _
19 that that DET WDT PronType=Rel 22 obj 18:ref _
20 the the DET DT Definite=Def|PronType=Art 21 det 21:det _
21 Government government NOUN NN Number=Sing 22 nsubj 22:nsubj _
22 wants want VERB VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 18 acl:relcl 18:acl:relcl SpaceAfter=No
23 a a DET DT Definite=Ind|PronType=Art 24 det 24:det _
24 producer producer NOUN NN Number=Sing 20 appos 20:appos|27:obl _
25 that that DET WDT PronType=Rel 27 obj 24:ref _
26 she she PRON PRP Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs 27 nsubj 27:nsubj _
27 admired admire VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 24 acl:relcl 24:acl:relcl SpaceAfter=No
13 of of ADP IN _ 15 case 15:case _
14 total total ADJ JJ Degree=Pos 15 amod 15:amod _
15 closure closure NOUN NN Number=Sing 12 nmod 12:nmod:of|20:obl _
16 that that DET WDT PronType=Rel 20 obj 15:ref _
17 the the DET DT Definite=Def|PronType=Art 18 det 18:det _
18 Bank bank NOUN NN Number=Sing 20 nsubj 20:nsubj _
19 has have AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 20 aux 20:aux _
20 shown show VERB VBN Tense=Past|VerbForm=Part 15 acl:relcl 15:acl:relcl _
21 to to ADP IN _ 22 case 22:case _
22 us we PRON PRP Case=Acc|Number=Plur|Person=1|PronType=Prs 20 obl 20:obl:to SpaceAfter=No
Yes
https://github.com/UniversalDependencies/UD_English-PUD/pull/20
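For reference, the retagging described above (relative "that" becomes PRON rather than DET) can be sketched as follows. This is only an illustration under the assumption of tab-separated CoNLL-U, not necessarily how the PR was produced; the function name is made up.

```python
def retag_relative_that(conllu_text):
    """Change UPOS from DET to PRON for 'that' tokens tagged WDT with PronType=Rel."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if (
            len(cols) == 10
            and cols[2].lower() == "that"
            and cols[3] == "DET"
            and cols[4] == "WDT"
            and "PronType=Rel" in cols[5].split("|")
        ):
            cols[3] = "PRON"
        out.append("\t".join(cols))
    # Note: splitlines() drops a trailing newline, so add one back when writing a file.
    return "\n".join(out)
```

Only the UPOS column changes; the head, deprel, and enhanced dependencies are left exactly as they were.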
Should the dependencies be nsubj, or are they fine as obj?
obj is correct: "a producer that she admired" is a way of conveying "she admired the producer", only with "that" standing in for the producer and moved before "she".
Great, thanks. Based on that, I merged the PR as is
The Pronouns dataset doesn't have many errors:
https://github.com/UniversalDependencies/UD_English-Pronouns/pull/8
What about "all" labeled as a PDT? Still the same features?
11 people people NOUN NNS Number=Plur 14 nsubj 14:nsubj _
12 without without ADP IN _ 13 case 13:case _
13 children child NOUN NNS Number=Plur 11 nmod 11:nmod:without _
14 express express VERB VBP Mood=Ind|Tense=Pres|VerbForm=Fin 4 conj 4:conj:and _
15 through through ADP IN _ 17 case 17:case _
16 their they PRON PRP$ Number=Plur|Person=3|Poss=Yes|PronType=Prs 17 nmod:poss 17:nmod:poss _
17 disapproval disapproval NOUN NN Number=Sing 14 obl 14:obl:through _
18 all all DET PDT _ 20 det:predet 20:det:predet _
19 their they PRON PRP$ Number=Plur|Person=3|Poss=Yes|PronType=Prs 20 nmod:poss 20:nmod:poss _
20 hatred hatred NOUN NN Number=Sing 14 obj 14:obj _
21 of of ADP IN _ 23 case 23:case _
22 modern modern ADJ JJ Degree=Pos 23 amod 23:amod _
23 parenting parenting NOUN NN Number=Sing 20 nmod 20:nmod:of SpaceAfter=No
Yeah, PronType=Tot
Pronouns change looks good then?
What about "all" in ADV sentences instead? Any features there? I don't see any on all_ADV in EWT
1 We we PRON PRP Case=Nom|Number=Plur|Person=1|PronType=Prs 4 nsubj 4:nsubj _
2 're be AUX VBP Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin 4 cop 4:cop _
3 all all ADV RB _ 4 advmod 4:advmod _
4 set set ADJ JJ Degree=Pos 0 root 0:root _
No, if an ADV has features it would just be comparative or superlative I think
that's fair, but i'll just leave it for now
Here's an update for PUD:
https://github.com/UniversalDependencies/UD_English-PUD/pull/21
@amir-zeldes implemented in GUM yet?
I think so - I implemented the table. "Another" now has just PronType=Ind, that's what we want, right?
Yes, the table at https://universaldependencies.org/en/pos/DET.html.
@AngledLuffa are we done with this issue?
Great, feel free to spot check my work, it's all in the dev branch.
I think we're done - although it occurs to me no one updated LinES. Perhaps I can do that with my script
One thing I found when trying to script the changes to LinES is that they labeled non-English determiners as DET when part of a proper noun. "Le Monde" comes up pretty often. Should I treat that as "The" or would a different UPOS be more appropriate? "Le petit" (no capital, perhaps that is a typo) is the only example I found in EWT of "Le", with a tag of PROPN, and there are none in GUM. It should be pointed out that "The" is never a PROPN in EWT. Perhaps Le_DET is better?
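A small sketch of how such cases could be surfaced when scripting the changes: count the DET tokens whose lemma is not in a list of known English determiners. The list is passed in by the caller (it is not defined here), the input is assumed to be tab-separated CoNLL-U, and the function name is hypothetical.

```python
from collections import Counter


def unknown_determiners(conllu_text, known_lemmas):
    """Count surface forms of DET tokens whose lemma is not in known_lemmas."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[3] == "DET" and cols[2].lower() not in known_lemmas:
            counts[cols[1]] += 1
    return counts
```

Sorting the result with `most_common()` puts frequent items such as "Le" at the top, which makes it easier to decide on a policy before changing any features.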
Different treebanks have different policies re: analyzing foreign expressions. Some try to analyze the syntax of the foreign phrase, so DET and det. Another option is to treat all the words in the name as PROPN. Another option is X.
One thing I found when trying to script the changes to LinES is that they labeled non-English determiners as DET when part of a proper noun.
It depends on whether they decided to annotate foreign phrases following the foreign guidelines, which is legitimate in UD, but optional. But even then, foreign multiword names would be a gray zone, because they can be considered English phrases that happen to be names.
I updated the UPOS tags for "each" and then made a PR in LinES which updates the features on DET. I suppose I'll merge it later today if I don't hear otherwise
In comparing EWT and GUM, there are two different standards for the word "another". In GUM, it has the feature PronType=Art, whereas in EWT, it has no features. Personally I would think additional features are generally valuable, hence posting it as an issue in EWT. @amir-zeldes
EWT example
GUM example