UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Confusion between UD 2.12 and UD 2.10 (which one should non-experts in grammar follow?) #507

Closed Shasetty closed 4 months ago

Shasetty commented 5 months ago

There are a lot of structural changes in the parent-child relationships between versions 2.10 and 2.12.

1) Which version is the correct one?
2) Which version should be followed by non-experts in grammar?

Text: The Unregistered Warrants and the Unregistered Warrant Shares are not being registered under the Securities Act of 1933, as amended (the “Securities Act”), pursuant to the registration statement of which this prospectus supplement and the accompanying base prospectus form a part and are not being offered pursuant to this prospectus supplement and the accompanying base prospectus.

acl(Act-16,amended-21) : version ud 2.10-220711

advcl(registered-12,amended-21) : version 2.12.230717
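The two analyses above can be compared mechanically. The following is a minimal sketch, assuming the `relation(head-index,dependent-index)` notation used in this comment; the names `Dep` and `parse_dep` are hypothetical helpers written for illustration and are not part of any parser's API.

```python
import re
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency edge written as relation(head-index,dependent-index)."""
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


# Assumed notation: relation(headword-N,dependentword-M), as quoted in the issue.
_DEP_PATTERN = re.compile(
    r"^\s*([\w:]+)\(\s*([^\s,()]+)-(\d+)\s*,\s*([^\s,()]+)-(\d+)\s*\)\s*$"
)


def parse_dep(text: str) -> Dep:
    """Parse a single edge string; raise ValueError if it does not match."""
    match = _DEP_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a dependency edge: {text!r}")
    relation, head, head_index, dependent, dependent_index = match.groups()
    return Dep(relation, head, int(head_index), dependent, int(dependent_index))


old = parse_dep("acl(Act-16,amended-21)")            # older output
new = parse_dep("advcl(registered-12,amended-21)")   # newer output

# Both edges describe the same dependent token ("amended", position 21);
# what differs is the head it attaches to and the relation label.
same_dependent = old.dependent_index == new.dependent_index
head_changed = old.head_index != new.head_index
relation_changed = old.relation != new.relation

print(f"same dependent: {same_dependent}")
print(f"head: {old.head}-{old.head_index} vs {new.head}-{new.head_index}")
print(f"relation: {old.relation} vs {new.relation}")
```

This only shows where the two outputs disagree (attachment to the noun "Act" versus the verb "registered"); it does not say which analysis is right.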

nschneid commented 5 months ago

Thank you for the question. Which corpus is this from?

As a general matter, no dataset is perfect. Each version tries to improve the datasets so they will be more correct and coherent as grammatical descriptions. So in general we would recommend using the most recent version, though of course it is possible that some new errors have been introduced.

Shasetty commented 5 months ago

English EWT, UD 2.12-230717 and English EWT, UD 2.10-220711.

nschneid commented 5 months ago

That sentence doesn't appear in the English-EWT corpus. Are you referring to the output of a parser?

Shasetty commented 5 months ago

I am referring to the output of a parser.

nschneid commented 5 months ago

I see—parsers do make errors, and different parsers behave differently on the same input. So my advice is to use the most recent parser version if you are not sure, but there is no guarantee that it will always be correct.

Shasetty commented 5 months ago

acl(Act-16,amended-21) : version ud 2.10-220711

advcl(registered-12,amended-21) : version 2.12.230717

That is not a convincing answer.

acl modifies a noun, whereas advcl modifies a verb; the two analyses are as different as night and day.

nschneid commented 5 months ago

Natural language is full of ambiguities that make parsing difficult (especially in long sentences). Sometimes there are multiple interpretations that are both plausible.

I think I agree with you that acl is the more plausible interpretation in this particular sentence. But you'd have to look at a pattern of behavior on many sentences to decide whether one version of the parser is better for your dataset in general.
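One way to look for such a pattern is to tally where two parser versions disagree across a set of sentences. The sketch below assumes each parse has already been loaded into a dictionary mapping a token position to its `(head position, relation)` pair; the `compare_parses` function and the toy input are hypothetical, and only the `amended` entry (token 21) comes from this thread.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: token position -> (head position, relation label).
Parse = Dict[int, Tuple[int, str]]


def compare_parses(parses_a: List[Parse], parses_b: List[Parse]) -> Counter:
    """Count head and relation disagreements between two aligned lists of parses."""
    counts: Counter = Counter()
    for parse_a, parse_b in zip(parses_a, parses_b):
        # Only compare tokens present in both outputs for the same sentence.
        for token in parse_a.keys() & parse_b.keys():
            head_a, rel_a = parse_a[token]
            head_b, rel_b = parse_b[token]
            counts["tokens"] += 1
            if head_a != head_b:
                counts["head_differs"] += 1
            if rel_a != rel_b:
                counts["relation_differs"] += 1
                counts[f"{rel_a}->{rel_b}"] += 1
    return counts


# Toy input: token 21 is the edge discussed in this issue; token 16 is an
# invented entry added only so that one token agrees in both outputs.
version_a = [{21: (16, "acl"), 16: (12, "obl")}]
version_b = [{21: (12, "advcl"), 16: (12, "obl")}]

summary = compare_parses(version_a, version_b)
for key, value in sorted(summary.items()):
    print(f"{key}: {value}")
```

A tally like this shows how often and in what ways the versions diverge on your own data; deciding which version is better still requires inspecting a sample of the disagreements by hand.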

This repo is for the dataset the parser was trained on, not the parser itself, so I can't explain why the parser gives the output it does.

Shasetty commented 5 months ago

As a non-expert in grammar, I am still confused.

AngledLuffa commented 5 months ago

I think it would help if you said which parser you're looking at.
