UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Hierarchical relations #64

Closed: spyysalo closed this issue 9 years ago

spyysalo commented 9 years ago

There now appears to be tentative agreement to adopt @yoavg's suggestion for hierarchical relation labels using the general:specific syntax (e.g. npmod:tmod). Discussion continues on whether to allow only two levels (option 1a) or multiple levels (option 1b).
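To make the general:specific syntax concrete, here is a minimal Python sketch of how a tool might split such labels under either option. The function names `split_label` and `universal_type` are hypothetical and not part of any UD tooling. Under option 1a, everything after the first ":" is treated as a single subtype; under option 1b, each ":" opens a further level.

```python
def split_label(label: str, multilevel: bool = False) -> tuple[str, ...]:
    """Split a hierarchical relation label such as "npmod:tmod".

    Hypothetical helper. Two-level (option 1a): at most one subtype, so
    everything after the first ":" is kept as one subtype string.
    Multi-level (option 1b): every ":" starts a further level.
    """
    if multilevel:
        return tuple(label.split(":"))
    universal, sep, subtype = label.partition(":")
    return (universal, subtype) if sep else (universal,)


def universal_type(label: str) -> str:
    """Return the general (universal) part of a label, e.g. "npmod"."""
    return label.split(":", 1)[0]
```

For example, `split_label("npmod:tmod")` gives `("npmod", "tmod")`, and `universal_type` lets a tool that only knows the universal inventory fall back to `npmod`.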

The following language-specific types should be renamed accordingly:

Finnish (@fginter / me)

English (@mcdm)

spyysalo commented 9 years ago

@fginter suggests iccomp -> ccomp:nf to parallel nfincl (http://universaldependencies.github.io/docs/u/dep/nfincl.html) (personal communication)
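Once agreed, a renaming like this is mechanical. The sketch below assumes the treebank is stored in CoNLL-U, with the relation in the eighth (DEPREL) column. The `RENAMES` table and the `rename_line` helper are hypothetical, and only the iccomp -> ccomp:nf entry comes from this thread.

```python
# Hypothetical rename table: only iccomp -> ccomp:nf comes from this thread;
# other agreed renamings would be added here.
RENAMES = {"iccomp": "ccomp:nf"}

DEPREL = 7  # 0-based index of the DEPREL column in a CoNLL-U token line


def rename_line(line: str, renames: dict[str, str] = RENAMES) -> str:
    """Rename the DEPREL of one CoNLL-U token line.

    Comment lines (starting with "#") and blank sentence separators are
    returned unchanged; labels not in the table are left as they are.
    """
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    cols[DEPREL] = renames.get(cols[DEPREL], cols[DEPREL])
    ending = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + ending
```

Applying `rename_line` to every line of a file leaves everything except the affected DEPREL values byte-for-byte identical, which keeps the resulting diff easy to review.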

mcdm commented 9 years ago

I thought we were getting rid of nfincl?

spyysalo commented 9 years ago

@mcdm: we are, also from Universal Dependencies itself? Sorry, I must have missed that (... so many emails). Shouldn't the documentation be revised to reflect this?

(I think the main point in renaming the TDT iccomp is that the new label should evoke "non-finite" rather than "infinite".)

mcdm commented 9 years ago

I am working on the documentation now ;-)

dan-zeman commented 9 years ago

@spyysalo: I believe we are, also from UD. See the thread "Proposed guidelines for clausal dependents", in particular this mail from @ngiordani:

Hi everyone,

this week at Stanford we've been working on consistency analysis of our annotated EWT, and several issues have come up. One of them is our use of the different types of clausal arguments and adjuncts.

The dependency types used in our annotation were the revised SD types: csubj, csubjpass, ccomp, xcomp, advcl, vmod and rcmod. We found that we hadn't been using these very consistently, and that the types did not have enough coverage. So we had an extended discussion about the differences that we intend to capture in clausal arguments/adjuncts, and we came up with a set of principles that we'd like to propose for UD. This works very well on our data. The principles are:

-- differentiate core arguments from noncore arguments and adjuncts.
-- differentiate subjects from complements.
-- NOT attempt to differentiate finite from nonfinite clauses.
-- differentiate clauses with obligatory control from clauses with other types of subject licensing.
-- differentiate attachment to predicates from attachment to entities.
-- differentiate subjects of passives from other subjects.
-- be able to capture clausal modifiers of nouns that do not take the form of a relative clause.

Based on that, we came up with the following dependency types:

-- for clausal subjects, csubj and csubjpass, used exactly as they have always been used in SD.
-- for clausal complements, ccomp and xcomp, again used as they have always been in SD.
-- for everything else, advcl when the head is a predicate and acl when it's a noun.
-- English-specific relation (useful in many but not all languages): we propose that acl have a subtype for relative clauses, relcl.
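The proposed clausal types can be read as a small decision procedure. The Python sketch below is one possible reading, not an official algorithm. The `ClausalDependent` fields are hypothetical yes/no judgements an annotator would supply, and finiteness is deliberately absent from the inputs, in line with the principles listed earlier. The relative-clause subtype is written in the general:specific form `acl:relcl`.

```python
from dataclasses import dataclass


@dataclass
class ClausalDependent:
    """Hypothetical annotator judgements about one clausal dependent."""
    is_subject: bool = False          # clause is the subject of its head
    head_is_passive: bool = False     # head predicate is passive
    is_core_complement: bool = False  # clause is a core complement argument
    obligatory_control: bool = False  # its subject is fixed by obligatory control
    head_is_noun: bool = False        # attached to an entity, not a predicate
    is_relative: bool = False         # noun modifier is a relative clause


def clausal_label(d: ClausalDependent, use_relcl: bool = True) -> str:
    """Pick a clausal relation following the proposed principles.

    Note that finiteness is never consulted.
    """
    if d.is_subject:                      # subjects vs. complements
        return "csubjpass" if d.head_is_passive else "csubj"
    if d.is_core_complement:              # core arguments vs. the rest
        return "xcomp" if d.obligatory_control else "ccomp"
    if d.head_is_noun:                    # entity vs. predicate attachment
        # relcl is an optional, language-specific subtype of acl
        return "acl:relcl" if (use_relcl and d.is_relative) else "acl"
    return "advcl"
```

For instance, "He told us to stop" (core complement, obligatory control) comes out as `xcomp`, while "I said to buy it" (core complement, no control) comes out as `ccomp`, matching the examples in the email.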

So basically, we don't see any use for nfincl and ncmod. (!)

This addresses a few problems that we noticed in our annotation. First, even though the subject and complement types did not make a finiteness distinction, we were using finiteness as the criterion for choosing between advcl and vmod/nfincl, and it wasn't clear to us that this was useful. We were also using vmod/nfincl with predicate heads as well as entity heads. And finally, we found some cases where a clause modified a noun but couldn't be called a relative clause.

This is outrageously last-minute, but I think it makes things simpler, which might end up saving us some time. We tried hard to come up with clear criteria that are general enough to work across languages (which is another problem with trying to make a finiteness distinction... finiteness isn't the same thing in every language!), but we've only applied this to English data. Here are a few English examples:

For them to take part in the assault would be considered the most mortal of sins. >> csubjpass(considered, take) [Simplified from one of a glorious 4 instances of csubjpass in our corpus. :)]
Missing the train ruined my plans. >> csubj(ruined, missing)
I said they should buy it. >> ccomp(said, buy)
I said to buy it. >> ccomp(said, buy) [No control here, hence ccomp.]
He told us to stop. >> xcomp(told, stop)
We consider this difficult. >> xcomp(consider, difficult)
They left when I got there. >> advcl(left, got)
We're working hard to get this off the ground. >> advcl(working, get)
the issues as he sees them >> acl(issues, sees) [Note that relcl doesn't work here, and without acl we have no good type for this relation. From the EWT.]
the color I prefer >> relcl(color, prefer)
the gun with which the girl was shot >> relcl(gun, shot)
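The examples use the notation relation(head, dependent). As a sanity check, the hypothetical helpers below parse that notation and confirm that every example uses only the proposed clausal types. relcl is accepted as its own label, as the email writes it, and a label with a ":subtype" suffix is checked on its general part.

```python
import re

# The clausal types proposed in the email; relcl is the proposed
# English-specific subtype of acl.
PROPOSED = {"csubj", "csubjpass", "ccomp", "xcomp", "advcl", "acl", "relcl"}

# relation(head, dependent); the label may carry ":subtype" parts.
TRIPLE = re.compile(r"^\s*([a-z]+(?::[a-z]+)*)\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)\s*$")


def parse_triple(text: str) -> tuple[str, str, str]:
    """Parse "rel(head, dep)" into (rel, head, dep)."""
    m = TRIPLE.match(text)
    if m is None:
        raise ValueError(f"not of the form rel(head, dep): {text!r}")
    return m.group(1), m.group(2), m.group(3)


EXAMPLES = [
    "csubjpass(considered, take)",
    "csubj(ruined, missing)",
    "ccomp(said, buy)",
    "xcomp(told, stop)",
    "xcomp(consider, difficult)",
    "advcl(left, got)",
    "advcl(working, get)",
    "acl(issues, sees)",
    "relcl(color, prefer)",
    "relcl(gun, shot)",
]


def unexpected_labels(examples: list[str]) -> list[str]:
    """Return labels whose general part is not among the proposed types."""
    return [rel for rel, _, _ in map(parse_triple, examples)
            if rel.split(":", 1)[0] not in PROPOSED]
```

Here `unexpected_labels(EXAMPLES)` is empty, while an old-style label such as `vmod` would be reported.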

Thoughts?

N.

spyysalo commented 9 years ago

@dan-zeman : thanks, got it! (too bad we'll now have 40 universal types, not 42 :-P)

spyysalo commented 9 years ago

As of 6ab8452cc7a628e97fdc31bfd5caf7e5c8ac6c01, the Finnish side of this issue is ready for review. (paging @fginter)

spyysalo commented 9 years ago

Just an observation: the Finnish types ccomp:nf and nfincl (at least) appear to violate one of the principles suggested by @ngiordani, namely

-- NOT attempt to differentiate finite from nonfinite clauses.

This divergence should probably at least be documented clearly.

mcdm commented 9 years ago

"nfincl" should disappear from the Finnish relations.

spyysalo commented 9 years ago

@mcdm: ncmod also, right?

mcdm commented 9 years ago

Yes ;-) I have updated the Universal table, as well as the English table, to reflect the new principles.

spyysalo commented 9 years ago

BTW, should @ngiordani's list (above) appear in the general UD principles (http://universaldependencies.github.io/docs/structure.html)?

(This is drifting quite a bit from the issue topic ... sorry!)

spyysalo commented 9 years ago

@mcdm: OK, thanks, I'll then keep #66 as-is.

mcdm commented 9 years ago

@spyysalo: Funny you should mention this ("BTW, should @ngiordani's list (above) appear in the general UD principles (http://universaldependencies.github.io/docs/structure.html)?"). I was just looking at that page to see where to add it!

spyysalo commented 9 years ago

@mcdm: re: 49892f8921846c6a997fa594093b3ea37806a0fe: because the ":" character is special in both links and filenames, please use "-" in document names, i.e. instead of

<a href="compound:prt.html">↳prt</a>

link as

<a href="compound-prt.html">↳compound:prt</a>

and name the file compound-prt.md.
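The naming convention can be captured in a couple of helpers. This is a sketch assuming, as the comment implies, that each `.md` page is published as an `.html` page with the same stem; `doc_basename`, `doc_filename` and `subtype_link` are hypothetical names, not functions in the docs repository.

```python
import html


def doc_basename(label: str) -> str:
    """File stem for a relation's documentation page: ":" becomes "-"."""
    return label.replace(":", "-")


def doc_filename(label: str) -> str:
    """Markdown source file for a relation, e.g. compound-prt.md."""
    return doc_basename(label) + ".md"


def subtype_link(label: str) -> str:
    """Index link for a subtyped relation, keeping ":" only in the link text."""
    href = doc_basename(label) + ".html"
    return f'<a href="{html.escape(href)}">↳{html.escape(label)}</a>'
```

This keeps the ":" in the visible link text, where it is harmless, and out of the href and the filename.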

spyysalo commented 9 years ago

OK, I think we're about done here. Feel free to reopen if anything remains!