Closed sdenning closed 8 years ago
We use the ClearNLP converter, which differs slightly from the Stanford one in some cases. The ClearNLP converter is generally more accurate and practical for our situation (i.e., we just want to convert treebanks into dependency parses). It increases accuracy by making use of the additional annotations in the treebank. In contrast, the Stanford converter has to support the use case of converting parser output into dependencies. That parser output doesn't have the additional annotations, so the Stanford converter uses less information than ClearNLP's.
If the ClearNLP docs really don't describe our dependencies, then okay, we have a problem, and I'll raise it with Jin-ho. But are you sure that's the case?
It may just be that the ClearNLP doc itself needs updating, as it is rather old. Appendix B2 lists the Stanford dependencies, but it also does not include all of the labels I've observed, and it differs from the doc I pointed to.
The following dependencies are described by the ClearNLP Doc and listed in Table 2:
ACOMP Adjectival complement
ADVCL Adverbial clause modifier
ADVMOD Adverbial modifier
AGENT Agent
NN Noun compound modifier
AMOD Adjectival modifier
APPOS Appositional modifier
ATTR Attribute
AUX Auxiliary
NUM Numeric modifier
AUXPASS Auxiliary (passive)
CC Coordinating conjunction
CCOMP Clausal complement
COMPLM Complementizer
CONJ Conjunct
CSUBJ Clausal subject
CSUBJPASS Clausal subject (passive)
DEP Unclassified dependent
DET Determiner
DOBJ Direct object
EXPL Expletive
HMOD Modifier in hyphenation
HYPH Hyphen
INFMOD Infinitival modifier
INTJ Interjection
IOBJ Indirect object
MARK Marker
META Meta modifier
NEG Negation modifier
NMOD Modifier of nominal
NPADVMOD Noun phrase as ADVMOD
NSUBJ Nominal subject
NSUBJPASS Nominal subject (passive)
NUMBER Number compound modifier
OPRD Object predicate
PARATAXIS Parataxis
PARTMOD Participial modifier
PCOMP Complement of a preposition
POBJ Object of a preposition
POSS Possession modifier
POSSESSIVE Possessive modifier
PRECONJ Pre-correlative conjunction
PREDET Predeterminer
PREP Prepositional modifier
PRT Particle
PUNCT Punctuation
QUANTMOD Quantifier phrase modifier
RCMOD Relative clause modifier
ROOT Root
XCOMP Open clausal complement
Here are the dependency labels generated by SpaCy that I've observed while parsing my corpus; * denotes labels not in the ClearNLP doc (these are only the ones I've observed; there may be more):
Not sure what happened to the formatting on my last post after I submitted it. In the observed labels section, each label was on its own line, and the asterisks have been replaced with bullets. So the following are observed but not documented: acl, case, compound, dative, nummod, relcl
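For anyone who wants to reproduce a list like this, here is a minimal sketch of how observed labels could be tallied. It assumes only that each parsed token exposes its label as a `dep_` string attribute, as spaCy tokens do. The `count_dep_labels` helper and the stand-in tokens are made up for illustration; a real run would pass documents produced by the parser instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_dep_labels(docs):
    """Tally dependency labels over an iterable of token sequences.

    Hypothetical helper: each token only needs a `dep_` string attribute.
    """
    counts = Counter()
    for doc in docs:
        for token in doc:
            counts[token.dep_] += 1
    return counts


# Stand-in tokens so the sketch runs without a model installed (assumption).
fake_doc = [
    SimpleNamespace(dep_=label)
    for label in ("nsubj", "ROOT", "nummod", "dobj", "nummod")
]
label_counts = count_dep_labels([fake_doc])

# Print the labels seen, most frequent first.
for label, n in label_counts.most_common():
    print(label, n)
```

Counting rather than just collecting a set also shows how rare a label is, which helps when deciding whether an undocumented label is worth chasing down.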
Hmm, okay. Thanks, I didn't realise those docs were out of date.
Hey @honnibal, any chance we could get a full list of all possible dependency labels in SpaCy? Similar to spacy.parts_of_speech.NAMES?
From symbols.pyx:
"acomp": acomp,
"advcl": advcl,
"advmod": advmod,
"agent": agent,
"amod": amod,
"appos": appos,
"attr": attr,
"aux": aux,
"auxpass": auxpass,
"cc": cc,
"ccomp": ccomp,
"complm": complm,
"conj": conj,
"csubj": csubj,
"csubjpass": csubjpass,
"dep": dep,
"det": det,
"dobj": dobj,
"expl": expl,
"hmod": hmod,
"hyph": hyph,
"infmod": infmod,
"intj": intj,
"iobj": iobj,
"mark": mark,
"meta": meta,
"neg": neg,
"nmod": nmod,
"nn": nn,
"npadvmod": npadvmod,
"nsubj": nsubj,
"nsubjpass": nsubjpass,
"num": num,
"number": number,
"oprd": oprd,
"parataxis": parataxis,
"partmod": partmod,
"pcomp": pcomp,
"pobj": pobj,
"poss": poss,
"possessive": possessive,
"preconj": preconj,
"prep": prep,
"prt": prt,
"punct": punct,
"quantmod": quantmod,
"rcmod": rcmod,
"root": root,
"xcomp": xcomp
I tried that list, but it seems to be incomplete. Missing items include, for example, compound, nummod and ROOT.
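A quick way to check this kind of gap is a set difference between the labels the parser emits and the names in the list. The sketch below uses the `symbols.pyx` names quoted above and the labels reported in this thread as observed; the variable names are illustrative. The first comparison is case-sensitive, so `ROOT` counts as missing even though `root` is listed, which may be part of the mismatch.

```python
# Names quoted from symbols.pyx earlier in this thread.
SYMBOL_NAMES = set("""
acomp advcl advmod agent amod appos attr aux auxpass cc ccomp complm conj
csubj csubjpass dep det dobj expl hmod hyph infmod intj iobj mark meta neg
nmod nn npadvmod nsubj nsubjpass num number oprd parataxis partmod pcomp
pobj poss possessive preconj prep prt punct quantmod rcmod root xcomp
""".split())

# Labels reported in this thread as observed in parser output.
observed = {"acl", "case", "compound", "dative", "nummod", "relcl", "ROOT"}

# Exact (case-sensitive) comparison.
missing_exact = sorted(observed - SYMBOL_NAMES)

# Same comparison after lowercasing the observed labels.
missing_ignoring_case = sorted(
    label for label in observed if label.lower() not in SYMBOL_NAMES
)

print(missing_exact)
print(missing_ignoring_case)
```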
Hello @honnibal, I am parsing a German text using your new model and facing the same issue: the dependency tags are not clearly documented. Could you please fix that so that we can get the most out of your API? :)
UPDATE: I figured it out: the German model uses its own tags, specifically those of the TIGER Treebank, as described here: http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/tiger_introduction.pdf
Nevertheless I am looking forward to the description of the English labels:)
Would it be too much work to adapt spaCy to output Universal Dependencies for the English and German parser?
@tanya-h: you can find more info here, but it's in German
Apologies for commenting on a closed issue, but I was scouring GitHub (this issue and #676, #677) trying to figure out what the acl label is supposed to be, since it's not in the Stanford dependencies manual. After hopping around ClearNLP's (now NLP4J's) docs, I found the following page:
https://emorynlp.github.io/nlp4j/components/dependency-parsing.html
... which describes all of the mystery labels @sdenning helpfully posted above, except nummod. I post only in case this helps someone in the future.
Hi @honnibal, could you please tell me how I can get a complete list of dependency relations in spaCy?
The ClearNLP doc pointed to doesn't include quite a few of the dependency tags. Here is a Stanford doc that has all of them except DATIVE.
http://nlp.stanford.edu/software/dependencies_manual.pdf