UniversalDependencies / tools

Various utilities for processing the data.

validation and fixed: wrongly unallowed deprels #81

Closed: Stormur closed this issue 3 years ago

Stormur commented 3 years ago

I am running the validate.py script, and I receive an error for this sentence (which, I am sorry, is very long... but so it goes with Medieval Latin):

# sent_id = 170
# text = Quare , cribellum cupientes deponere , ut residentiam cito visamus , dicimus Tridentum atque Taurinum nec non Alexandriam civitates metis Ytalie in tantum sedere propinquas quod puras nequeunt habere loquelas ; ita quod si etiam quod turpissimum habent vulgare , haberent pulcerrimum , propter aliorum commixtionem esse vere latium negaremus .
1   Quare   quare   ADV r   _   12  discourse   _   _
2   ,   ,   PUNCT   Pu  _   4   punct   _   _
3   cribellum   cribellum   NOUN    sns2a   Case=Acc|Gender=Neut|NounClass=IndEurO|Number=Sing  5   obj _   _
4   cupientes   cupio   VERB    va3pppmn    Aspect=Imp|Case=Nom|Degree=Pos|Gender=Masc|NounClass=IndEurI|Number=Plur|Tense=Pres|VerbClass=LatX2|VerbForm=Part|Voice=Act 12  advcl:pred  _   _
5   deponere    depono  VERB    va3fp   Aspect=Imp|Tense=Pres|VerbClass=LatX|VerbForm=Inf|Voice=Act 4   ccomp   _   _
6   ,   ,   PUNCT   Pu  _   10  punct   _   _
7   ut  ut  SCONJ   cs  ConjType=Cmpr   10  mark    _   _
8   residentiam residentia  NOUN    sfs1a   Case=Acc|Gender=Fem|NounClass=IndEurA|Number=Sing   10  obj _   _
9   cito    cito    ADV r   _   10  advmod  _   _
10  visamus viso    VERB    va3cpp1 Aspect=Imp|Mood=Sub|Number=Plur|Person=1|Tense=Pres|VerbClass=LatX|VerbForm=Fin|Voice=Act   4   advcl   _   _
11  ,   ,   PUNCT   Pu  _   12  punct   _   _
12  dicimus dico    VERB    va3ipp1 Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbClass=LatX|VerbForm=Fin|Voice=Act   0   root    _   _
13  Tridentum   tridentum   PROPN   Sns2a   Case=Acc|Gender=Neut|NounClass=IndEurO|Number=Sing  24  nsubj   _   _
14  atque   atque   CCONJ   co  _   15  cc  _   _
15  Taurinum    taurinum    PROPN   Sns2a   Case=Acc|Gender=Neut|NounClass=IndEurO|Number=Sing  13  conj    _   _
16  nec nec CCONJ   co  Polarity=Neg    18  cc  _   _
17  non non PART    r   Polarity=Neg    16  fixed   _   _
18  Alexandriam alexandria  PROPN   Sfs1a   Case=Acc|Gender=Fem|NounClass=IndEurA|Number=Sing   13  conj    _   _
19  civitates   civitas NOUN    sfp3a   Case=Acc|Gender=Fem|NounClass=IndEurX|Number=Plur   13  flat    _   _
20  metis   meta    NOUN    sfp1d   Case=Dat|Gender=Fem|NounClass=IndEurA|Number=Plur   25  obl:arg _   _
21  Ytalie  italia  PROPN   Sfs1g   Case=Gen|Gender=Fem|NounClass=IndEurA|Number=Sing   20  nmod    _   _
22  in  in  ADP e   AdpType=Prep    25  advmod  _   _
23  tantum  tantum  ADV r   _   22  fixed   _   _
24  sedere  sedeo   VERB    va2fp   Aspect=Imp|Tense=Pres|VerbClass=LatE|VerbForm=Inf|Voice=Act 12  ccomp   _   _
25  propinquas  propinquus  ADJ afp1a   Case=Acc|Degree=Pos|Gender=Fem|NounClass=IndEurA|Number=Plur    24  advcl:pred  _   _
26  quod    quod    SCONJ   cs  _   28  mark    _   _
27  puras   purus   ADJ afp1a   Case=Acc|Degree=Pos|Gender=Fem|NounClass=IndEurA|Number=Plur    30  amod    _   _
28  nequeunt    nequeo  VERB    va5ipp3 Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbClass=LatAnom|VerbForm=Fin|Voice=Act    24  advcl   _   _
29  habere  habeo   VERB    va2fp   Aspect=Imp|Tense=Pres|VerbClass=LatE|VerbForm=Inf|Voice=Act 28  xcomp   _   _
30  loquelas    loquela NOUN    sfp1a   Case=Acc|Gender=Fem|NounClass=IndEurA|Number=Plur   29  obj _   _
31  ;   ;   PUNCT   Pu  _   32  punct   _   _
32  ita ita ADV r   _   24  parataxis   _   _
33  quod    quod    SCONJ   cs  _   50  mark    _   _
34  si  si  SCONJ   cs  _   41  mark    _   _
35  etiam   etiam   ADV co  _   41  advmod  _   _
36  quod    qui PRON    presna  Case=Acc|Gender=Neut|Number=Sing|PronType=Rel   38  obj _   _
37  turpissimum turpis  ADJ ans2as  Case=Acc|Degree=Abs|Gender=Neut|NounClass=IndEurO|Number=Sing   36  amod    _   _
38  habent  habeo   VERB    va2ipp3 Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbClass=LatE|VerbForm=Fin|Voice=Act   39  acl:relcl   _   _
39  vulgare vulgare NOUN    sns3a   Case=Acc|Gender=Neut|NounClass=IndEurX|Number=Sing  41  obj _   _
40  ,   ,   PUNCT   Pu  _   39  punct   _   _
41  haberent    habeo   VERB    va2cip3 Aspect=Imp|Mood=Sub|Number=Plur|Person=3|Tense=Past|VerbClass=LatE|VerbForm=Fin|Voice=Act   50  advcl   _   _
42  pulcerrimum pulcher ADJ ans1as  Case=Acc|Degree=Abs|Gender=Neut|NounClass=IndEurO|Number=Sing   39  amod    _   _
43  ,   ,   PUNCT   Pu  _   50  punct   _   _
44  propter propter ADP e   AdpType=Prep    46  case    _   _
45  aliorum alius   DET dpnpg   Case=Gen|Gender=Neut|NounClass=LatPron|Number=Plur|PronType=Ind 46  nmod    _   _
46  commixtionem    commixtio   NOUN    sfs3a   Case=Acc|Gender=Fem|NounClass=IndEurX|Number=Sing   50  obl _   _
47  esse    sum AUX va5fp   Aspect=Imp|Tense=Pres|VerbClass=LatAnom|VerbForm=Inf|Voice=Act  49  cop _   _
48  vere    vere    ADV r   Degree=Pos  49  advmod  _   _
49  latium  latius  ADJ ans1a   Case=Acc|Degree=Pos|Gender=Neut|NounClass=IndEurO|Number=Sing   50  ccomp   _   _
50  negaremus   nego    VERB    va1cip1 Aspect=Imp|Mood=Sub|Number=Plur|Person=1|Tense=Past|VerbClass=LatA|VerbForm=Fin|Voice=Act   32  orphan  _   _
51  .   .   PUNCT   Pu  _   12  punct   _   _

The problem is at token 22. We have a fixed adverbial expression, in tantum 'so much (so)': tantum (23) attaches to in (22) with the fixed relation, and in then bears the advmod relation to propinquas (25). The validator complains:

'advmod' should be 'ADV' but it is 'ADP'

Shouldn't the head of a fixed expression be exempt from errors about alleged POS/deprel mismatches? The individual elements of a fixed expression keep their own morphological analysis, but the expression as a whole can behave differently from its parts, so no POS restriction can sensibly apply to its head.

I think I have other errors like that, but I still have to check.

dan-zeman commented 3 years ago

Hmm, this should not be happening... The validator applies this check only if fixed is not among the node's child relations. Will investigate.
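
For readers who want to see the condition concretely, here is a minimal sketch of such a check. It is not the code from validate.py: the function names parse_tokens and check_advmod are made up for illustration, the sketch assumes that only ADV is accepted for advmod (the real validator may accept more tags), and it splits columns on any whitespace so that it also works on text copied from this page, where the tabs were lost.

```python
from collections import defaultdict

# Three token lines copied from the sentence above (columns separated by spaces here).
SAMPLE = """\
22 in in ADP e AdpType=Prep 25 advmod _ _
23 tantum tantum ADV r _ 22 fixed _ _
25 propinquas propinquus ADJ afp1a Case=Acc|Degree=Pos|Gender=Fem|NounClass=IndEurA|Number=Plur 24 advcl:pred _ _
"""


def parse_tokens(conllu_text):
    """Hypothetical helper: read basic token lines, keeping ID, UPOS, HEAD, DEPREL."""
    tokens = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split()
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges, empty nodes, malformed lines
        tokens.append({"id": int(cols[0]), "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def check_advmod(tokens):
    """Return (id, upos) for advmod nodes that are not ADV and have no fixed child."""
    child_rels = defaultdict(list)
    for tok in tokens:
        child_rels[tok["head"]].append(tok["deprel"])
    errors = []
    for tok in tokens:
        if tok["deprel"].split(":")[0] != "advmod":
            continue  # only advmod (and its subtypes) is checked in this sketch
        if tok["upos"] == "ADV":
            continue  # assumption: ADV is the only accepted tag
        if "fixed" in child_rels[tok["id"]]:
            continue  # head of a fixed expression: exempt from the POS check
        errors.append((tok["id"], tok["upos"]))
    return errors


if __name__ == "__main__":
    print(check_advmod(parse_tokens(SAMPLE)))
```

With token 23 attached to 22 as fixed, token 22 is exempt and nothing is reported; if the fixed child were missing, token 22 (ADP) would be flagged, which matches the message quoted in the report.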

Stormur commented 3 years ago

Update: I tried again, and this time the error seems to be gone. I will keep an eye on this; it is not clear whether something else triggered it.

dan-zeman commented 3 years ago

I could not replicate the error (using a freshly updated copy of tools, with the above sentence copied from here and saved as a CoNLL-U file). The validator reported only dozens of undocumented/disallowed features.

Stormur commented 3 years ago

The error does indeed seem to be gone. Sorry for the bother!