GrammaticalFramework / gf-ud

Functions to analyse and manipulate dependency trees, as well as conversions between GF and dependency trees. The main use case is UD (Universal Dependencies), but the code is designed to be completely generic with respect to the annotation scheme. This repository replaces the old gf-contrib/ud2gf code. It is also meant to be used in the 'vd' command of GF, and to replace the supporting code in gf-core in the future.

Infinite applications of ProgrVP by ud2gf #12

Open · inariksit opened this issue 3 years ago

inariksit commented 3 years ago

I'm running ud2gf with ShallowParse, using "the cat sleeps" as my sentence. Here is the input, produced by parsing "the cat sleeps" with UDPipe and using this code to output the CoNLL-U format.

$ cat /tmp/cat.conllu
1       the     the     DET     _       _       2       det     _       _
2       cat     cat     NOUN    _       _       3       nsubj   _       _
3       sleeps  sleep   VERB    _       _       0       root    _       _
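Each CoNLL-U token line has ten columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC, with `_` marking an empty field. The tree is encoded entirely by HEAD and DEPREL. As a minimal sketch, here is how one might read such lines in Haskell. The `Token` record, `parseToken` and `rootOf` are hypothetical illustrations, not gf-ud's own types or functions, and the sketch splits on any whitespace because the listing above is space-aligned (real CoNLL-U files are tab-separated, and FORM or LEMMA may contain spaces).

```haskell
-- Hypothetical record for one CoNLL-U token line (not gf-ud's own type).
data Token = Token
  { tokId     :: Int
  , tokForm   :: String
  , tokLemma  :: String
  , tokUpos   :: String
  , tokHead   :: Int      -- 0 marks the root of the sentence
  , tokDeprel :: String
  } deriving (Show, Eq)

-- Read an integer field; fails on non-integers such as multiword ranges ("1-2").
readInt :: String -> Maybe Int
readInt s = case reads s of
  [(n, "")] -> Just n
  _         -> Nothing

-- Split a token line into its ten columns and keep the ones that define the tree.
-- Splitting on whitespace is a simplification for this space-aligned listing.
parseToken :: String -> Maybe Token
parseToken line = case words line of
  [i, form, lemma, upos, _xpos, _feats, hd, rel, _deps, _misc] ->
    Token <$> readInt i <*> pure form <*> pure lemma <*> pure upos
          <*> readInt hd <*> pure rel
  _ -> Nothing

-- The root is the unique token whose HEAD is 0.
rootOf :: [Token] -> Maybe Token
rootOf ts = case filter ((== 0) . tokHead) ts of
  [r] -> Just r
  _   -> Nothing
```

Applied to the three lines above, the HEAD column gives 1 → 2, 2 → 3 and 3 → 0: "sleeps" is the root, "cat" is its nsubj, and "the" is the det of "cat".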

I run ud2gf as follows.

$ cat /tmp/cat.conllu | stack run gf-ud ud2gf grammars/ShallowParse Eng Text at

Infinite loop

At first, ud2gf ran for 30 minutes without finishing, until I stopped it.

Uncommenting the "beam size" of 123 trees

Next, I uncommented this line to restore the limit of at most 123 candidate trees. This works in the sense that ud2gf no longer gets stuck in an infinite loop, but the best tree still contains many nested applications of ProgrVP, even though the original sentence has none. Here's the output:

# bt0, the best (most complete) tree, without backups:
[3] sleeps 3 (2) VERB root (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))) : Imp[3]) 1
    *[1,2] cat 2 (1) NOUN nsubj (UseN cat_N : CN[2]) 1
        *[1] the 1 (2) DET det (the_Det : Det[1]) 1

# at, final GF tree, macros expanded:
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup the_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))
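A toy model can illustrate why a function of type a -> a can keep a search from terminating, and why a cap on the number of candidates stops the loop without removing the ProgrVP towers. This is a minimal Haskell sketch under stated assumptions, not gf-ud's actual search: it assumes candidates are built bottom-up by wrapping each tree of the previous round in every unary function whose argument category matches. The names `Tree`, `Fun`, `extend`, `closure` and `beam` are hypothetical; the signatures UseV : V -> VP, ProgrVP : VP -> VP and ImpVP : VP -> Imp follow the RGL.

```haskell
-- Hypothetical model of bottom-up candidate generation; NOT gf-ud's algorithm.
data Tree = App String [Tree] deriving (Show, Eq)

-- Function signature: name, argument categories, result category.
data Fun = Fun String [String] String

-- One round: wrap each tree of the previous round in every unary function
-- whose argument category matches the tree's category.
extend :: [Fun] -> [(Tree, String)] -> [(Tree, String)]
extend funs cands =
  [ (App f [t], val) | (t, cat) <- cands, Fun f [arg] val <- funs, arg == cat ]

-- All trees reachable by repeated unary application, round by round.
-- Stops when a round produces nothing; if some function has type a -> a,
-- every round is non-empty and the (lazy) list is infinite.
closure :: [Fun] -> [(Tree, String)] -> [(Tree, String)]
closure funs seeds = concat (takeWhile (not . null) (iterate (extend funs) seeds))

-- A beam that keeps at most n candidates, so consuming it always terminates.
beam :: Int -> [a] -> [a]
beam = take

grammar, grammarNoProgr :: [Fun]
grammar =
  [ Fun "UseV"    ["V"]  "VP"
  , Fun "ProgrVP" ["VP"] "VP"
  , Fun "ImpVP"   ["VP"] "Imp"
  ]
grammarNoProgr = [ f | f@(Fun name _ _) <- grammar, name /= "ProgrVP" ]

seeds :: [(Tree, String)]
seeds = [(App "sleep_V" [], "V")]
```

Without ProgrVP, `closure grammarNoProgr seeds` contains just three trees. With ProgrVP, `closure grammar seeds` never ends, and `beam 123` only bounds how much of it is consumed: past the first few entries, the beam is filled with ever-deeper ProgrVP towers. In this toy model, the cap bounds the work but says nothing about which trees fill it. This is consistent with the trees reported in this issue, though it is not a diagnosis of gf-ud's code.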

Adding annotations to the conllu file

I have noticed before that I get unexpected trees when the file lacks morphological annotations, so I added them manually to the CoNLL-U file:

$ cat /tmp/cat-annotated.conllu
1   the the DET Det FORM=0  2   det _   _
2   cat cat NOUN    N   Number=Sing 3   nsubj   _   _
3   sleeps  sleep   VERB    V   Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0   root    _   _
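The morphological annotations go in the sixth column (FEATS), which holds `|`-separated `Feature=Value` pairs, or `_` when empty. A minimal Haskell sketch of reading that column into a lookup list, for example to inspect which features a token carries, is shown below. `parseFeats` and `splitOn` are hypothetical helpers written for illustration, not part of gf-ud.

```haskell
-- Hypothetical helper: split a CoNLL-U FEATS field into (feature, value) pairs.
parseFeats :: String -> [(String, String)]
parseFeats "_" = []
parseFeats s   = map splitPair (splitOn '|' s)
  where
    splitPair kv = case break (== '=') kv of
      (k, '=' : v) -> (k, v)
      (k, _)       -> (k, "")   -- malformed entry without '='

-- Split a string on a single separator character.
splitOn :: Char -> String -> [String]
splitOn sep str = case break (== sep) str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest
```

For the verb line above, `lookup "Person" (parseFeats "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin")` gives `Just "3"`.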

With this file, we now get a correct tree with MiniLang:

# MiniLang with cat.conllu (which is missing annotations)
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (TheBackup the_The) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (UseV sleep_V))

# MiniLang with cat-annotated.conllu
PredVP (DetCN the_Det (UseN cat_N)) (UseV sleep_V)

But with ShallowParse, the tree is as wrong as ever, with multiple ProgrVPs.

# ShallowParse with cat-annotated.conllu
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup thePl_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))

So it seems unlikely that the ProgrVP loop is due to user error or insufficiently annotated CoNLL-U files.

Workaround

ProgrVP is the only function of type a -> a (here VP -> VP) in ShallowParse, so I can simply comment it out in the GF grammar. But such functions are sometimes genuinely needed, so this is not a real solution.
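Whether a grammar contains functions of type a -> a can be checked mechanically. The following is a minimal Haskell sketch over a hand-written list of signatures. The `Sig` type, `endofunctions` and `exampleSigs` are hypothetical; a real check would read the compiled grammar instead (for instance through GF's PGF library, which is not shown here). The example signatures follow the RGL, but the list is illustrative, not the full ShallowParse grammar.

```haskell
-- Hypothetical signature record: function name, argument categories, result category.
data Sig = Sig { sigName :: String, sigArgs :: [String], sigVal :: String }

-- Unary functions whose argument and result category coincide (type a -> a).
-- Without some bound, these can be applied to their own output indefinitely.
endofunctions :: [Sig] -> [String]
endofunctions sigs = [ sigName s | s <- sigs, sigArgs s == [sigVal s] ]

-- A few RGL signatures, for illustration only.
exampleSigs :: [Sig]
exampleSigs =
  [ Sig "UseV"    ["V"]         "VP"
  , Sig "UseN"    ["N"]         "CN"
  , Sig "DetCN"   ["Det", "CN"] "NP"
  , Sig "ImpVP"   ["VP"]        "Imp"
  , Sig "ProgrVP" ["VP"]        "VP"
  ]
```

Here `endofunctions exampleSigs` returns `["ProgrVP"]`, which flags the function without removing it from the grammar.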

inariksit commented 3 years ago

I notice that in the ShallowParse.labels file, there is this line

#disable UseComp MkVPS PositA UseComparA ProgrVP ExtAdvS UttImpSg ImpVP PassVP 

But it doesn't seem to have any effect: I still get functions like UseComparA even when running on the test.conllu file, resulting in sentences like "the blacker cat" when the original text is "the black cat".

aarneranta commented 3 years ago

You seem to have found a bug or two. As you say, it sounds like #disable is not implemented as it should be.

anka-213 commented 2 years ago

@aarneranta #disable does work, but only in the concrete labels file (as the documentation says). If it is in the abstract labels file, as in some of the examples in the repo, it is silently ignored.
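A small Haskell sketch can show the behaviour described here: a directive scanner that collects the function names after `#disable` and treats them differently depending on which labels file the line came from. This is not gf-ud's parser. `LabelsKind` and `disabledFuns` are hypothetical, and returning a warning for a `#disable` line in an abstract labels file, instead of dropping it silently, is one possible design for surfacing the problem, not what the repository does.

```haskell
-- Which of the two labels files a line came from (hypothetical type).
data LabelsKind = AbstractLabels | ConcreteLabels deriving (Show, Eq)

-- Collect function names from "#disable f g h" lines.
-- Concrete labels: the names are returned as disabled.
-- Abstract labels: the names are not used, but a warning is returned
-- instead of ignoring the directive silently (a hypothetical design choice).
disabledFuns :: LabelsKind -> [String] -> ([String], [String])
disabledFuns kind = foldr collect ([], [])
  where
    collect line (names, warnings) = case words line of
      ("#disable" : fs) -> case kind of
        ConcreteLabels -> (fs ++ names, warnings)
        AbstractLabels ->
          (names, ("#disable ignored in abstract labels: " ++ unwords fs) : warnings)
      _ -> (names, warnings)
```

With the `#disable` line from ShallowParse.labels, the concrete case yields the nine listed names (including ProgrVP and UseComparA), while the abstract case yields no disabled names and one warning.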

inariksit commented 2 years ago

Commit 11d9ef01b1c464917a279b30308fb57881dd5fba fixes this problem, so we can close the issue once it is merged into master.