Open inariksit opened 3 years ago
I noticed that the ShallowParse.labels file contains this line:
#disable UseComp MkVPS PositA UseComparA ProgrVP ExtAdvS UttImpSg ImpVP PassVP
But it doesn't seem to do anything: I still get functions like UseComparA even when running on the test.conllu file, resulting in sentences like "the blacker cat" when the original text is "the black cat".
You seem to have found a bug or two. It sounds, as you say, like #disable is not implemented as it should be.
@aarneranta #disable does work, but only in the concrete labels file (as the documentation says). If it's in the abstract labels file, as it is for some of the examples in the repo, it is silently ignored.
11d9ef01b1c464917a279b30308fb57881dd5fba fixes this problem, so we can close the issue once it's merged in master.
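To make the intended behaviour concrete, here is a minimal sketch of what a #disable line is expected to do: collect the listed names and leave those functions out when trees are built. This is not gf-ud's actual implementation; the helper names `disabled_functions` and `usable_functions` and the short function list are hypothetical, chosen only for illustration.

```python
def disabled_functions(labels_text):
    """Collect every function name listed after a '#disable' directive."""
    disabled = set()
    for line in labels_text.splitlines():
        words = line.split()
        if words and words[0] == "#disable":
            disabled.update(words[1:])
    return disabled


def usable_functions(all_functions, labels_text):
    """Drop the disabled functions, keeping the original order."""
    disabled = disabled_functions(labels_text)
    return [f for f in all_functions if f not in disabled]


# The directive is the line quoted at the top of this issue.
labels = "#disable UseComp MkVPS PositA UseComparA ProgrVP ExtAdvS UttImpSg ImpVP PassVP"
# Illustrative function list (an assumption, not the full ShallowParse grammar).
functions = ["DetCN", "UseComparA", "ProgrVP", "UseV"]
print(usable_functions(functions, labels))
```

Under this reading, a directive that is silently ignored behaves as if `disabled_functions` returned an empty set, which matches the symptom of UseComparA still showing up.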
I'm running ud2gf with ShallowParse, using "the cat sleeps" as my sentence. Here's the original sentence, produced by parsing "the cat sleeps" with UDPipe and using this code to output the CoNLL-U format.
I run ud2gf as follows.
Infinite loop
First, ud2gf ran for 30 minutes until I stopped it.
Uncomment "beam size" of 123 trees
Next, I uncommented this line to restore the limit of at most 123 candidate trees. This works in the sense that ud2gf no longer gets stuck in an infinite loop, but the best tree still contains multiple applications of ProgrVP, even though the original sentence has none. Here's the output:
Adding annotations to the conllu file
I have noticed before that I get weird trees if the file is missing morphological annotations. So I added them manually to the CoNLLU file:
With this file, we now get a correct tree with MiniLang:
But with ShallowParse, the tree is as wrong as ever, with multiple ProgrVPs.
So it seems unlikely that the ProgrVP loop is due to user error/insufficiently annotated CoNLLU files.
Workaround
ProgrVP is the only function in ShallowParse of type a -> a, so I can just comment it out in the GF grammar. But of course, sometimes such functions are actually needed, so this is not a real solution.
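Why a function of type a -> a is special can be shown with a small sketch. It is not gf-ud's actual algorithm, only a toy enumeration under the assumption that candidate trees are built by repeatedly wrapping existing trees. The function `candidate_trees`, its `max_trees` parameter, and the start tree are hypothetical.

```python
def candidate_trees(start, unary_functions, max_trees=None):
    """Breadth-first wrapping of trees in unary functions of type a -> a.

    Each wrapped tree has the same type as its argument, so it can be
    wrapped again. With at least one such function, only max_trees stops
    the enumeration; with max_trees=None it never terminates.
    """
    trees = [start]
    frontier = [start]
    while frontier:
        next_frontier = []
        for tree in frontier:
            for f in unary_functions:
                if max_trees is not None and len(trees) >= max_trees:
                    return trees
                wrapped = f"{f} ({tree})"
                trees.append(wrapped)
                next_frontier.append(wrapped)
        frontier = next_frontier
    return trees


# A cap plays the role of the "beam size" limit: the loop ends, but the
# candidates still include repeated ProgrVP applications.
for tree in candidate_trees("UseV sleep_V", ["ProgrVP"], max_trees=4):
    print(tree)

# With no a -> a function available, the enumeration stops by itself.
print(candidate_trees("UseV sleep_V", [], max_trees=4))
```

The sketch mirrors both observations above: without a cap the enumeration has no natural end, and with a cap it terminates but can still return trees that stack ProgrVP several times. Removing the function, as in the workaround, is the only case where the loop stops on its own.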