dkpro / dkpro-jwpl

DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that facilitates access to all information in Wikipedia.
https://dkpro.github.io/dkpro-jwpl
Apache License 2.0
83 stars, 34 forks

Fetching plain text throws VisitNotFoundException #193

Closed cmheidt closed 6 years ago

cmheidt commented 6 years ago

Hi,

I am trying to add plain text to Wikipedia Page objects using the Page.getPlainText method. However, fetching the plain text results in a VisitNotFoundException. The exception occurs, for example, with the German Wikipedia articles "Reaktive Sauerstoffspezies" and "Humanbiologie", but not with "Insulin im Gehirn". Fetching the markup text instead works just fine. I have included the stack trace below. Could you look into this bug?

2018-07-18 11:25:24,551 ERROR [main] de.hshn.mi.tulum.neowiki.service.impl.NeoWikiServiceImpl (783): Something went wrong while finding the article text: de.fau.cs.osr.utils.visitor.VisitNotFoundException: Unable to find visit() method for node of type `org.sweble.wikitext.parser.nodes.WtTable' in visitor `de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter'
de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitNotFoundException: Unable to find visit() method for node of type `org.sweble.wikitext.parser.nodes.WtTable' in visitor `de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter'
        at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
        at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
        at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
        at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
        at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:195)
        at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
        at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
        at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
        at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
        at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
        at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:599)
        at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:580)
        at de.hshn.mi.tulum.graph.wikigraph.service.domain.impl.JWPLPageImpl.getPlainText(JWPLPageImpl.java:87)
        at de.hshn.mi.tulum.neowiki.service.impl.NeoWikiServiceImpl.addChildren(NeoWikiServiceImpl.java:777)
        at de.hshn.mi.tulum.neowiki.service.impl.NeoWikiServiceImpl.generate(NeoWikiServiceImpl.java:593)
        at de.hshn.mi.tulum.neowiki.cli.NeoGenerator.generateNeoGraph(NeoGenerator.java:119)
        at de.hshn.mi.tulum.neowiki.cli.GeneratorCLI.main(GeneratorCLI.java:170)
Caused by: de.fau.cs.osr.utils.visitor.VisitNotFoundException: Unable to find visit() method for node of type `org.sweble.wikitext.parser.nodes.WtTable' in visitor `de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter'
        at de.fau.cs.osr.utils.visitor.VisitorBase.visitNotFound(VisitorBase.java:86)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:108)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
        at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
        at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
        at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
        at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:343)
        at sun.reflect.GeneratedMethodAccessor78.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
        at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
        ... 22 more

mawiesne commented 6 years ago

@VanChriz Will look into this issue. cc/ @reckart

mawiesne commented 6 years ago

@VanChriz Good news: I've reproduced the reported stack trace in a minimal test setup, using a reduced version of the markup text of https://de.wikipedia.org/wiki/Humanbiologie that contains a table structure.

Next, I will work on a bugfix so that such structures are parsed correctly.
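
For readers following along, here is a minimal sketch of why a reflection-based visitor fails in the way the stack trace shows. It is not Sweble's or JWPL's actual code: every class and method name below (ReflectiveVisitorSketch, TextNode, TableNode, TextOnlyConverter, TableAwareConverter, VisitNotFound, samplePage, convert) is a hypothetical stand-in. The assumption is only that the dispatcher looks up a visit() method matching the runtime node type and throws when none exists, which is what the exception message suggests. Adding a visit() overload for the table node is one plausible direction for a fix, not necessarily the one that will be chosen.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class ReflectiveVisitorSketch {

    // Hypothetical stand-ins for AST node types such as WtText and WtTable.
    public static class Node {}

    public static class TextNode extends Node {
        final String text;
        TextNode(String text) { this.text = text; }
    }

    public static class TableNode extends Node {
        final List<Node> cells;
        TableNode(List<Node> cells) { this.cells = cells; }
    }

    // Hypothetical counterpart of VisitNotFoundException.
    public static class VisitNotFound extends RuntimeException {
        VisitNotFound(Class<?> nodeType, Class<?> visitorType) {
            super("Unable to find visit() method for node of type "
                    + nodeType.getSimpleName() + " in visitor "
                    + visitorType.getSimpleName());
        }
    }

    // Looks up a public visit(<exact node type>) method at runtime.
    public abstract static class ReflectiveVisitor {
        protected final StringBuilder out = new StringBuilder();

        public void dispatch(Node node) {
            Method target;
            try {
                target = getClass().getMethod("visit", node.getClass());
            } catch (NoSuchMethodException e) {
                // No handler for this node type: the situation from the report.
                throw new VisitNotFound(node.getClass(), getClass());
            }
            try {
                target.invoke(this, node);
            } catch (InvocationTargetException e) {
                // Surface failures raised by nested visit() calls.
                if (e.getCause() instanceof RuntimeException) {
                    throw (RuntimeException) e.getCause();
                }
                throw new IllegalStateException(e.getCause());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Handles text only, so a table node cannot be dispatched.
    public static class TextOnlyConverter extends ReflectiveVisitor {
        public void visit(TextNode n) { out.append(n.text).append(' '); }
    }

    // Adds the missing overload and descends into the table cells.
    public static class TableAwareConverter extends TextOnlyConverter {
        public void visit(TableNode n) {
            for (Node cell : n.cells) {
                dispatch(cell);
            }
        }
    }

    static String convert(ReflectiveVisitor visitor, List<Node> nodes) {
        for (Node n : nodes) {
            visitor.dispatch(n);
        }
        return visitor.out.toString().trim();
    }

    // A tiny "page": one text node followed by a two-cell table.
    static List<Node> samplePage() {
        return Arrays.asList(
                new TextNode("Intro"),
                new TableNode(Arrays.asList(new TextNode("a"), new TextNode("b"))));
    }

    public static void main(String[] args) {
        try {
            convert(new TextOnlyConverter(), samplePage());
        } catch (VisitNotFound e) {
            System.out.println("Failed: " + e.getMessage());
        }
        // prints "Intro a b"
        System.out.println(convert(new TableAwareConverter(), samplePage()));
    }
}
```

In this sketch, a page with no table converts fine with either visitor, which would be consistent with the report that some articles work while others fail.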

cmheidt commented 6 years ago

@mawiesne Thanks! I'm sure rz will be appreciative too.