dkpro / dkpro-jwpl

DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that facilitates access to all information in Wikipedia.
https://dkpro.github.io/dkpro-jwpl
Apache License 2.0

Page.getPlainText broken - PlainTextConverter struggles to discriminate candidate methods and ends in 'VisitorException' #160

Closed mawiesne closed 6 years ago

mawiesne commented 6 years ago

With the introduction of Sweble 3.1.7 to the JWPL 1.2.0-SNAPSHOT line, I can no longer fetch plain text data from Wikipedia backends via Page.getPlainText. The stack trace is documented here:

ERROR - de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
    at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
    at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
    at ...
    at java.lang.Thread.run(Thread.java:748)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    ... 79 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
    at java.util.TimSort.sort(TimSort.java:220)
    at java.util.Arrays.sort(Arrays.java:1512)
    at java.util.ArrayList.sort(ArrayList.java:1462)
    at java.util.Collections.sort(Collections.java:175)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
    ... 90 more

It seems there is a mismatch of method signatures and/or incompatible libraries being used at runtime. I consider this a major bug, as parts of the main functionality are affected. Therefore, this bug should be fixed before releasing JWPL 1.2.0 (Final).

Dependencies involved:

System environment:

Any ideas @ferschke / @reckart ? Can somebody contact the colleagues at FAU Erlangen to investigate this issue?

mawiesne commented 6 years ago

This seems to be a regression introduced with the changes of #152 and #155.

mawiesne commented 6 years ago

@tgalery as you contributed the changes of #155, can you also have a look into this issue?

rzo1 commented 6 years ago

I can confirm that this also affects Windows 10. The stack trace is similar to the one posted by @mawiesne, in a Java 8 environment (Oracle or OpenJDK does not matter).

reckart commented 6 years ago

@mawiesne No idea. I hope @tgalery maybe has some insight.

tgalery commented 6 years ago

Can someone post a bit of code that generates the stack trace above?

rzo1 commented 6 years ago

import de.tudarmstadt.ukp.wikipedia.api.Page;
import de.tudarmstadt.ukp.wikipedia.api.Wikipedia;
import de.tudarmstadt.ukp.wikipedia.api.exception.WikiApiException;

public class Main {

    public static void main(String[] args) throws WikiApiException {

        Wikipedia wikipedia = new Wikipedia(new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false));

        //German Wikipedia for example, page with title "Gesundheit"
        Page page = wikipedia.getPage("Gesundheit");

        //Exception will be thrown...
        page.getPlainText();

    }
}

with this implementation as CustomDataSource:

import de.tudarmstadt.ukp.wikipedia.api.DatabaseConfiguration;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants;
import de.tudarmstadt.ukp.wikipedia.api.WikiConstants.Language;
import org.slf4j.Logger;

import java.sql.*;

public class CustomDataSource extends DatabaseConfiguration {
    private static final Logger logger = org.slf4j.LoggerFactory.getLogger(CustomDataSource.class);

    private String jdbcURL;
    private String databaseDriver;

    /*
     * needed to please frameworks like Spring... parameter injection is done
     * via setters there
     */
    public CustomDataSource() {
        super();
    }

    public CustomDataSource(String hostName, String dbName, String user, String password, String driverClassName, boolean useSSL)  {
        this();
        setDbName(dbName);
        setHostName(hostName);
        setPassword(password);
        setUserName(user);
        // check if the DB driver is available in the classpath
        try {
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            logger.error(e.getLocalizedMessage(), e);
            throw new RuntimeException(e.getLocalizedMessage(), e);
        }
        String baseJdbcURL = "jdbc:mysql://" + getHostName() + "/" + getDbName();
        if(!hasExternalSSLParams(baseJdbcURL)) {
            if (useSSL) {
                setJdbcURL(baseJdbcURL + "?verifyServerCertificate=false&useSSL=true");
            } else {
                setJdbcURL(baseJdbcURL + "?useSSL=false");
            }
        } else {
            setJdbcURL(baseJdbcURL);
        }

        Language lang = requestWikiLangFromDB(hostName, dbName, user, password);
        setLanguage(lang);
    }

    private boolean hasExternalSSLParams(String baseJdbcURL) {
        return baseJdbcURL.contains("useSSL=");
    }

    /**
     * Although the JWPL database knows its Wikipedia language (described as
     * <code>language</code> in the table <code>MetaData</code>), the
     * {@link DatabaseConfiguration} needs to know the specified
     * {@link Language}. Hence, it will be requested by this method so the user
     * does not have to configure the {@link Language} manually.
     *
     * @param hostName
     * @param dbName
     * @param user
     * @param password
     * @return the language found in the <code>MetaData</code>-table, as
     * enumeration instance of {@link Language}
     * @throws RuntimeException if no language is found or the query fails
     */
    private Language requestWikiLangFromDB(String hostName, String dbName, String user, String password)  {

        // try-with-resources closes the connection, statement and result set
        try (Connection connection = DriverManager.getConnection(getJdbcURL(), user, password);
             Statement stmnt = connection.createStatement();
             ResultSet result = stmnt.executeQuery("Select language from MetaData")) {

            if (result.next()) {
                String languageString = result.getString(1);

                logger.info("The language found at {}:{} is '{}' and will be set to this Wiki-DB connection", hostName, dbName, languageString);
                // map the native spelling to the name of the enum constant
                if (languageString.equals("türkçe")) {
                    languageString = "turkish";
                }
                return WikiConstants.Language.valueOf(languageString);
            } else {
                throw new RuntimeException("No language could be found for this Wikipedia DB. This is very strange, check your DB setup!");
            }

        } catch (SQLException e) {
            logger.error(e.getLocalizedMessage());
            throw new RuntimeException(e);
        }
    }

    public void setDbName(String dbName) {
        assert dbName!=null;
        assert dbName.trim().length() > 0;

        super.setDatabase(dbName);
    }

    public String getDbName() {
        return super.getDatabase();
    }

    public void setHostName(String hostName) {
        assert hostName!=null;
        assert hostName.trim().length() > 0;

        super.setHost(hostName);
    }

    public String getHostName() {
        return super.getHost();
    }

    public String getUserName() {
        return super.getUser();
    }

    public void setUserName(String user) {
        assert user!=null;
        assert user.trim().length() > 0;
        super.setUser(user);
    }

    /**
     * @param databaseDriver the databaseDriver to set
     */
    public void setDatabaseDriver(String databaseDriver) {
        assert databaseDriver!=null;
        assert databaseDriver.trim().length() > 0;
        this.databaseDriver = databaseDriver;
    }

    public String getDatabaseDriver() {
        return databaseDriver;
    }

    /**
     * @param jdbcURL the jdbcURL to set
     */
    public void setJdbcURL(String jdbcURL) {
        assert jdbcURL!=null;
        assert jdbcURL.trim().length() > 0;
        this.jdbcURL = jdbcURL;
    }

    public String getJdbcURL() {
        return jdbcURL;
    }

    @Override
    public String getPassword() {
        return super.getPassword();
    }

    @Override
    public void setPassword(String password) {
        super.setPassword(password);
    }

    @Override
    public WikiConstants.Language getLanguage() {
        return super.getLanguage();
    }

    @Override
    public void setLanguage(WikiConstants.Language language) {
        assert language != null;

        super.setLanguage(language);
    }

}

Will output:

Exception in thread "main" de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
    at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
    at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
    at de.hshn.mi.shc.etl.wiki.Main.main(Main.java:19)
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    ... 8 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
    at java.util.TimSort.sort(TimSort.java:220)
    at java.util.Arrays.sort(Arrays.java:1512)
    at java.util.ArrayList.sort(ArrayList.java:1462)
    at java.util.Collections.sort(Collections.java:175)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
    ... 19 more

tgalery commented 6 years ago

Can you help me understand something in your code? Looking at the JWPLDataSource, would that connect to a database which contains the relevant wiki pages? The credentials look funny to me.

rzo1 commented 6 years ago

Basically:

1.) Create a connection to a database. In our case: a MySQL DB containing the Wikipedia Dumps and therefore the wikipedia pages.

2.) I left out the real credentials ;)

3.) Retrieve a page of interest (it does not matter which one).

4.) Try to retrieve the full text via getPlainText()

tgalery commented 6 years ago

Gotcha, sorry for being a pain; I ask because I use this in the context of json wikipedia. Is the MySQL database populated by downloading and importing SQL files from here https://dumps.wikimedia.org/enwiki/20180320/ (if so, could you let me know which ones), or is there a transformation from the full XML dump into SQL that is done by some CLI tool in advance?

mawiesne commented 6 years ago

We make use of the DataMachine tool provided by the JWPL project, see here: https://dkpro.github.io/dkpro-jwpl/DataMachine/

The resulting files are then imported into a MySQL 5.7 installation.

For a German version of Wikipedia dumps, we basically use:

java -Xmx2g -jar JWPLDataMachine.jar german !Hauptkategorie Begriffsklärung ~/dewiki/$date-of-snapshot$/

as given in the examples section of the how-to.

tgalery commented 6 years ago

Cool, could I get the exact command you guys used to produce the German (or any other language) dump? I will try to replicate the bug and see if there's an easy fix.

rzo1 commented 6 years ago

I updated the code-snippet above to not use internal classes / provided related code to execute it.

mawiesne commented 6 years ago

@tgalery Thanks a ton for looking into this! I will upload a dump of a transformed version of the German Wikipedia DB dating from Jan 2018. Stay tuned, the next comment with instructions will follow shortly.

tgalery commented 6 years ago

@mawiesne that would be extremely helpful

mawiesne commented 6 years ago

@tgalery Download one or both of the two mysql dumps from here:

  1. German version (4.5G): https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz
  2. Spanish version (2.7G): https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz

Re-Import them on your local dev-machine via:

  1. In a MySQL shell/tool: CREATE DATABASE wikipedia_de_jwpl_Jan2018 CHARACTER SET UTF8;
  2. In a MySQL shell/tool: GRANT ALL ON wikipedia_de_jwpl_Jan2018.* TO username@'%' IDENTIFIED BY "password";
  3. From a command line/shell: gunzip < wikipedia_de_jwpl_Jan2018.sql.gz | mysql --quick --user=root -p

The same procedure applies to the smaller Spanish (es) version; just replace 'de' with 'es'. If you decide to use es, you could, for instance, fetch a page such as "Salud".

tgalery commented 6 years ago

Cheers, will give you guys an update as soon as I can.

tgalery commented 6 years ago

Some updates. I've been trying to debug this using the Spanish dump, as it's slightly smaller. But it seems I get an exception when instantiating the Wikipedia class. I'm using Scala and I get the following:

scala> import de.tudarmstadt.ukp.wikipedia.api._
import de.tudarmstadt.ukp.wikipedia.api._

scala> val source = new CustomDataSource("host", "dbname", "user", "password", "com.mysql.jdbc.Driver", false)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
source: de.tudarmstadt.ukp.wikipedia.api.CustomDataSource = de.tudarmstadt.ukp.wikipedia.api.CustomDataSource@3ac02398

scala> val wikipedia = new Wikipedia(source)
log4j:WARN No appenders could be found for logger (de.tudarmstadt.ukp.wikipedia.api.Wikipedia).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
org.hibernate.tool.schema.spi.SchemaManagementException: Schema-validation: missing column [version] in table [MetaData]
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.validateTable(AbstractSchemaValidator.java:136)
  at org.hibernate.tool.schema.internal.GroupedSchemaValidatorImpl.validateTables(GroupedSchemaValidatorImpl.java:42)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.performValidation(AbstractSchemaValidator.java:89)
  at org.hibernate.tool.schema.internal.AbstractSchemaValidator.doValidation(AbstractSchemaValidator.java:68)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.performDatabaseAction(SchemaManagementToolCoordinator.java:191)
  at org.hibernate.tool.schema.spi.SchemaManagementToolCoordinator.process(SchemaManagementToolCoordinator.java:72)
  at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:312)
  at org.hibernate.boot.internal.SessionFactoryBuilderImpl.build(SessionFactoryBuilderImpl.java:462)
  at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:710)
  at de.tudarmstadt.ukp.wikipedia.api.hibernate.WikiHibernateUtil.getSessionFactory(WikiHibernateUtil.java:51)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.__getHibernateSession(Wikipedia.java:761)
  at de.tudarmstadt.ukp.wikipedia.api.MetaData.<init>(MetaData.java:44)
  at de.tudarmstadt.ukp.wikipedia.api.Wikipedia.<init>(Wikipedia.java:87)
  ... 42 elided

Is there something wrong with the Spanish dump I downloaded above?

tgalery commented 6 years ago

Commenting out Hibernate's automatic schema validation gives me this:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'metadata0_.version' in 'field list'
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
  at com.mysql.jdbc.Util.getInstance(Util.java:386)
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2625)
  at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2119)
  at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2281)
  at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:60)
  ... 60 more

So maybe there's something funny with the dump?

mawiesne commented 6 years ago

@tgalery I think I know what went wrong, and I'll provide two modified/fresh dumps on next Monday.

UPDATE: Re-Download one of the two files and check sha1sum afterwards:

  1. German version (4.5G): https://download.mi.hs-heilbronn.de/tulum/wikipedia_de_jwpl_Jan2018.sql.gz sha1sum should match f837788b0fe5c5b564fd22f11213be9d718190f4

  2. Spanish version (2.7G): https://download.mi.hs-heilbronn.de/tulum/wikipedia_es_jwpl_Jan2018.sql.gz sha1sum should match dc33b2975e4243217e13658685de2bcf3677975a

Remove all previous files / imported DBs and conduct a re-import. It should work now as I've dumped it from one of our production systems in which no DB schema errors are present.
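For readers without a sha1sum tool at hand, the checksum comparison can also be done from Java. The following is only a sketch using the JDK's MessageDigest; the class name Sha1Check and its helper methods are made up for this illustration and are not part of JWPL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JWPL: compares a file's SHA-1 with an expected hex string.
class Sha1Check {

    /** Streams the input through SHA-1 and returns the digest as lowercase hex. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the file's SHA-1 equals the expected value (case-insensitive). */
    static boolean matches(Path file, String expectedHex) {
        try (InputStream in = Files.newInputStream(file)) {
            return sha1Hex(in).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java Sha1Check.java <file> <expected-sha1>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

It would be invoked with the path of the downloaded .sql.gz file and the corresponding checksum listed above as the two arguments.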

Again, sorry for any inconvenience.

rzo1 commented 6 years ago

It seems to be a problem with the reflection code in de.fau.cs.osr.utils.visitor.VisitorLogic, which cannot decide between the candidate visit methods at runtime.

Line 361ff

    public Object invoke(VisitorInterface<?> visitor, Object node)
            throws IllegalArgumentException,
                IllegalAccessException,
                InvocationTargetException
    {
        touch();
        return method.invoke(visitor, node);
    }

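Both candidate methods

Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

accept a WtText node, and apparently neither parameter type is more specific than the other, so no single best match can be selected, which leads to this issue.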
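To make the suspected ambiguity concrete, here is a minimal, self-contained sketch of reflective visitor dispatch. It does not use Sweble: Node, Text and WikiText are hypothetical stand-ins for WtNode, AstText and WtText, and the sketch assumes (this thread does not confirm it) that the node type is assignable to both parameter types while neither type is a subtype of the other. The resolve helper is an illustration only and is not VisitorLogic's actual algorithm.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not Sweble code.
class AmbiguousDispatchDemo {

    interface Node {}                                       // stand-in for WtNode
    static class Text {}                                    // stand-in for AstText
    static class WikiText extends Text implements Node {}   // stand-in for WtText

    /** Visitor offering only the two overloads listed as candidates. */
    static class AmbiguousVisitor {
        public String visit(Node n) { return "visit(Node)"; }
        public String visit(Text t) { return "visit(Text)"; }
    }

    /** Same visitor plus an overload for the concrete node type. */
    static class SpecificVisitor extends AmbiguousVisitor {
        public String visit(WikiText t) { return "visit(WikiText)"; }
    }

    /** Returns the applicable visit method whose parameter type is a subtype of all others. */
    static Method resolve(Object visitor, Object node) {
        List<Method> candidates = new ArrayList<>();
        for (Method m : visitor.getClass().getMethods()) {
            if (m.getName().equals("visit")
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isInstance(node)) {
                candidates.add(m);
            }
        }
        for (Method c : candidates) {
            boolean mostSpecific = true;
            for (Method other : candidates) {
                if (!other.getParameterTypes()[0].isAssignableFrom(c.getParameterTypes()[0])) {
                    mostSpecific = false;
                    break;
                }
            }
            if (mostSpecific) {
                return c;
            }
        }
        // Either nothing applies or no candidate dominates the others.
        throw new IllegalStateException("No unique visit method among: " + candidates);
    }

    /** Resolves and invokes the visit method for the given node. */
    static String dispatch(Object visitor, Object node) {
        try {
            return (String) resolve(visitor, node).invoke(visitor, node);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if dispatch fails because no unique visit method exists. */
    static boolean isAmbiguous(Object visitor, Object node) {
        try {
            resolve(visitor, node);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous(new AmbiguousVisitor(), new WikiText()));
        System.out.println(dispatch(new SpecificVisitor(), new WikiText()));
    }
}
```

In this sketch, adding an overload for the concrete node type removes the ambiguity. That is one conventional remedy for this kind of problem; whether the eventual fix in JWPL took this route is not stated in this thread.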

rzo1 commented 6 years ago

Our group at Heilbronn University managed to reproduce this bug with the existing test case PageTest#testPlainText() and the test DB provided in #2, see:

org.junit.internal.AssumptionViolatedException: got: <de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)
>, expected: null

    at org.junit.Assume.assumeThat(Assume.java:95)
    at org.junit.Assume.assumeNoException(Assume.java:142)
    at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: de.fau.cs.osr.utils.visitor.VisitingException: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorBase.handleVisitingException(VisitorBase.java:92)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:118)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:346)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:189)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:28)
    at de.fau.cs.osr.utils.visitor.VisitorBase.go(VisitorBase.java:111)
    at de.tudarmstadt.ukp.wikipedia.api.Page.parsePage(Page.java:610)
    at de.tudarmstadt.ukp.wikipedia.api.Page.getPlainText(Page.java:591)
    at de.tudarmstadt.ukp.wikipedia.api.PageTest.testPlainText(PageTest.java:98)
    ... 23 more
Caused by: de.fau.cs.osr.utils.visitor.VisitorException: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:130)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:90)
    at de.fau.cs.osr.utils.visitor.VisitorBase.resolveAndVisit(VisitorBase.java:119)
    at de.fau.cs.osr.ptk.common.AstVisitor.dispatch(AstVisitor.java:56)
    at de.fau.cs.osr.ptk.common.AstVisitor.iterate(AstVisitor.java:66)
    at de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter.visit(PlainTextConverter.java:210)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$Target.invoke(VisitorLogic.java:361)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:110)
    ... 53 more
Caused by: vClass: de.tudarmstadt.ukp.wikipedia.api.sweble.PlainTextConverter
nClass: org.sweble.wikitext.parser.nodes.WtText
Candidate 1: visit(org.sweble.wikitext.parser.nodes.WtNode)
Candidate 2: visit(de.fau.cs.osr.ptk.common.ast.AstText)

    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:186)
    at de.fau.cs.osr.utils.visitor.VisitorLogic$1.compare(VisitorLogic.java:168)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
    at java.util.TimSort.sort(TimSort.java:220)
    at java.util.Arrays.sort(Arrays.java:1512)
    at java.util.ArrayList.sort(ArrayList.java:1462)
    at java.util.Collections.sort(Collections.java:175)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.findVisit(VisitorLogic.java:167)
    at de.fau.cs.osr.utils.visitor.VisitorLogic.resolveAndVisit(VisitorLogic.java:105)
    ... 64 more

CI did not complain because of #161

tgalery commented 6 years ago

Cool, I'm assuming this will be reproducible once #162 gets merged?

rzo1 commented 6 years ago

Yes

rzo1 commented 6 years ago

@tgalery Any updates here? :)

mawiesne commented 6 years ago

@rzo1 @tgalery It seems I have found a fix for this issue locally. I will push a branch and open a PR once the related test case works as expected.

mawiesne commented 6 years ago

Finally fixed via PR #185