clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0
Enabling link checking causes incomplete import #313

Closed by twagoo 3 years ago

twagoo commented 3 years ago

image

twagoo commented 3 years ago

The issue was caused by an outdated version of the database: the schema has changed since RASA 3.x. Because of the mismatch, an unchecked IllegalArgumentException was thrown from within the library (see below). It was not handled and probably terminated the import thread abruptly each time it occurred. The change in 925b910 prevents this.
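To illustrate the failure mode, here is a minimal sketch of guarding a library call so that an unchecked exception is logged and the import carries on with the next record. It is not the actual change in 925b910. The class `LinkStatusSketch`, its methods, and the map-based "row" are hypothetical stand-ins for the importer and the RASA/jOOQ lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names come from VLO or RASA.
class LinkStatusSketch {

    // Stand-in for the library lookup: fails with an unchecked exception
    // when the row comes from an old schema that has no "category" column.
    static String lookupCategory(Map<String, String> row) {
        if (!row.containsKey("category")) {
            throw new IllegalArgumentException(
                    "Field (category) is not contained in Row " + row.keySet());
        }
        return row.get("category");
    }

    // Guarded call: any RuntimeException is logged and turned into an empty
    // result, so it cannot propagate and end the worker thread.
    static Optional<String> safeLookupCategory(Map<String, String> row) {
        try {
            return Optional.ofNullable(lookupCategory(row));
        } catch (RuntimeException e) {
            System.err.println("Error while checking resource availability: " + e.getMessage());
            return Optional.empty();
        }
    }

    // Processes every row; rows whose lookup failed get the placeholder "unknown".
    static List<String> importAll(List<Map<String, String>> rows) {
        List<String> categories = new ArrayList<>();
        for (Map<String, String> row : rows) {
            categories.add(safeLookupCategory(row).orElse("unknown"));
        }
        return categories;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("url", "http://example.org/a", "statusCode", "200"), // old schema
                Map.of("url", "http://example.org/b", "category", "Ok"));   // new schema
        System.out.println(importAll(rows));
    }
}
```

With the guard in place, the row from the old schema yields a logged error and a placeholder value, and the remaining rows are still processed. Whether a placeholder or skipping the link status entirely is preferable depends on how the importer uses the result.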

Example error message:

ERROR [ool-1-worker-13] [eu.clarin.cmdi.vlo.importer.CMDIRecordImporter#getLinkStatusForLandingPages:255] - Error while checking resource availability for /srv/vlo-data/clarin/results/cmdi/Eurac_Research_CLARIN_Centre/oai_clarin_eurac_edu_20_500_12124_3.xml
java.lang.IllegalArgumentException: Field (category) is not contained in Row ("status"."url", "status"."statusCode", "status"."method", "status"."contentType", "status"."byteSize", "status"."duration", "status"."timestamp", "status"."redirectCount", "status"."record", "status"."collection", "status"."expectedMimeType", "status"."message")
    at org.jooq.impl.Tools.indexOrFail(Tools.java:1827)
    at org.jooq.impl.AbstractRecord.get(AbstractRecord.java:285)
    at org.jooq.impl.AbstractRecord.getValue(AbstractRecord.java:1262)
    at eu.clarin.cmdi.rasa.DAO.CheckedLink.<init>(CheckedLink.java:80)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)