dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Dump extraction module fails due to empty ontology.xml #753

Open archiexdex opened 1 year ago

archiexdex commented 1 year ago

Where did the problem occur?

The ontology.xml file downloaded from the DBpedia API is empty.

Problem description

When we tried to dump the ontology from the DBpedia API (https://mappings.dbpedia.org/api.php), we got a parsing error. The error first appeared in 2023/07. We also found that it coincides with a failing GitHub Action (link).


The problem is that the ontology.xml downloaded from the DBpedia API is empty, which is why we get the parsing error.

Here is part of our error message:

INFO: Loading ontology pages
Exception in thread "main" java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$$anonfun$2.apply(CompositeParseExtractor.scala:73)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$$anonfun$2.apply(CompositeParseExtractor.scala:73)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:284)
>   at org.dbpedia.extraction.mappings.CompositeParseExtractor$.load(CompositeParseExtractor.scala:73)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:124)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>   at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>   at scala.collection.TraversableViewLike$Mapped$$anonfun$foreach$2.apply(TraversableViewLike.scala:169)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:743)
>   at scala.collection.immutable.RedBlackTree$TreeIterator.foreach(RedBlackTree.scala:468)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:310)
>   at scala.collection.TraversableViewLike$Mapped$class.foreach(TraversableViewLike.scala:168)
>   at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:113)
>   at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:30)
>   at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
>   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
>   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:399)
>   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:326)

Expected behaviour

What did you expect?

The script should download a valid, non-empty ontology.xml from DBpedia.

Request/Reproduction

Give the link or request, so the problem can be reproduced. Ideally, this would be a unix curl command.

Follow the documentation. The problem can be reproduced by running mvn clean install.
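To confirm the symptom independently of the extraction framework, the downloaded file can be checked before the build parses it. The following is only a minimal sketch in Python, not part of the framework: the function name check_ontology_dump is hypothetical, and the path to ontology.xml depends on your own setup. It reports whether the file is missing, empty, or not well-formed XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_ontology_dump(path):
    """Return (ok, message) describing whether the file looks like usable XML.

    Hypothetical helper for illustration; it only checks well-formedness,
    not whether the content is a valid DBpedia ontology export.
    """
    p = Path(path)
    if not p.is_file():
        return False, f"{path}: file not found"

    data = p.read_bytes()
    # A zero-byte or whitespace-only download is the case reported in this issue.
    if not data.strip():
        return False, f"{path}: file is empty"

    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        # Covers truncated downloads or an error page saved in place of the XML.
        return False, f"{path}: not well-formed XML ({err})"

    return True, f"{path}: well-formed XML, root element <{root.tag}>"
```

Calling check_ontology_dump("ontology.xml") (adjust the path as needed) and printing the returned message should make it clear whether the failure comes from the download step or from the parser.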