The following Python modules need to be installed:
All configuration settings should be in the config.py
file, which should be created by renaming
config.py.example.
The list of input URLs is assigned, as a Python list, to the input_urls
variable.
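Based on the settings named above, a config.py might look like the sketch below. Only input_urls and sparqlstore['dbpedia_url'] are named in this README; the URL values and overall layout are illustrative assumptions, and the real config.py.example may contain additional settings.

```python
# Sketch of a possible config.py (assumed layout; the values below are
# illustrative placeholders, not taken from the repository).

# CEUR-WS volume URLs to crawl.
input_urls = [
    "http://ceur-ws.org/Vol-1085/",
    "http://ceur-ws.org/Vol-1081/",
]

# SPARQL endpoint used for DBpedia lookups.
sparqlstore = {
    "dbpedia_url": "http://lod.openlinksw.com/sparql",
}
```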
The parser uses DBpedia to extract the names of countries and universities, together with their URIs in DBpedia.
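The kind of lookup involved can be sketched as a SPARQL query against the configured endpoint. The query shape below is an assumption for illustration; the parser's actual queries may differ.

```python
# Hedged sketch of a DBpedia lookup for country names and URIs.
# The query shape is an assumption; the actual parser may build
# its queries differently.

def build_country_query(limit=100):
    """Return a SPARQL query selecting DBpedia country URIs and English labels."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country ?label WHERE {{
    ?country a dbo:Country ;
             rdfs:label ?label .
    FILTER (lang(?label) = "en")
}} LIMIT {limit}
"""
```

A query like this, sent to the endpoint set in sparqlstore['dbpedia_url'], returns the country URIs and labels the parser needs.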
There are two options: either
sparqlstore['dbpedia_url']
is set to the public endpoint http://lod.openlinksw.com/sparql,
or sparqlstore['dbpedia_url']
is set to a local SPARQL endpoint, in which case the RDF dumps dumps/dbpedia_country.xml
and dumps/dbpedia_universities.xml
should be uploaded to it. See the wiki for the steps to generate the DBpedia dumps. Once you have finished with the configuration, execute the following script:
python CeurWsParser/spider.py
The resulting dataset will be written to the rdfdb.ttl
file.
The SPARQL queries for Task 1 were created by translating the human-readable queries into SPARQL using our data model. The queries are listed in the wiki.
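As a hypothetical illustration of such a translation (the class and property names below are invented for the example; the actual queries and the project's data model are documented in the wiki):

```python
# Hypothetical example of a human-readable question and a SPARQL
# translation. The vocabulary (bibo:Workshop, rdfs:label) is an
# illustrative assumption, not the project's actual data model.

HUMAN_READABLE = "List the titles of all workshops in the dataset."

SPARQL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?workshop ?title WHERE {
    ?workshop a bibo:Workshop ;
              rdfs:label ?title .
}
"""
```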
Maxim Kolchin (kolchinmax@gmail.com)
Fedor Kozlov (kozlovfedor@gmail.com)