ceurws / lod

Anything we need to maintain the Linked Open Data (LOD) publication of CEUR-WS.org

Extract LOD from all proceedings volumes #2

Closed: clange closed this issue 9 years ago

clange commented 9 years ago

… using https://github.com/ailabitmo/sempubchallenge2014-task1.

Expected output: one large rdfdb.ttl file, which will require further processing.

S6savahd commented 9 years ago

I ran the parser over all URLs, and there were some errors. I decided to run it several times instead, each time over a subset of the URLs. The current run covers 1000 URLs; it has been going for 2 hours and is still not done.

clange commented 9 years ago

OK, if there were errors, please take note of them, copy the respective output, try to identify the piece of input (e.g. the HTML or PDF file) on which each error occurred, and report them as issues at https://github.com/ailabitmo/sempubchallenge2014-task1/issues. (But first check whether such an issue has already been reported there, as they have also tried to produce a complete dump already.)
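The workflow discussed above (running the parser over subsets of the URLs and noting which inputs fail) could be sketched roughly as follows. This is only an illustration: `process_url` is a hypothetical stand-in for whatever actually invokes the parser, and the batch size is an arbitrary assumption, not something taken from the sempubchallenge2014-task1 code.

```python
from typing import Callable, Iterator, List, Tuple


def chunked(urls: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls` with at most `size` items each."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def run_in_batches(
    urls: List[str],
    process_url: Callable[[str], None],  # hypothetical: wraps the real parser call
    batch_size: int = 100,               # assumption: tune to what one run can handle
) -> List[Tuple[str, str]]:
    """Process all URLs batch by batch; return (url, error message) pairs."""
    failures: List[Tuple[str, str]] = []
    for batch in chunked(urls, batch_size):
        for url in batch:
            try:
                process_url(url)
            except Exception as exc:
                # Keep going, but remember which input failed and why,
                # so it can be reported as an issue afterwards.
                failures.append((url, f"{type(exc).__name__}: {exc}"))
    return failures
```

The returned list pairs each failing URL with its error text, which is the information needed to check for an existing issue or file a new one.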

S6savahd commented 9 years ago

The error was not important; just one of the links had strange characters, so I removed it.
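The thread does not say which characters were involved. If the cause was unencoded whitespace or non-ASCII characters in the URL (an assumption), percent-encoding the link might be an alternative to dropping it. A minimal sketch using only the Python standard library; `percent_encode_url` is a hypothetical helper name:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Percent-encode unsafe characters in the path and query of a URL.

    '%' is treated as safe so that sequences that are already encoded
    are not encoded a second time.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

Whether the encoded URL then resolves and parses correctly would still need to be checked against the actual link.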

S6savahd commented 9 years ago

All data has been extracted from all URLs except the removed one! The only caveat is that the data should be extracted once more with new DBpedia dumps, but we can use the currently extracted data!