fadmaa / grefine-rdf-extension

An extension to Google Refine that enables graphical mapping of Google Refine project data to an RDF skeleton and then exporting it in RDF format
http://refine.deri.ie

Enhance reconciliation performance? #50

Open neveldo opened 12 years ago

neveldo commented 12 years ago

Hi,

I am currently using a reconciliation service based on a generic SPARQL endpoint.

My triplestore is on a local server. However, reconciliation takes a very long time: about 5 hours to reconcile 90,000 lines with 36,000 separate entities in my triplestore.

However, the triplestore does not seem to be overloaded. Is there a way to optimise the reconciliation?

[edit] I saw in the code that sleep(300) is called between each SPARQL request. 300 ms × 36,000 = 3 hours of waiting. Why is there a sleep here? For now there seems to be no option to disable this wait between requests.

Maybe such an option would be interesting to improve performance?
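
The waiting-time estimate above can be checked with a short calculation. This is only a sketch: the function name `throttle_overhead_hours` is hypothetical, and it assumes one 300 ms sleep per distinct entity (36,000 requests in total), as in the estimate, rather than one per line.

```python
def throttle_overhead_hours(requests: int, delay_ms: int) -> float:
    """Total time spent sleeping, in hours, for a fixed delay per request."""
    return requests * delay_ms / 1000 / 3600


# Figures from the report: 36,000 requests, 300 ms sleep, about 5 hours in total.
overhead = throttle_overhead_hours(36_000, 300)
total_hours = 5.0
print(f"sleep overhead: {overhead:.1f} h of {total_hours:.1f} h ({overhead / total_hours:.0%})")
# prints "sleep overhead: 3.0 h of 5.0 h (60%)"
```

Under these assumptions the fixed delay accounts for roughly 60% of the observed run time, which would leave about 2 hours for the SPARQL queries themselves.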

wikier commented 12 years ago

SPARQL performs really badly at string comparison, so you should use alternative methods.

Regarding this, Apache Stanbol EntityHub could give much better performance. See this pull request, which (hopefully) will introduce support for it in Refine: https://github.com/fadmaa/grefine-rdf-extension/pull/59