dbpedia-spotlight / dbpedia-spotlight-solr

DBpedia Spotlight SOLR Backend
Apache License 2.0

Loading data into Solr takes Many Hours #1

Closed: jithinjustin closed this issue 7 years ago

jithinjustin commented 7 years ago

The README says that loading data into Solr takes several minutes. But when I tried loading the English (en) index from Dropbox, it was taking several hours. My machine has 64 GB of RAM, and the index in Dropbox is around 73 GB. Is there anything I can do (such as reconfiguring Solr) to speed up indexing in Solr?

sandroacoelho commented 7 years ago

Hi @jithinjustin, thanks for asking. This repo is an alpha version. I forgot to say so in our README, but that won't stop me from trying to help you.

I have changed a small piece of the code to try to speed up our bulk insert. The change is already available to use.

How much memory did you start Solr with?

How much memory did you start dbpedia-spotlight-solr with?

I recommend starting it with -Xms3g.

I got 18 s per 10,000 documents. My storage is an ordinary 5600 rpm magnetic disk, installed in a computer with a real Core i7 and 16 GB of RAM.

The English model has 4,760,017 documents, so at that rate it will take about 8568 seconds, or 143 minutes. You are right: it takes around 2:30 hours. I have already fixed this in our README (thanks again).
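
For anyone who wants to redo this estimate for another model or disk, here is a minimal sketch of the arithmetic in Python. It only uses the figures quoted above (18 s per 10,000 documents, 4,760,017 documents); the function name `estimate_load_seconds` is made up for this example, and it assumes the insert rate stays constant for the whole load, which may not hold on your hardware.

```python
# Back-of-the-envelope load-time estimate from the figures quoted in this thread.
SECONDS_PER_BATCH = 18    # measured: 18 s per batch
BATCH_SIZE = 10_000       # documents per batch
TOTAL_DOCS = 4_760_017    # documents in the English model


def estimate_load_seconds(total_docs, batch_size=BATCH_SIZE,
                          seconds_per_batch=SECONDS_PER_BATCH):
    """Hypothetical helper: assumes a constant insert rate (an assumption)."""
    return total_docs / batch_size * seconds_per_batch


seconds = estimate_load_seconds(TOTAL_DOCS)
minutes = seconds / 60
hours, rest = divmod(minutes, 60)
print(f"{seconds:.0f} s = {minutes:.0f} min = {int(hours)} h {round(rest)} min")
```

With the numbers above this comes to roughly 8568 s, or about 143 minutes. Swap in your own measured seconds per batch to get an estimate for your machine.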

We will try to speed it up as soon as we have validated the new architecture.

Best,

jithinjustin commented 7 years ago

Thanks @sandroacoelho