Closed ThomasThelen closed 5 months ago
You don't need to set the entity-index-size to such a big value. Your guess is correct that this is deserialized as an integer and that's why it is throwing an error. We will look into improving the error handling of this in future versions of GraphDB. For now, you can either set it to the integer max value or don't set it at all.
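To illustrate why the value is rejected, here is a minimal sketch in plain Java. It is not GraphDB's actual validation code; the class and helper names are hypothetical. It only shows that 20000000000 cannot be parsed as a 32-bit int, while it fits comfortably in a long.

```java
public class EntityIndexSizeCheck {

    // Hypothetical helper (not part of GraphDB): true if the text parses as a 32-bit int.
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Hypothetical helper (not part of GraphDB): true if the text parses as a 64-bit long.
    static boolean fitsInLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String requested = "20000000000"; // the value from the report

        // Integer.MAX_VALUE is 2147483647, well below the requested value.
        System.out.println("int max:      " + Integer.MAX_VALUE);
        System.out.println("fits in int:  " + fitsInInt(requested));
        System.out.println("fits in long: " + fitsInLong(requested));
    }
}
```

If the setting is parsed in a similar way, any value above 2147483647 would fail with a "has to be a number" style error even though it is a valid number.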
Probably the most important setting for your repository is the entity-id-size, which must be set to "40" for datasets containing more than 2 billion unique RDF values.
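The rule above can be sketched as a small Java helper. This is only an illustration: the class and method names are hypothetical, the 2 billion threshold is taken from the reply, and the fallback value of 32 is an assumption about the smaller ID width rather than something stated in this thread.

```java
public class EntityIdSizeChooser {

    // Threshold from the reply above: more than 2 billion unique RDF values needs "40".
    static final long THRESHOLD = 2_000_000_000L;

    // Hypothetical helper: picks an entity-id-size for an estimated number of unique values.
    // The fallback of 32 is an assumption, not confirmed by the thread.
    static int requiredEntityIdSize(long uniqueRdfValues) {
        return uniqueRdfValues > THRESHOLD ? 40 : 32;
    }

    public static void main(String[] args) {
        // Rough upper bounds on how many distinct IDs each bit width could address.
        System.out.println("2^32 = " + (1L << 32));
        System.out.println("2^40 = " + (1L << 40));

        // Hypothetical estimate for a dataset with billions of unique values.
        long estimate = 5_000_000_000L;
        System.out.println("entity-id-size for " + estimate + " values: "
                + requiredEntityIdSize(estimate));
    }
}
```

The number of unique RDF values is not the same as the number of statements, so a 20 billion statement dataset may need its unique-value count estimated separately before choosing.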
Just curious - does this mean that we'll never be able to have a full entity index if there are more than 2 billion unique entities? I'm seeing our Enterprise instance with hundreds of millions of cache misses (and painfully slow queries - can't count the number of triples in under 10 minutes), and I'm sort of assuming it's related to the entity-index-size (hence my large number).
The entity-index-size will grow as needed. Two reasons come to mind for the cache misses - either there is not enough memory to cache most of the data or the internal structures got fragmented over time.
You can try compacting the indexes as a start. This could also reduce the size of the database on disk, but keep in mind that it will shut down the repository and it will not be usable for the duration of the operation, which for your repository could take a while.
I have a database with around 20 billion statements. When using the import rdf tool (going off issue #46) and setting the entity index size to 20000000000, I get the error below. I'm guessing this value is a 32 bit int - can probably be fixed by allowing it to be a long

17:53:30.220 [main] ERROR com.ontotext.graphdb.GraphDBRepositoryManager - Error while attempting to create repository: entity-index-size has to be a number.
org.eclipse.rdf4j.repository.config.RepositoryConfigException: entity-index-size has to be a number.
    at com.ontotext.graphdb.GraphDBRepositoryManager.validateConfiguration(GraphDBRepositoryManager.java:566)
    at com.ontotext.graphdb.GraphDBRepositoryManager.addRepositoryConfig(GraphDBRepositoryManager.java:481)
    at com.ontotext.graphdb.importrdf.BaseLoadTool.createRepositoryInSystemLocation(BaseLoadTool.java:314)
    at com.ontotext.graphdb.importrdf.BaseLoadTool.mainInternal(BaseLoadTool.java:197)
    at com.ontotext.graphdb.importrdf.Preload.call(Preload.java:254)
    at com.ontotext.graphdb.importrdf.Preload.call(Preload.java:55)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at com.ontotext.graphdb.importrdf.ImportRDF.main(ImportRDF.java:31)