blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Blazegraph Index Status #85

Closed loretoparisi closed 6 years ago

loretoparisi commented 6 years ago

I was able to run my Blazegraph server via Docker, and I can see my instance running at http://192.168.20.113:9999/bigdata/

I have a journal file mounted in the shared volume; it is 282 GB while the server is running:

$ ls -lh wikidata/wikidata.jnl 
-rw-r--r-- 1 root root 282G Apr  1 17:11 wikidata/wikidata.jnl

The problem now is that I do not get any data back from the example queries, like

SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

whereas I should get the same output as the Wikidata Query Service on the latest dump here

I have successfully downloaded the latest export and split it into TTL files using `nohup ./munge.sh -f $ROOT/data/$WIKIDATA_DUMP_FILE -d $ROOT/data/split -l en &`, so in the shared volume's split folder I can see a list of ttl.gz files:

-rw-r--r-- 1 root root 186839974 Mar 27 09:48 wikidump-000000001.ttl.gz
-rw-r--r-- 1 root root 114197889 Mar 27 09:51 wikidump-000000002.ttl.gz
...
-rw-r--r-- 1 root root  56560072 Mar 27 14:24 wikidump-000000165.ttl.gz
-rw-r--r-- 1 root root  51883556 Mar 27 14:25 wikidump-000000166.ttl.gz.fail.fail
-rw-r--r-- 1 root root  53976584 Mar 27 14:27 wikidump-000000167.ttl.gz
...
-rw-r--r-- 1 root root  67287903 Mar 27 22:56 wikidump-000000475.ttl.gz
-rw-r--r-- 1 root root  12876417 Mar 27 22:56 wikidump-000000476.ttl.gz

where I see one failed split, `wikidump-000000166.ttl.gz.fail.fail` (the `-rw-r--r-- 1 root root 51883556 Mar 27 14:25` entry above). I then launched the import with `nohup ./loadRestAPI.sh -n wdq -d $ROOT/data/split &`.
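Before loading, it can help to count how many chunks actually succeeded versus carry a `.fail` suffix. Below is a minimal sketch, assuming the `.fail` suffix marks a chunk that failed processing; it uses a throwaway demo directory with two fake files standing in for the real shared volume:

```shell
# Count successful vs. failed munge chunks in a split directory.
# SPLIT_DIR here is a temporary demo directory, not the real volume.
SPLIT_DIR=$(mktemp -d)
touch "$SPLIT_DIR/wikidump-000000001.ttl.gz" \
      "$SPLIT_DIR/wikidump-000000166.ttl.gz.fail.fail"

# Clean chunks end in .ttl.gz; failures carry one or more .fail suffixes.
good=$(find "$SPLIT_DIR" -name '*.ttl.gz' ! -name '*.fail*' | wc -l)
failed=$(find "$SPLIT_DIR" -name '*.fail*' | wc -l)
echo "good=$good failed=$failed"
```

On the real volume you would point `SPLIT_DIR` at `$ROOT/data/split`; any chunk reported as failed is a candidate for re-running through the munge/load step before trusting the import.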

From the GUI, the current status says

Build Version=2.1.5-SNAPSHOT
Build Git Commit=8ff64aab6071b5e591e80676809e43f0ddb8ad49

Build Git Branch=refs/heads/2.1.5RC-wmf.1

Accepted query count=8

Running query count=0

Show queries, query details.

/GeoSpatial/bigMinCalculationTimeMS=0
/GeoSpatial/filterCalculationTimeMS=0
/GeoSpatial/geoSpatialSearchRequests=0
/GeoSpatial/geoSpatialServiceCallSubRangeTasks=0
/GeoSpatial/geoSpatialServiceCallTasks=0
/GeoSpatial/rangeCheckCalculationTimeMS=0
/GeoSpatial/zOrderIndexHitRatio=null
/GeoSpatial/zOrderIndexHits=0
/GeoSpatial/zOrderIndexMisses=0
/GeoSpatial/zOrderIndexScannedValues=0
/blockedWorkQueueCount=0
/blockedWorkQueueRunningTotal=0
/bufferedChunkMessageBytesOnNativeHeap=-1040
/bufferedChunkMessageCount=-27
/deadlineQueueSize=0
/operatorActiveCount=0
/operatorHaltCount=27
/operatorStartCount=27
/operatorTasksPerQuery=3.375
/queriesPerSecond=46.24277456647399
/queryDoneCount=8
/queryErrorCount=0
/queryStartCount=8

and I can see several values in the Performance tab, which I have attached in this gist.

How can I figure out what the problem is with the current status, i.e. whether all my data has been correctly imported?
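One quick sanity check (a sketch, not something from the thread): ask the wdq namespace for its total triple count over the SPARQL endpoint. The endpoint path below is assumed from the instance URL in the issue, and the curl line is left commented since it needs the live server:

```shell
# Total-triple-count check against the wdq namespace (sketch).
# ENDPOINT is assumed from the instance URL in this issue.
ENDPOINT="http://192.168.20.113:9999/bigdata/namespace/wdq/sparql"
QUERY='SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
# Against the live server:
#   curl -G "$ENDPOINT" --data-urlencode "query=$QUERY" \
#        -H 'Accept: application/sparql-results+json'
echo "check: $QUERY"
```

A full Wikidata load should report on the order of billions of triples; a count near zero would suggest the loader never wrote into the wdq namespace at all (for example, because the journal was created under a different namespace).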

loretoparisi commented 6 years ago

[UPDATE] These are the properties of the namespace wdq:


| Property | Value |
| -- | -- |
| com.bigdata.namespace.wdq.spo.com.bigdata.btree.BTree.branchingFactor | 1024 |
| com.bigdata.relation.container | wdq |
| com.bigdata.namespace.wdq.spo.OSP.com.bigdata.btree.BTree.branchingFactor | 64 |
| com.bigdata.rwstore.RWStore.smallSlotType | 1024 |
| com.bigdata.journal.AbstractJournal.bufferMode | DiskRW |
| com.bigdata.journal.AbstractJournal.file | /root/data/wikidata.jnl |
| com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor | 600 |
| com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass | org.wikidata.query.rdf.blazegraph.WikibaseVocabulary$V003 |
| com.bigdata.rdf.store.AbstractTripleStore.textIndex | false |
| com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDatatypeConfig.0 | {"config": {"uri":"http://www.opengis.net/ont/geosparql#wktLiteral","literalSerializer":"org.wikidata.query.rdf.blazegraph.inline.literal.WKTSerializer","fields":[{"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LONGITUDE"},{"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LATITUDE"},{"valueType":"LONG","multiplier":"1","minValue":"0","serviceMapping":"COORD_SYSTEM"}]}} |
| com.bigdata.journal.AbstractJournal.initialExtent | 209715200 |
| com.bigdata.rdf.store.AbstractTripleStore.geoSpatialIncludeBuiltinDatatypes | false |
| com.bigdata.btree.BTree.branchingFactor | 128 |
| com.bigdata.namespace.wdq.lex.com.bigdata.btree.BTree.branchingFactor | 400 |
| com.bigdata.rdf.store.AbstractTripleStore.extensionFactoryClass | org.wikidata.query.rdf.blazegraph.WikibaseExtensionFactory |
| com.bigdata.rdf.store.AbstractTripleStore.axiomsClass | com.bigdata.rdf.axioms.NoAxioms |
| com.bigdata.service.AbstractTransactionService.minReleaseAge | 1 |
| com.bigdata.rdf.sail.bufferCapacity | 100000 |
| com.bigdata.rdf.sail.truthMaintenance | false |
| com.bigdata.namespace.wdq.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor | 800 |
| com.bigdata.journal.AbstractJournal.maximumExtent | 209715200 |
| com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDefaultDatatype | http://www.opengis.net/ont/geosparql#wktLiteral |
| com.bigdata.rdf.sail.namespace | wdq |
| com.bigdata.relation.class | com.bigdata.rdf.store.LocalTripleStore |
| com.bigdata.rdf.store.AbstractTripleStore.quads | false |
| com.bigdata.journal.AbstractJournal.writeCacheBufferCount | 1000 |
| com.bigdata.relation.namespace | wdq |
| com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory | org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory |
| com.bigdata.btree.writeRetentionQueue.capacity | 4000 |
| com.bigdata.journal.AbstractJournal.historicalIndexCacheTimeout | 5 |
| com.bigdata.journal.AbstractJournal.historicalIndexCacheCapacity | 20 |
| com.bigdata.rdf.store.AbstractTripleStore.geoSpatial | true |
| com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers | false |
| com.bigdata.namespace.wdq.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor | 128 |
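For reference, a listing like the one above can also be pulled over the NanoSparqlServer REST API rather than the GUI. This is a sketch; the base URL is taken from the instance address in this issue, and the `/namespace/wdq/properties` path assumes Blazegraph's namespace-properties resource:

```shell
# Fetch the wdq namespace properties via the REST API (sketch).
# BASE is assumed from the instance URL in this issue.
BASE="http://192.168.20.113:9999/bigdata"
URL="$BASE/namespace/wdq/properties"
# Against the live server:
#   curl -s "$URL"
echo "GET $URL"
```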