Open dazza-codes opened 7 years ago
Checksum errors indicate bad data. All writes have checksums. It is trying to read the current checkpoint record.
You might be able to open the store with the alternate rootblock. See com.bigdata.journal.Options for how to do this.
However, this very likely indicates a physical problem with the disk leading to a bad read back of the written data. This has been the root cause in previous reports of a checksum error. And that is precisely what the checksum is designed to detect.
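A minimal sketch of the alternate root block suggestion, assuming the option is set as a property in the RWStore.properties file that NSS_PROPERTIES points to. The property key below is an assumption (my reading of the value of com.bigdata.journal.Options.ALTERNATE_ROOT_BLOCK) and should be verified against the Javadoc before use. The helper name is hypothetical, and it writes a scratch copy rather than touching the live config. Try this against a copy of the journal file first.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a properties file with the
# alternate-root-block option appended.
# ASSUMPTION: the key name below is taken to be the value of
# com.bigdata.journal.Options.ALTERNATE_ROOT_BLOCK; check the Javadoc.
ALT_KEY="com.bigdata.journal.AbstractJournal.alternateRootBlock"

with_alternate_root_block() {
    src="$1"   # e.g. /etc/blazegraph/RWStore.properties
    dst="$2"   # a scratch copy, not the live config
    # Drop any existing setting for the key, then append the new one.
    # grep -v exits non-zero when it prints nothing, hence the "|| true".
    grep -v "^${ALT_KEY}=" "$src" > "$dst" || true
    printf '%s=true\n' "$ALT_KEY" >> "$dst"
}
```

For example, `with_alternate_root_block /etc/blazegraph/RWStore.properties /tmp/RWStore.alt.properties` produces a file that can be reviewed before pointing NSS_PROPERTIES at it.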
Thanks, Bryan
On Feb 17, 2017 9:09 PM, "Darren L. Weber, Ph.D." notifications@github.com wrote:
Followed the instructions in the blazegraph-deb section to build and install 2.1.4 and it worked at first. But after trying to load a lot of data the system froze. On restart, blazegraph will not restart using sudo service blazegraph restart (although a following status indicates it is running OK). The log shows the failure to start is a com.bigdata.util.ChecksumError -- is this a bug or a feature?
INFO: com.bigdata.util.config.LogUtil: Configure and watch: /etc/blazegraph/log4j.properties
BlazeGraph(TM) Graph Engine
Flexible
Reliable
Affordable
Web-Scale Computing for the Enterprise
Copyright SYSTAP, LLC DBA Blazegraph 2006-2016. All rights reserved.
sul-dlweber-ubuntu Fri Feb 17 20:59:16 PST 2017 Linux/4.4.0-62-generic amd64 Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz Family 6 Model 58 Stepping 9, GenuineIntel #CPU=2 Oracle Corporation 1.7.0_80 freeMemory=123735856 buildVersion=2.1.4 gitCommit=738d05f08cffd319233a4bfbb0ec2a858e260f9c
Dependency License ICU http://source.icu-project.org/repos/icu/icu/trunk/license.html bigdata-ganglia http://www.apache.org/licenses/LICENSE-2.0.html blueprints-core https://github.com/tinkerpop/blueprints/blob/master/LICENSE.txt colt http://acs.lbl.gov/software/colt/license.html commons-codec http://www.apache.org/licenses/LICENSE-2.0.html commons-fileupload http://www.apache.org/licenses/LICENSE-2.0.html commons-io http://www.apache.org/licenses/LICENSE-2.0.html commons-logging http://www.apache.org/licenses/LICENSE-2.0.html dsiutils http://www.gnu.org/licenses/lgpl-2.1.html fastutil http://www.apache.org/licenses/LICENSE-2.0.html flot http://www.opensource.org/licenses/mit-license.php high-scale-lib http://creativecommons.org/licenses/publicdomain httpclient http://www.apache.org/licenses/LICENSE-2.0.html httpclient-cache http://www.apache.org/licenses/LICENSE-2.0.html httpcore http://www.apache.org/licenses/LICENSE-2.0.html httpmime http://www.apache.org/licenses/LICENSE-2.0.html jackson-core http://www.apache.org/licenses/LICENSE-2.0.html jetty http://www.apache.org/licenses/LICENSE-2.0.html jquery https://github.com/jquery/jquery/blob/master/MIT-LICENSE.txt jsonld https://raw.githubusercontent.com/jsonld-java/jsonld-java/master/LICENCE log4j https://raw.githubusercontent.com/jsonld-java/jsonld-java/master/LICENCElog4j http://www.apache.org/licenses/LICENSE-2.0.html lucene http://www.apache.org/licenses/LICENSE-2.0.html nanohttp http://elonen.iki.fi/code/nanohttpd/#license rexster-core https://github.com/tinkerpop/rexster/blob/master/LICENSE.txt river http://www.apache.org/licenses/LICENSE-2.0.html semargl https://github.com/levkhomich/semargl/blob/master/LICENSE servlet-api http://www.apache.org/licenses/LICENSE-2.0.html sesame http://www.openrdf.org/download.jsp slf4j http://www.slf4j.org/license.html zookeeper http://www.apache.org/licenses/LICENSE-2.0.html
WARN : NanoSparqlServer.java:517: Starting NSS
WARN : WebAppContext.java:506: Failed startup of context
o.e.j.w.WebAppContext@2db238ce{/blazegraph,file:/usr/share/blazegraph-2.1.4/war/,STARTING}{/usr/share/blazegraph/war/}
java.lang.RuntimeException: java.lang.RuntimeException: addr=-374049 :
cause=com.bigdata.util.ChecksumError:
offset=225590272,nbytes=426,expected=0,actual=27656193
at com.bigdata.rdf.sail.webapp.BigdataRDFServletContextListener.openIndexManager(BigdataRDFServletContextListener.java:805)
at com.bigdata.rdf.sail.webapp.BigdataRDFServletContextListener.contextInitialized(BigdataRDFServletContextListener.java:277)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:798)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:444)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:789)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:294)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1341)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1334)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:497)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:163)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
at org.eclipse.jetty.server.Server.start(Server.java:387)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:354)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at com.bigdata.rdf.sail.webapp.NanoSparqlServer.awaitServerStart(NanoSparqlServer.java:518)
at com.bigdata.rdf.sail.webapp.NanoSparqlServer.main(NanoSparqlServer.java:482)
Caused by: java.lang.RuntimeException: addr=-374049 :
cause=com.bigdata.util.ChecksumError:
offset=225590272,nbytes=426,expected=0,actual=27656193
at com.bigdata.rwstore.RWStore.getData(RWStore.java:2097)
at com.bigdata.journal.RWStrategy.readFromLocalStore(RWStrategy.java:732)
at com.bigdata.journal.RWStrategy.read(RWStrategy.java:155)
at com.bigdata.journal.AbstractJournal._getCommitRecord(AbstractJournal.java:4601)
at com.bigdata.journal.AbstractJournal.
Thanks, Bryan.
This installation is running on Ubuntu 16.04 linux as a Virtualbox guest on a MacBook Pro (circa 2013).
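Apologies for a couple of noob questions:
- Where do I find current information about com.bigdata.journal.Options?
  - A quick search of this repo using github search didn't pinpoint it.
  - ack-grep 'journal.Options' /usr/share/blazegraph-2.1.4/ -- nothing.
- Where is the KB store on the file system (Ubuntu linux) and how is that configured?
$ cat /etc/default/blazegraph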
NAME=blazegraph
BLZG_HOME=/usr/share/${NAME}
BLZG_CONF=/etc/blazegraph
BLZG_LOG=/var/log/${NAME}
BLZG_DATA=/var/lib/${NAME}
JOURNAL_FILE=blazegraph.jnl
JOURNAL="${BLZG_DATA}"/"${JOURNAL_FILE}"
# Run Blazegraph as this user ID and group ID
BLZG_USER=blzg
BLZG_GROUP=blzg
JETTY_XML="${BLZG_CONF}"/jetty.xml
JETTY_RESOURCE_BASE="${BLZG_HOME}"/war/
JETTY_PORT=9999
LOGGING_CONFIG="${BLZG_CONF}"/logging.properties
LOG4J_CONFIG="${BLZG_CONF}"/log4j.properties
NSS="com.bigdata.rdf.sail.webapp.NanoSparqlServer"
NSS_NAMESPACE="kb"
NSS_PROPERTIES="${BLZG_CONF}"/RWStore.properties
JVM_OPTS="-Djava.awt.headless=true -server -Xmx8g -XX:MaxDirectMemorySize=3000m -XX:+UseG1GC"
#Used for testing on EC2 micro instances
#JVM_OPTS="-Djava.awt.headless=true -server -Xmx256m -XX:MaxDirectMemorySize=100m -XX:+UseG1GC"
Thanks, Darren
PS: As a blazegraph noob, I started reading the wiki site, but soon came to the conclusion that a lot of that information is difficult because it is terse or assumes too much background knowledge, or it is out of date; often last updated in 2015.
Darren,
The configuration will be in /etc/blazegraph/. The data by default is in /var/lib/blazegraph/, though this may be configured by editing the /etc/default/blazegraph file.
See also https://github.com/blazegraph/database/blob/master/blazegraph-deb/README.md.
Can you post some more details on how you were loading the files? You may want to consider trying the REST Bulk Load.
In general, the Wiki is a living document and updated as new features are added, etc.
https://blazegraph.github.io/database/apidocs/com/bigdata/journal/Options.html
I used dpkg -L blazegraph to find more installation details, including the /usr/bin/loadRestAPI.sh script. I read about that on the wiki and was able to get it working on a previous installation, but could not find it on the debian installation (until just now). (BTW, also just found the example deployment code in src/resources/deployment and that is interesting, although we want puppet recipes.) When I tried to use a similar REST API script, I did not entirely know where to find the configs and property files but seemed to find relevant files, but it was failing to load data, so I switched to a SPARQL Update approach that seemed to be working OK for a while (it's not entirely a surprise that SPARQL Update may not be an optimal way to load about 32,000 small RDF files without disabling some features).
Look at https://wiki.blazegraph.com/wiki/index.php/REST_API#Bulk_Data_Load for loading many small files (you can point it at a directory).
Bryan
I may now understand these values from the /etc/default/blazegraph configs:
NAME=blazegraph
BLZG_DATA=/var/lib/${NAME}
JOURNAL_FILE=blazegraph.jnl
JOURNAL="${BLZG_DATA}"/"${JOURNAL_FILE}"
They seem to result in this file:
$ ls -l /var/lib/blazegraph/
total 212M
-rw-r--r-- 1 blzg blzg 295M Feb 17 16:59 blazegraph.jnl
(BTW, coming from prior experience with 4store, I'm surprised that all the KBs are in one big journal file. But I guess that's how it is with Blazegraph. When I create a new namespace, it still goes into this same journal file.) Now I want to experiment with moving that (corrupt?) journal file and restarting blazegraph, in the hope that it will recreate a new one when it starts up. I don't care about trashing any data that I've already loaded, but I expect this drastic move will prompt blazegraph to 'reset' and I'll lose namespaces etc.
In that case, try stopping the service, deleting the blazegraph.jnl, and then restarting. It will create a new journal with your specified options when the service starts.
WOW, so many options in https://blazegraph.github.io/database/apidocs/com/bigdata/journal/Options.html -- a bit over my head when it comes to my knowledge of file system optimizations. Are there any experiences or recommendations for these options when running blazegraph on a linux virtualbox guest on a Mac OS X host? (The VM has 8 GB of RAM and I might need to bump that up, because I see the configs set the java heap at about 8g by default.)
BTW, and this comment is specific to this initial startup problem, the service blazegraph status was entirely ignorant of the checksum failure - despite the failure to start, that service status reported that blazegraph was up and running.
FYI, the following got the service back up:
sudo -i
ls -lh /var/lib/blazegraph/
service blazegraph status # check it is stopped or use `stop`
mv /var/lib/blazegraph/blazegraph.jnl /var/lib/blazegraph/blazegraph.bak
ls -lh /var/lib/blazegraph/
service blazegraph start
service blazegraph status # it is up, but don't trust this, check the log
ls -lh /var/lib/blazegraph/ # it recreated the journal file
tail -n200 /var/log/blazegraph/blazegraph.log # log looks OK, no check sum errors
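The "don't trust the status, check the log" step above can be sketched as a small check. This is a minimal sketch, not part of the Blazegraph packaging: the function name is hypothetical, and the two patterns are simply the strings that appeared in the failing log earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the tail of the log shows no startup
# failure, 1 otherwise. The patterns are the strings seen in the failing
# log in this thread; other failure modes are not covered.
blazegraph_log_ok() {
    log="$1"            # e.g. /var/log/blazegraph/blazegraph.log
    lines="${2:-200}"   # how many trailing lines to inspect
    if tail -n "$lines" "$log" | grep -E -q 'ChecksumError|Failed startup of context'; then
        return 1
    fi
    return 0
}
```

After `service blazegraph start`, running `blazegraph_log_ok /var/log/blazegraph/blazegraph.log || echo "startup failed"` gives a second opinion on the status output. Note that the last 200 lines may still contain entries from an earlier failed start, so a smaller line count may be needed right after a restart.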
Great. You likely want to make sure that you allow plenty of memory for the VM to cache the disk access. If you have 8GB in total, try running with 2G for the JVM heap. If you're going to work at any scale, you'll likely want to increase the RAM for your VM and run the Blazegraph process with 4G or 8G. It's possible that your first load failed due to an OOME (OutOfMemoryError): the JVM used all of the VM memory, causing the corruption.
I would check your disk for errors. There are probably bad sectors if you got a checksum error. Bryan
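The memory advice above amounts to changing the -Xmx value in JVM_OPTS in /etc/default/blazegraph. A minimal sketch, assuming the JVM_OPTS line has the form shown earlier in this thread; the helper name is hypothetical, and it prints the edited contents instead of modifying the live file.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of a defaults file with the -Xmx
# value on uncommented JVM_OPTS lines replaced by the given heap size.
# Commented-out JVM_OPTS lines are left untouched.
set_heap() {
    file="$1"    # e.g. /etc/default/blazegraph
    heap="$2"    # e.g. 2g
    sed -E "/^JVM_OPTS=/ s/-Xmx[0-9]+[kKmMgG]?/-Xmx${heap}/" "$file"
}
```

For example, `set_heap /etc/default/blazegraph 2g > /tmp/blazegraph.defaults` writes a candidate file that can be reviewed, copied into place, and picked up on the next service restart.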
Tried to change the disk location for the KB store by setting
# /etc/default/blazegraph
NAME=blazegraph # default
BLZG_DATA=/data/${NAME} # changed only this
JOURNAL_FILE=blazegraph.jnl # default
JOURNAL="${BLZG_DATA}"/"${JOURNAL_FILE}" # default
This process looked like this
sudo -i
service blazegraph stop
# edit /etc/default/blazegraph as above
mkdir /data/blazegraph
chown blzg:blzg /data/blazegraph
service blazegraph start
ls -l /data/blazegraph/ # huh? no journal file, what's up? The logs look OK, geez. Clueless.
Going to try using touch /forcefsck and reboot the system to find/fix corruption on this virtualbox vm
Got the system running again, so we can close this issue. If you want, you might create a separate issue to fix service blazegraph status reporting the service as running when a checksum error prevents startup. (The log only WARNs, but this seems like a failure to start the service.)
sudo systemctl enable blazegraph
sudo systemctl start blazegraph