Closed Craphtex closed 3 years ago
One place to look is the EXPLAIN of the plans. Blazegraph supports both the explain of a fully executed plan and the live view of the explain of a currently running plan. The wiki has information on how to interpret and use the explain. Among other things, this should help you understand where the memory burden is appearing.
Bryan
On Tue, Mar 9, 2021 at 23:47 Christian Lindgren @.***> wrote:
I got some issues making efficient querying of data containing "filter not exists", and hope someone have an idea of how to make it run without errors. My description and data below contains some blockchain jargon, if that makes it hard to understand I'll translate that all to carrots and potatoes if desired.
Software OS: Ubuntu 18.04.5 LTS BlazeGraph: 2.1.5 (via docker openkbs/blazegraph-docker)
Hardware CPU: AMD Ryzen 7 1800X (8 core @ 3.6GHz) RAM: 4x16GiB (DDR4 @ 2133 MHz) GPU: TITAN Xp (12GiB GDDR5X)
Data I've loaded about 120'000'000 tuples of blockchain data using TTL files in the following format
Blocks
c:B3 a ct:Block ;
ct:number "1"^^xsd:integer ;
ct:epoch "0"^^xsd:integer ;
ct:timestamp "2017-09-23 21:44:51"^^xsd:dateTime .
Transaction
c:T14506 a ct:Transaction ;
ct:fee "0"^^xsd:integer ;
ct:deposit "0"^^xsd:integer ;
ct:block c:B3316 .
Outputs
c:O25816 a ct:Output ;
ct:amount "93148937194773"^^xsd:integer ;
ct:producedBy c:T22726 .
Inputs
c:O1953708 a ct:Output ;
ct:amount "2247872000"^^xsd:integer ;
ct:consumedBy c:T889104 .
Efficient query This is very direct query and resolves very very quickly. It fetches some meta data given an output id (here c:O54321).
SELECT ?amount ?time ?epoch WHERE {
c:O54321 a ct:Output ;
ct:amount ?amount ; ct:producedBy/ct:block ?block .
?block a ct:Block ;
ct:timestamp ?time ; ct:epoch ?epoch .
}
Inefficient query (without filter) This query is doing the most restrictive thing on the last line, and it resolves consistently in about a minute. It fetches all outputs created in a given epoch (here 1).
SELECT ?out ?amount ?time WHERE {
?out a ct:Output ;
ct:amount ?amount ; ct:producedBy/ct:block ?block .
?block a ct:Block ;
ct:timestamp ?time ; ct:epoch 1 .
}
Inefficient query (with filter) This query though contains a "filter not exists" early in the query, and it is too slow to execute. It fetches all outputs with some meta data for all outputs not consumed by a transaction (this is called a UTxO set in blockchain land).
SELECT ?out ?amount ?time ?epoch WHERE {
?out a ct:Output .
FILTER NOT EXISTS { ?out ct:consumedBy ?_ }
?out :amount ?amount ;
ct:producedBy/ct:block ?block .
?block a ct:Block ;
ct:timestamp ?time ; ct:epoch ?epoch .
} LIMIT 5
The issue When I ran the "Inefficient query (with filter)" above I got varying results from time to time. Sometimes it resolved in a minute, other times it resolved after 15, or crashing with a "java.lang.OutOfMemoryError: Java heap space" error (after about 30 minutes).
This I tried to solve by giving BlzaGraph more ram, which I did by changing JVM_OPTS in conf/blacegraph from -Xmx4g to -Xmx32g. Like this:
...
JVM_OPTS="-Djava.awt.headless=true -server -Xmx32g -XX:MaxDirectMemorySize=3000m -XX:+UseG1GC"
...
Now however now I instead get an error (still when running the "Inefficient query (with filter)", the "Efficient query" and "Inefficient query (without filter)" still work fine) stating
ERROR:
(nothing more and nothing less)
I'm a bit stuck here and on the brink of giving up on BlazeGraph in favor of trying something else, so any nudge in the right directing would be appreciated. Anything from a better way of writing the query, to restructuring my data.
Thanks in advance! 🙏
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/blazegraph/database/issues/198, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATW7YBUDP2ABUFDTML4JO3TC4PYFANCNFSM4Y5M6R3Q .
A bit belated thanks for the pointer to the EXPLAIN
feature. After quite a bit of head scratching I've restructured the data which has sped things up.
A word of caution for anyone out there (and especially my future self), a query like
...
FILTER NOT EXISTS { ?out ct:consumedBy ?_ }
...
LIMIT 5
when there are no vertexes matching the NOT EXISTS
constraint will take very long to execute.
I got some issues making efficient querying of data containing "filter not exists", and hope someone have an idea of how to make it run without errors. My description and data below contains some blockchain jargon, if that makes it hard to understand I'll translate that all to carrots and potatoes if desired.
Software OS: Ubuntu 18.04.5 LTS BlazeGraph: 2.1.5 (via docker openkbs/blazegraph-docker)
Hardware CPU: AMD Ryzen 7 1800X (8 core @ 3.6GHz) RAM: 4x16GiB (DDR4 @ 2133 MHz) GPU: TITAN Xp (12GiB GDDR5X)
Data I've loaded about 120'000'000 tuples of blockchain data using TTL files in the following format
Efficient query This is very direct query and resolves very very quickly. It fetches some meta data given an output id (here
c:O54321
).Inefficient query (without filter) This query is doing the most restrictive thing on the last line, and it resolves consistently in about a minute. It fetches all outputs created in a given epoch (here
1
).Inefficient query (with filter) This query though contains a "filter not exists" early in the query, and it is too slow to execute. It fetches all outputs with some meta data for all outputs not consumed by a transaction (this is called a UTxO set in blockchain land).
The issue When I ran the "Inefficient query (with filter)" above I got varying results from time to time. Sometimes it resolved in a minute, other times it resolved after 15, or crashing with a "java.lang.OutOfMemoryError: Java heap space" error (after about 30 minutes).
This I tried to solve by giving BlzaGraph more ram, which I did by changing
JVM_OPTS
inconf/blacegraph
from-Xmx4g
to-Xmx32g
. Like this:Now however now I instead get an error (still when running the "Inefficient query (with filter)", the "Efficient query" and "Inefficient query (without filter)" still work fine) stating
(nothing more and nothing less)
I'm a bit stuck here and on the brink of giving up on BlazeGraph in favor of trying something else, so any nudge in the right directing would be appreciated. Anything from a better way of writing the query, to restructuring my data.
Thanks in advance! :pray: