johnnymoretti opened this issue 3 years ago
A few thoughts from Michael:
Tough one… best guess: a hash code collision issue, where we materialize at a different time in the LIMIT case (and hash codes over materialized vs. unmaterialized values somehow differ)?
There is logic in AST2BOp that chooses between a materializing iterator and materializing explicitly via an operator, depending on LIMIT and order-preservation requirements. I'd guess that this choice produces different plan structures for the non-LIMIT vs. LIMIT variants, which triggers the hash problem.
You could likely verify this by comparing the generated plans and then drilling into the actual hash codes used in the tail segment of the plan for the variant without the LIMIT.
Bryan
On Wed, Nov 17, 2021 at 10:30 AM Giovanni Moretti wrote:
Hi, I'm using a local instance of Blazegraph 2.1.6 RC and I've noticed that if I submit a query with an aggregate function like GROUP_CONCAT, it hangs unless I add a LIMIT clause.
The following query hangs:
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE { ?lemma ontolex:writtenRep ?wr }
GROUP BY ?lemma
This other one does not hang:
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE { ?lemma ontolex:writtenRep ?wr }
GROUP BY ?lemma
LIMIT 1000000000
Obviously my dataset is smaller than 1000000000 results: it contains 138245 triples.
Any suggestions?
Thank you for your reply, and OK, I understand the problem. To correct this behavior and clarify whether the problem is related to materialization, is there a simple way to force materialization?
OK, it is definitely a materialization problem. I set the materializeProjectionInQuery variable in the AST2BOpContext class to false, thereby forcing the projection to be materialized outside the plan: this way the queries work. However, the underlying materialization issue remains open. Thanks a lot for your help.
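For reference, a hedged sketch of what this source-level change amounts to inside AST2BOpContext (only the field and class names come from this thread; the actual declaration, default value, and surrounding code in Blazegraph may differ):

// Inside com.bigdata.rdf.sparql.ast.eval.AST2BOpContext (sketch only):
// with the flag hard-coded to false, AST2BOp no longer inlines a
// ChunkedMaterializationOp for the projection; the projected variables
// are materialized outside the query plan instead.
public boolean materializeProjectionInQuery = false;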
I have another strange thing here. The following query now works with the previously mentioned trick:
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
}
GROUP BY ?lemma
but this other one gets stuck again:
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
WHERE {
  ?lemma ontolex:writtenRep ?wr
}
GROUP BY ?lemma
ORDER BY ?wrs
To fix this problem I have to write the query like this:
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT * {
  SELECT ?lemma (GROUP_CONCAT(DISTINCT ?wr ; separator=", ") AS ?wrs)
  WHERE {
    ?lemma ontolex:writtenRep ?wr
  }
  GROUP BY ?lemma
}
ORDER BY ?wrs
I think there is something wrong with the nesting optimization, or with something related to the ORDER BY clause.
Can you please help me?
Adding Michael. Bryan
Looking at the first question again in more detail, I am attaching the two explains (for a data set with 20k triples, all having distinct subjects and objects but the same predicate). As suggested initially, there is a difference in the plans: the slow version (explain-wo-limit) contains a dedicated ChunkedMaterializationOp at the end, which clearly dominates the runtime.
Looking at the code, I guess the problem happens here: https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-rdf/src/java/com/bigdata/bop/rdf/join/ChunkedMaterializationOp.java#L375
We add the constructed TermIds (all with termId=0L, indicating that they are mocked) to the idToConstMap. I confirmed that all of them collide on the same hash code (which is 0 as well).
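To illustrate the effect, here is a minimal, self-contained sketch. MockTermId is a hypothetical stand-in for Blazegraph's TermId, not the real class: when every mocked id hashes to 0, all keys land in a single HashMap bucket, and each insert has to be checked against the keys already accumulated there.

import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TermId: termId == 0L marks a mocked id, and the
// hash code is derived from the id alone, so it is 0 for every mocked id.
final class MockTermId {
    final long termId;
    final String cachedValue; // the materialized RDF value, if any

    MockTermId(final long termId, final String cachedValue) {
        this.termId = termId;
        this.cachedValue = cachedValue;
    }

    @Override
    public int hashCode() {
        return (int) (termId ^ (termId >>> 32)); // always 0 when termId == 0L
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (!(o instanceof MockTermId)) return false;
        final MockTermId t = (MockTermId) o;
        if (termId != 0L || t.termId != 0L) return termId == t.termId;
        return cachedValue.equals(t.cachedValue); // mocked ids compare by value
    }
}

public class CollisionDemo {
    public static void main(final String[] args) {
        final Map<MockTermId, String> idToConstMap = new HashMap<>();
        final long start = System.nanoTime();
        for (int i = 0; i < 20_000; i++) { // mirrors the ~20k-triple test set
            final MockTermId key = new MockTermId(0L, "value-" + i);
            idToConstMap.put(key, key.cachedValue); // every key -> one bucket
        }
        System.out.printf("%d entries in %d ms%n", idToConstMap.size(),
                (System.nanoTime() - start) / 1_000_000);
    }
}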
Possible ideas for fixes would be:
1.) Create a decorator for the idToConstMap (see e.g. https://stackoverflow.com/questions/41461515/hashmap-implementation-that-allows-to-override-hash-and-equals-method).
2.) Wrap the keys in dummy objects that override hashCode(), refining the method to return a hash code different from 0 for TermIds that have termId 0L and a value set (i.e., just the hash code of the value); see the sketch after this list.
3.) Leave TermIds with termId 0L out of the map in the first place and handle them separately (IIUC, the main reason for using the hash map is de-duplication, which may be of limited value for constructed values anyway -- though I would consider this a less preferred option).
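A minimal sketch of idea 2.), reusing the hypothetical MockTermId from the demo above (the real fix would wrap Blazegraph's TermId, whose API may differ):

import java.util.Objects;

// Wrapper key for the idToConstMap: delegates to the wrapped id, except that
// mocked ids (termId == 0L) with a cached value hash by that value, so that
// distinct constructed values spread across buckets instead of piling up in one.
final class TermIdKey {
    final MockTermId id;

    TermIdKey(final MockTermId id) {
        this.id = Objects.requireNonNull(id);
    }

    @Override
    public int hashCode() {
        if (id.termId == 0L && id.cachedValue != null) {
            return id.cachedValue.hashCode(); // avoid the all-zero bucket
        }
        return id.hashCode();
    }

    @Override
    public boolean equals(final Object o) {
        return o instanceof TermIdKey && id.equals(((TermIdKey) o).id);
    }
}

The map would then be keyed on TermIdKey rather than the raw TermId, with keys wrapped at every put/get site.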
It may also be possible to change hashCode() in TermId directly, i.e., if termId==0L, always compute the hash code based on the cached value. While this would be the "clean" way to fix the issue, it sounds quite risky and may cause problems with other code paths: although I cannot come up with a scenario where this is problematic, and I believe it is the right change conceptually, it is really difficult to assess the blast radius of such an invasive change, both in terms of correctness and performance.
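Expressed against the hypothetical MockTermId above, that direct fix would look roughly like this (the real TermId stores and caches its value differently, so this is only a sketch of the intent):

// Hypothetical variant of MockTermId.hashCode(): mocked ids with a cached
// value hash by that value; everything else keeps the id-based hash. Risky
// in the real code base, since any component relying on mocked TermIds
// hashing to 0 would silently change behavior.
@Override
public int hashCode() {
    if (termId == 0L && cachedValue != null) {
        return cachedValue.hashCode();
    }
    return (int) (termId ^ (termId >>> 32));
}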
Regarding your proposal ("I set the materializeProjectionInQuery variable in the AST2BOpContext class to false and thus I implicitly forced the projection to be materialized outside the plan"): yes, I think that should be okay as well, but it is more of a workaround, and it may negatively impact the performance of other queries.
Regarding your second question, here is the explain for the variant with ORDER BY (executed on some dummy data). As you can see, it contains two ChunkedMaterializationOps that are expensive for the same reasons discussed above (with your change to materializeProjectionInQuery, there would probably be only one):
In light of this variant, my recommendation would be to fix the root cause (the hash collisions) rather than changing materializeProjectionInQuery.