Open smalyshev opened 6 years ago
@smalyshev I will need more information on this. Also, for both this and other issues, it would be a great benefit if you could provide a reproducer against the underlying blazegraph engine without a reliance on the wikidata components. That would make it much easier for us to spin up a test and diagnose a problem.
I'll add some logs soon (they're big so I'd need to extract and clean them up).
> Also, for both this and other issues, it would be a great benefit if you could provide a reproducer against the underlying blazegraph engine without a reliance on the wikidata components.
That probably would not be easy for this one. It is random: I cannot reproduce it on an individual item, but it happens about once per 100 items updated. It also happens much more often on busy servers (serving query traffic) than on idle ones. I am not sure how one could isolate it, since the set of servers where it happens seems to be random (though it's always the same triples that get dropped from the same item). If you have any ideas, please suggest them.
The problem itself is not connected to any Wikidata components. The Updater just sends SPARQL queries, which I will attach, and as you can see from above, they are pretty simple triples with nothing special about them. But reproducing this without WDQS would not be easy, since again it's random and only happens occasionally. I could try to run production update traffic against a Blazegraph instance without the Wikidata modules, but since I can't run query traffic against it, I'm not sure it would even reproduce there. I am not even sure we have the hardware now to support an extra Blazegraph instance with full update traffic, since our cloud hosts are not strong enough to handle this. And doing it outside Wikimedia is hard because our Kafka update streams are not publicly available.
But if you have any facility where we could set up such a test (Updater + non-Wikidata Blazegraph build), we could probably try to set it up. If you have any ideas about what else we could check to see why INSERT is dropping triples, I'd be very grateful.
SPARQL query is available at https://phabricator.wikimedia.org/F27383365
Phabricator does not love me. Can you just paste it in?
I am not aware of any reasons why Blazegraph would be dropping some tuples on insert. If this is a DELETE/INSERT-WHERE style query, then I would check to make sure that the query part of this is identifying the correct solutions (all solutions to be dropped). The same query result is then replayed through the INSERT part of that DELETE/INSERT-WHERE query. So if there is something wrong with the initial evaluation of the WHERE clause, then it could show up as some odd results.
There were also issues historically where the update code paths were not rechecking for newly defined terms. This could lead to the WHERE clause computing the incorrect solutions. I believe that there is an open ticket related to wikidata for a similar problem, perhaps dealing with a custom type?
If this is a simple INSERT DATA, then, nope, I do not have any ideas at all.
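To make the DELETE/INSERT-WHERE point above concrete, here is a toy in-memory model in Python. It is only a sketch of the semantics described in the SPARQL 1.1 Update spec (WHERE is evaluated once, then the same solutions drive both templates), not Blazegraph's actual code path; all function names are made up for illustration.

```python
# Toy model, NOT Blazegraph code. Triples are 3-tuples of strings;
# terms starting with "?" are variables. All names here are hypothetical.

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None  # variable already bound to something else
        elif p != t:
            return None
    return new

def evaluate_where(store, patterns):
    """All bindings satisfying every pattern. An empty WHERE yields one empty binding."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            ext
            for sol in solutions
            for triple in store
            if (ext := match(pattern, triple, sol)) is not None
        ]
    return solutions

def instantiate(template, solution):
    """Ground triples produced by the template; triples with unbound variables are skipped."""
    out = set()
    for pattern in template:
        triple = tuple(solution.get(t, t) if is_var(t) else t for t in pattern)
        if not any(is_var(t) for t in triple):
            out.add(triple)
    return out

def delete_insert_where(store, delete_tpl, insert_tpl, where):
    # WHERE is evaluated exactly once, against the store before any change.
    solutions = evaluate_where(store, where)
    to_delete = set().union(*(instantiate(delete_tpl, s) for s in solutions))
    to_insert = set().union(*(instantiate(insert_tpl, s) for s in solutions))
    # Deletes are applied first, then inserts.
    return (set(store) - to_delete) | to_insert
```

In this model an empty WHERE gives a single empty solution, so a variable-free INSERT template is inserted exactly once. If the WHERE clause wrongly yields no solutions, nothing is inserted at all, which is the kind of odd result mentioned above.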
Try this: https://www.dropbox.com/s/5ao9xohudf16g6g/sparql-query.log.gz?dl=0
Pasting it here won't work, it's 3.2 megabytes compressed.
It's INSERT (following DELETE but not the same query), not INSERT DATA, but with empty WHERE: WHERE {}.
ok. Can try to look tomorrow am. Heading out right now.
Why use INSERT WHERE with an empty WHERE vs INSERT DATA for this? I am pretty sure that there are some parser-related optimizations for INSERT DATA and DELETE DATA (deferred parse to keep the AST for the request down). This might be significant for a request of that size.
I am not sure why we have used INSERT vs. INSERT DATA. Maybe because this query was refactored repeatedly so it ended up this way, or due to some bug or other issue. I could try using INSERT DATA and see what happens.
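For trying that out, a rough Python sketch of the rewrite could look like the following. It assumes the update is a single `INSERT { ... } WHERE {}` operation with no prologue and no variables in the template; the function name is hypothetical, and anything that does not fit that shape is returned unchanged.

```python
import re

# Matches a single "INSERT { ... } WHERE {}" operation (optionally ending in ";").
# Assumption: no PREFIX prologue and no further operations in the same request.
_EMPTY_WHERE = re.compile(
    r"^\s*INSERT\s*\{(?P<body>.*)\}\s*WHERE\s*\{\s*\}\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)

# Rough variable detection (?x or $x). It is deliberately conservative:
# a "?x" inside a string literal also counts, so such updates are left alone.
_VARIABLE = re.compile(r"(?<![\w<\"])[?$][A-Za-z_]\w*")

def to_insert_data(update: str) -> str:
    """Rewrite INSERT { triples } WHERE {} as INSERT DATA { triples } when safe."""
    m = _EMPTY_WHERE.match(update)
    if m is None:
        return update  # not the expected shape, leave untouched
    body = m.group("body")
    if _VARIABLE.search(body):
        return update  # INSERT DATA must not contain variables
    return "INSERT DATA {" + body + "}"
```

For example, `to_insert_data("INSERT { <s> <p> <o> . } WHERE {}")` gives `INSERT DATA { <s> <p> <o> . }`. A real Updater request that chains DELETE and INSERT operations would need to be split per operation first, which this sketch does not attempt.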
I've found a serious issue with our Blazegraph install, where some triples are dropped from updates. For example, an INSERT DATA statement contains these triples:
But the database ends up containing only the first one, and the other three are missing. This is happening repeatedly, on many servers and many items, though usually only for a small number of triples per item, and the missing triples are frequently the same on several servers. This makes me suspect there's a systematic data corruption error somewhere in the Blazegraph update engine. I would appreciate any help figuring out what exactly is going wrong.
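One way to catch dropped triples right after an update is to ask the store about each expected triple separately. Below is a minimal Python sketch of that check. The `ask` callable is an assumption: it stands for whatever code sends an ASK query to the SPARQL endpoint and returns a boolean, and is not part of any Blazegraph or WDQS API. The check only makes sense for ground triples, since blank nodes in an ASK pattern behave like variables.

```python
from typing import Callable, Iterable, List

def ask_query(triple: str) -> str:
    """Build an ASK query for one ground triple written in SPARQL syntax."""
    return f"ASK {{ {triple} }}"

def find_missing(expected: Iterable[str], ask: Callable[[str], bool]) -> List[str]:
    """Return the expected triples for which the store answers false.

    `ask` is a hypothetical callable supplied by the caller: it takes a
    SPARQL ASK query string and returns the boolean result.
    """
    return [t for t in expected if not ask(ask_query(t))]
```

Running this on each server after an update and logging the returned list, together with the update query and a timestamp, might help show whether the same triples go missing everywhere or only under query load.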