Open ryzhyk opened 5 years ago
I think it is a few things, but not certain:
reduce
. There might be others, but the intent of this particular benchmark was idiom detection / reaction. Not that DD is magical at detection, but what we are seeing here are places where naive Datalog results in less efficient compute than something smarter that is either transformed Datalog, or not exactly Datalog.
Argmax: all of the laterSibling
stuff, which just tries to determine which siblings can be cancelled because they can join with a sibling that is not themselves and is greater.
Blank skipping:
.decl skipBlank(From: id, To: id)
skipBlank(From, To) :- nextElem(From, To).
skipBlank(From, To) :- nextElem(From, Via), !hasValue(Via), skipBlank(Via, To).
.decl nextVisible(Prev: id, Next: id)
nextVisible(Prev, Next) :- hasValue(Prev), skipBlank(Prev, Next), hasValue(Next).
In this case, skipBlank
could be developed starting from hasValue(Prev)
. As written above, if there is a path of blanks of length 1,000, there will be 1,000 choose 2 elements of skipBlank
because any two points determine a blank path. But we only really care about 1,000 of them.
Is your implementation available somewhere?
I can ship it at you. I haven't found a place to home it yet (didn't want to pollute the DD repo), but if we think of a fine place (I should probably put together a DD repo for these solutions, maybe independent of this repo, or maybe as a sub-crate/repo/wtf.
Thanks! Your comment about planning hints is particularly interesting. I always thought about Souffle's ability to take planning hints as a major strength, which DD does not have right now. Isn't it the case that DD can also do quite poorly without user-defined schedules and that it could help to have something similar to Souffle's hints in the future?
My experience so far has been that Souffle can do exceptionally badly in some cases without hints. For example, in the polonius
project (Rust borrow checker) there was a selfjoin roughly like
Reach(A,C) :- Reach(A,B), Reach(B,C)
whose implementation was "on change dReach
to Reach
, first index dReach
and then iterate through Reach
to find matches". This turns what would be "linear in added facts" to something more like "sum across iterations of facts so far". Tweaking the join order encourages it to maintain an index on Reach
instead.
For DDlog implementations of benchmarks, I think we will maintain them as tests in the DDlog repository. As far as I am concerned, there is nothing wrong with keeping DD code in this repo (dynamic-datalog
) or anywhere else as long as it's reproduceable :)
I used our Souffle converter (developed by @mbudiu-vmw) to conver the crdt example to DDlog and manually applied your skipBlank optimization to it. Here is the result I'm getting. Does it appear correct?
result{0,4,"!"}
result{0,4,"?"}
result{4,5,"y"}
result{5,6,"e"}
crdt.rs
implementation provided with https://github.com/frankmcsherry/dynamic-datalog/commit/17a8b1f83451014ff4505d0685ed338808b54147
I had a quick look. This rule
skipBlank(From,To) :- nextElem(From,Via), !hasValue(Via), skipBlank(Via,To).
needs to be rewritten to this one
skipBlank(From,To) :- skipBlank(Via,To), nextElem(From,Via), !hasValue(Via).
if you want performance in Souffle.
Excellent, thanks! I'll try this out and see what the delta is.
To be honest, I'm not 100% sure if the goal of the repo is to max out each implementation, or to bake things off or what. My sort-of intent was to take "Datalog as presented" and map out some numbers, for example "differential with help" and perhaps as well "soufflé with help". But I'll certainly either leave a note or just change the Datalog.
Well - as we all know, current Datalog engines are not declarative when it comes to performance. Without providing appropriate hints, performance suffers severely. Hence, performance comparisons between different engines are very hard because they have different needs.
Souffle is sensitive towards rule ordering and requires these hints if you want performance.
It would be great to come up with an appropriate benchmarking methodology for logic engines. This would be great for our community.
It seems to have improved the times! I got a 10s build still, but a 294.73s second run. This is still a fair bit slower than differential, probably for the other reasons above (one could be fixed with source-to-source rewriting to move the non-blank restriction forward, but the "maximization" one is less clear to me that Datalog can easily express it).
I should stress, and I can change the language on the front page if it is painful, that soufflé is here just to give context for the "dynamic" engines. The original intent was that the other dynamic engine folks would contribute implementations, but other things seem to have come up. So it reads like a soufflé v differential bake off, but that wasn't the actual intent. >.<
I'll check the interpreted version to see if it comes in less than 1000s, and then update the repo with the changed query and updated numbers.
I had only a quick look at which rules can be improved immediately; at the moment we are trying to find appropriate benchmarks. Your benchmark suite is excellent!
We may optimize the codes further and forward you an optimized version in due course.
I made a pull-request substantially improving the runtime. See PR #3.
@frankmcsherry, do you happen to know how come DD is so much faster on this benchmark?