frankmcsherry / dynamic-datalog

Engines, queries, and data for dynamic Datalog computation
122 stars 8 forks source link

Optimise crdt & galan #3

Open b-scholz opened 4 years ago

b-scholz commented 4 years ago

Hi Frank,

I have optimized crdt for Souffle. On my computer, the benchmark is now less than a second (0.5s) from more than 400s. There is still more potential for optimizations, and we can further reduce the runtime by looking at record processing, etc. Let me know whether you are interested in this.

The main reasons for the slowdown were: 1) scheduling problems (putting the recursive relation at the front of the body) 2) the transitive closure skipBlank was not specialized 3) record processing required an index via the translate table

It would be nice that Souffle does these transformations fully automatically, and some we can do in the near future. I would attribute 1) and 2) to implementation details and they are not so interesting for a wider audience.

However, the interesting point is that differential dataflow has a notion of an inbuilt magic set / subsumptive tabling strategy (w.r.t 2). Is this correct or have you specialized the code by hand in the rust implementation accordingly?

Cheers, Bernhard.

b-scholz commented 4 years ago

I also have optimized Galan by using better schedules. With more time, better schedules could be found. On my computer better schedules halve the runtime.

frankmcsherry commented 4 years ago

Thanks Bernhard. I'm going to have to think about what to do with these. One of the goals was to take the queries as presented, but the other goal was also to have some other incremental Datalog engines show up and that didn't work out either. :)

If it's alright, I'd rather drop them next to the presented queries as "souffle optimized" or something like this. That would also let folks see the structure of the improvements.

Other folks who might be interested in the improvement in absolute numbers: @rntz @ept @madgen