agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

Updating relations gets gradually slower #286

Closed SignIn-SignUp closed 3 years ago

SignIn-SignUp commented 3 years ago

I could reproduce this in Haskell Code as well as in tutd. When updating a relvar like the the one created by the following commands:

And then updating it with that command alternating the value we want to set age to:

The updates get gradually slower after a few updates. In my usecase the same result without the performance decrease can be achieved (at least when using it in code) by deleting the tuple and then inserting the "updated" tuple.

When profiling the majority of time is spent in ProjectM36.Relation.mkRelation.duplicateAttrNames and ProjectM36.Attribute.attributesDifference. But I am guessing that this is normal and maybe unrelated although in some extreme cases the percentage was about 40%. The slowdown feels almost exponential or at least of a high polynomial complexity.

Is this intentional or am I just using something wrong ? Does this issue arise for others as well ?

agentm commented 3 years ago

Indeed! You have discovered a major architectural difference of Project:M36 when compared to other DBMS products. This slowdown is a consequence of Project:M36's novel architecture which maximizes laziness and data independence. You can read more about the laziness in this documentation.

The architecture calls for a caching system based on memoization to mitigate the laziness, but it is not yet implemented. In the meantime, the mental model for this slowdown is that the idempotent database updates "stack" on each other, i.e. all previous updates necessary to generate the answer to the query are re-executed.

SignIn-SignUp commented 3 years ago

Ok, thank you! Thats eye opening. Is this comparable to the logical logging in some other systems wehere the operation is saved but not the corresponding values? Is deleting and inserting a valid alternative at the moment if performance is somewhat needed?

agentm commented 3 years ago

Yes, that's a similar concept except that Project:M36 logs the actual updates and only materializes the necessary tuples (often a subset of tuples needed only to service the query-at-hand) on demand. This allows Project:M36 to advertise commits in constant time O(1) instead of O(n) where n is the number of tuples altered.

As a consequence of this choice, however, all updates must be idempotent (repeatable) and this is strongly enforced which should make replication also much easier to reason about.

SignIn-SignUp commented 3 years ago

Thank you! I will close the issue then.

agentm commented 3 years ago

No problem. The current non-cached version is not ideal, so we hope to improve this in an upcoming release shortly. Feel free to join in IRC at chat.freenode.net #project-m36 as well