agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense
876 stars 47 forks source link

Incremental Relational Lenses #223

Open jmatsushita opened 5 years ago

jmatsushita commented 5 years ago

Hi there,

Really awesome project! Looking forward to build something awesome on top of it.

I came across this talk at ICFP https://www.youtube.com/watch?v=95iEqOzjAXw which is based on this paper https://arxiv.org/pdf/1807.01948.pdf and I'm wondering if you're aware of it, and if you might add this feature to your roadmap? Sounds like if Project:M36 clients where able to leverage this technique, it would be really powerful.

Oh and by the way, where does the name Project:M36 come from?

Cheers,

Jun

agentm commented 5 years ago

Hi @jmatsushita and welcome to the project! Feel free to join us in IRC as well: chat.freenode.net #project-m36.

That is indeed an interesting paper, but the proposal is highly reliant on how existing SQL databases are designed (basically, they are not strictly relational, unlike Project:M36). For example, Chris Date has an entire book covering the "view update problem" whereby most views are not consistent in the sense that they cannot be treated as "normal" relations. Thus, the lenses proposed in the paper would not be relational, preventing any higher-level composition.

It certainly makes to compose relational updates, however, which is why we designed "isomorphic schemas" which builds upon research into views but enforcing only relational transformations which are isomorphic (can be inverted). In this fashion, we support storing deltas on entire schemas (not just DDL) which can be computed on-the-fly. Compared to SQL views, isomorphic views may seem limiting, so further research is needed in order to handle use cases such as security views.

One of the problems the paper seems to address is composition of relational operators. In the end, though, it still generates non-composing SQL. Project:M36 takes a different stance and uses composable DSLs for all operations.

I hope that you have also seen project-m36-typed.

"Project:M36" is simply a semi-unique name I made up. It has no specific meaning.

jmatsushita commented 5 years ago

Thanks for the welcome 😄

I did read the docs on isomorphic schemas just yesterday! It does seem like a promising area of work. Maybe I'll read the book at some point 😓I have a few clarifying questions about the example in the meantime:

What I found most compelling about the paper (and I'm fairly new to all this so I might get this wrong) is the possibility for remote clients (and their clients) to hold a subset of the data as a view (some kind of zoomed-in smaller state, into the larger state of the whole database, including composition of these zooms), and be able to express modifications as deltas, which can then be transferred to the server efficiently and applied to the original data set. I assume that isomorphic schemas are sufficient, but are they necessary to get these properties?

Maybe @rudihorn will be interested in swinging by?

I have seen project-m36-typed and a servant like interface to the Project:M36 seems really awesome. I'll go give it a try and post some issues/questions there too :)

No specific meaning? Can't help but hear this as an invitation 😈 https://en.wikipedia.org/wiki/M36 I like the hieroglyph for bundle of flax : "to tie together, to bind, to gather together, to collect". https://en.wikipedia.org/wiki/List_of_Egyptian_hieroglyphs#M

jamescheney commented 5 years ago

Hi @jmatsushita and @agentm,

@rudihorn mentioned this discussion to me. (I'm one of the other authors of the paper Jun mentioned). I hope you don't mind my jumping in with some (possibly too wordy) comments.

@agentm writes:

but the proposal is highly reliant on how existing SQL databases are designed (basically, they are not strictly relational, unlike Project:M36). For example, Chris Date has an entire book covering the "view update problem" whereby most views are not consistent in the sense that they cannot be treated as "normal" relations. Thus, the lenses proposed in the paper would not be relational, preventing any higher-level composition.

There are a couple of misunderstandings here, possibly due to looking at only the talk and not the paper [2]. Our paper is about implementing a prior proposal ("relational lenses" by Bohannon, Pierce and Vaghan [1]). I'd definitely recommend looking at that paper first to understand ours. Relational lenses are defined in terms of (set-valued) relational algebra, so they are "relational" and "compositional" in the sense I think you mean.

Our paper (a) shows how to implement relational lenses both efficiently and compositionally by generating and translating deltas (again using set-valued relational queries) and (b) gives a practical implementation on top of an SQL database. But the fact that we reduce to SQL-style multiset-valued queries/updates in the implementation (b) is irrelevant: if you have a true relational database you can just use (a) and ignore all the SQL hacking we had to do for (b). (In fact, the semantic gap between the relational algebra operations in the paper and SQL is a potential problem for us, we need to be more careful about deduplication for example.) So in principle it should, if anything, be easier to implement in M36 than on top of SQL.

FWIW, I have a copy of Date's "view update problem" book in front of me but I have to admit I haven't really followed it. M36's isomorphic schemas seem to offer complementary advantages to relational lenses, since the former don't support join and the latter don't support union.

Jun wrote:

What I found most compelling about the paper (and I'm fairly new to all this so I might get this wrong) is the possibility for remote clients (and their clients) to hold a subset of the data as a view (some kind of zoomed-in smaller state, into the larger state of the whole database, including composition of these zooms), and be able to express modifications as deltas, which can then be transferred to the server efficiently and applied to the original data set.

The above describes a possible use case for the results in our paper but I think there are a number of other challenges in this scenario that we didn't address, for example managing transactions/consistency across the data sources and reconciling concurrent changes.

I'd also like to point out that there's a workshop on something called "bidirectional transformations" happening in June in Philadelphia. Both view updates and isomorphic schemas are examples of bidirectional transformations, if either or both of you are interested in this area you might consider attending or sending a paper (for example a tool paper about isomorphic schemas in M36). Papers can be informal/talk proposals.

http://bx-community.wikidot.com/bx2019:home

We are trying to encourage people interested in databases to attend or participate!

[1] http://www.cis.upenn.edu/~bcpierce/papers/dblenses-tr.pdf [2] https://arxiv.org/pdf/1807.01948.pdf