agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense

DataFrame for orderBy, limit, and offset #209

Open YuMingLiao opened 6 years ago

YuMingLiao commented 6 years ago

Thanks to @agentm for the suggestion. To explore more practical uses of relational algebra, we may need

a new type and processing engine for converting Relations to DataFrames akin to pandas or R DataFrames, which do have an ordering.

Converting to a DataFrame would emphasize the finality of that processing to the user: further relational algebra operators cannot be applied, though the DataFrame could potentially be converted back to a Relation. Normally, this conversion would be the final step in a data retrieval pipeline.

For reference: Accessing Postgres in a DataFrame in Haskell

agentm commented 6 years ago

Yeah, it would be nice to be able to use a DataFrame-related engine from an existing Haskell project, but, from what I have seen, they are typically typed at compile time. We will need to support types generated arbitrarily at runtime in order to support conversion to and from relations.

I considered exporting to SQLite but there is no way to represent relation-valued attributes or ADTs, so the data type impedance mismatch would be quite painful.

YuMingLiao commented 6 years ago

Oh, I hadn't thought that far ahead. I was just looking for some existing code to refer to while thinking about the implementation.

I just want a simple syntax, like beam's, for using Project:M36 and relational algebra in Haskell.

And yes, relation-valued attributes and ADTs feel too valuable to lose. It would be interesting to use them both in Haskell and in a relational database.

agentm commented 6 years ago

I'm happy to discuss any future features. From my perspective, I think a good first step would be to implement a Relation -> DataFrame engine. That could be implemented independently of a compile-time type-safe interface.
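
To make the idea concrete, here is a minimal sketch of what a runtime-typed Relation -> DataFrame step might look like. Every name in it (AtomType, Atom, Relation, DataFrame, toDataFrame, offsetRows, limitRows) is a hypothetical stand-in invented for illustration, not Project:M36's actual API. The assumption is that a relation carries its heading as a runtime value, so the conversion can check tuples against it without any compile-time knowledge of the column types.

```haskell
module Main where

-- Hypothetical stand-ins for illustration only; NOT Project:M36's real types.
data AtomType = IntegerT | TextT
  deriving (Eq, Show)

data Atom = IntegerAtom Integer | TextAtom String
  deriving (Eq, Show)

-- An attribute is a name paired with a type that is only known at runtime.
type Attribute = (String, AtomType)

-- A relation: the list of tuples is conceptually an unordered set.
data Relation = Relation [Attribute] [[Atom]]
  deriving (Eq, Show)

-- A data frame: same heading, but the order of the rows is meaningful.
data DataFrame = DataFrame [Attribute] [[Atom]]
  deriving (Eq, Show)

atomType :: Atom -> AtomType
atomType (IntegerAtom _) = IntegerT
atomType (TextAtom _)    = TextT

-- Convert a relation, checking at runtime that every tuple matches the heading.
toDataFrame :: Relation -> Either String DataFrame
toDataFrame (Relation attrs rows)
  | all matchesHeading rows = Right (DataFrame attrs rows)
  | otherwise               = Left "tuple does not match heading"
  where
    matchesHeading row = map atomType row == map snd attrs

-- offset and limit only make sense once the rows have an order.
offsetRows, limitRows :: Int -> DataFrame -> DataFrame
offsetRows n (DataFrame attrs rows) = DataFrame attrs (drop n rows)
limitRows n (DataFrame attrs rows)  = DataFrame attrs (take n rows)

main :: IO ()
main = do
  let heading = [("s#", TextT), ("status", IntegerT)]
      suppliers = Relation heading
        [ [TextAtom "S1", IntegerAtom 20]
        , [TextAtom "S2", IntegerAtom 10]
        , [TextAtom "S3", IntegerAtom 30]
        ]
  -- Skip one row, then keep one row of the converted frame.
  print (limitRows 1 . offsetRows 1 <$> toDataFrame suppliers)
```

Keeping DataFrame as a separate type from Relation is what signals the finality of the conversion: relational operators would be defined only on Relation, while offsetRows and limitRows exist only on DataFrame.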

YuMingLiao commented 6 years ago

Thanks! I have implemented a simple feature now.

TutorialD (master/main): :importexample date
TutorialD (master/main): :showexpr s
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"Paris"   │"S2"    │"Jones"    │10             │
│"Athens"  │"S5"    │"Adams"    │30             │
│"Paris"   │"S3"    │"Blake"    │30             │
│"London"  │"S1"    │"Smith"    │20             │
│"London"  │"S4"    │"Clark"    │20             │
└──────────┴────────┴───────────┴───────────────┘
TutorialD (master/main): :showsorteddataframe s s#
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"London"  │"S1"    │"Smith"    │20             │
│"Paris"   │"S2"    │"Jones"    │10             │
│"Paris"   │"S3"    │"Blake"    │30             │
│"London"  │"S4"    │"Clark"    │20             │
│"Athens"  │"S5"    │"Adams"    │30             │
└──────────┴────────┴───────────┴───────────────┘
TutorialD (master/main): :showsorteddataframe s city
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"Athens"  │"S5"    │"Adams"    │30             │
│"London"  │"S1"    │"Smith"    │20             │
│"London"  │"S4"    │"Clark"    │20             │
│"Paris"   │"S2"    │"Jones"    │10             │
│"Paris"   │"S3"    │"Blake"    │30             │
└──────────┴────────┴───────────┴───────────────┘

It only supports ascending order for now. I left the Ord instances for RelationAtom and CustomizedAtom undefined because I don't see a meaningful ordering for them.
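
One possible way to add descending order while still refusing to sort atoms that have no meaningful ordering is sketched below. This is an assumption-laden illustration, not the code in the pull request: the Atom type here is a cut-down stand-in, and Order, compareAtoms, flipOrdering and sortRows are hypothetical names. Instead of an Ord instance with undefined cases, the comparison returns a Maybe, so sorting on a relation-valued column fails with an error value rather than crashing.

```haskell
module Main where

import Data.List (sortBy)
import Data.Maybe (fromMaybe, isJust)

-- Cut-down, hypothetical atom type; the real engine has more constructors.
data Atom
  = IntegerAtom Integer
  | TextAtom String
  | RelationAtom [[Atom]]
  deriving (Eq, Show)

data Order = Ascending | Descending
  deriving (Eq, Show)

-- A partial comparison: Nothing means "no meaningful ordering".
compareAtoms :: Atom -> Atom -> Maybe Ordering
compareAtoms (IntegerAtom a) (IntegerAtom b) = Just (compare a b)
compareAtoms (TextAtom a) (TextAtom b)       = Just (compare a b)
compareAtoms _ _                             = Nothing

flipOrdering :: Ordering -> Ordering
flipOrdering LT = GT
flipOrdering EQ = EQ
flipOrdering GT = LT

-- Sort rows by the column at index i, or report why that is not possible.
sortRows :: Order -> Int -> [[Atom]] -> Either String [[Atom]]
sortRows order i rows
  | i < 0 || any ((<= i) . length) rows = Left "column index out of range"
  | not comparableColumn                = Left "column has no defined ordering"
  | otherwise                           = Right (sortBy cmp rows)
  where
    keys = map (!! i) rows
    -- Every key must be comparable with the first one (and with itself).
    comparableColumn = case keys of
      []      -> True
      (k : _) -> all (isJust . compareAtoms k) keys
    -- fromMaybe EQ is never hit once comparableColumn has passed.
    cmp r1 r2 = direction (fromMaybe EQ (compareAtoms (r1 !! i) (r2 !! i)))
    direction = case order of
      Ascending  -> id
      Descending -> flipOrdering

main :: IO ()
main = do
  let rows =
        [ [TextAtom "Paris", TextAtom "S2"]
        , [TextAtom "Athens", TextAtom "S5"]
        , [TextAtom "London", TextAtom "S1"]
        ]
  -- Sort by the second column (s#) in descending order.
  either putStrLn (mapM_ print) (sortRows Descending 1 rows)
```

Because sortBy is stable, rows that compare equal on the sort column keep their incoming order in either direction.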

I am happy to discuss anything, too.

Let me open a pull request for this first.