agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense

add DataFrame datatype. :showdataframe <relExpr> <attrName> #211

Closed YuMingLiao closed 5 years ago

YuMingLiao commented 6 years ago

Right now it only supports sorting in ascending order. Any advice on the feature's interface would be welcome.

YuMingLiao commented 6 years ago

I changed the command to showdataframe and added offset and limit. The syntax is now: :showdataframe relExpr orderby attributeName [offset integer] [limit integer]

TutorialD (master/main): :importexample date
TutorialD (master/main): :showdataframe s orderby s# offset 1 limit 2
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"Paris"   │"S2"    │"Jones"    │10             │
│"Paris"   │"S3"    │"Blake"    │30             │
└──────────┴────────┴───────────┴───────────────┘
TutorialD (master/main): :showdataframe s orderby s# offset 1
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"Paris"   │"S2"    │"Jones"    │10             │
│"Paris"   │"S3"    │"Blake"    │30             │
│"London"  │"S4"    │"Clark"    │20             │
│"Athens"  │"S5"    │"Adams"    │30             │
└──────────┴────────┴───────────┴───────────────┘
TutorialD (master/main): :showdataframe s orderby s# limit 2
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"London"  │"S1"    │"Smith"    │20             │
│"Paris"   │"S2"    │"Jones"    │10             │
└──────────┴────────┴───────────┴───────────────┘
TutorialD (master/main): :showdataframe s orderby s#
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"London"  │"S1"    │"Smith"    │20             │
│"Paris"   │"S2"    │"Jones"    │10             │
│"Paris"   │"S3"    │"Blake"    │30             │
│"London"  │"S4"    │"Clark"    │20             │
│"Athens"  │"S5"    │"Adams"    │30             │
└──────────┴────────┴───────────┴───────────────┘
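
The clause semantics shown in the transcripts above (sort first, then skip `offset` rows, then keep at most `limit` rows) can be sketched in plain Haskell. This is only an illustration over an ordinary list, not the Project:M36 implementation; the `Supplier` record, the field name `sNum` (standing in for `s#`) and the function `showDataFrame` are all hypothetical names made up for this sketch.

```haskell
import Data.List (sortOn)

-- Hypothetical row type mirroring the supplier relation `s` from the example.
data Supplier = Supplier
  { city   :: String
  , sNum   :: String   -- stands in for the s# attribute
  , sname  :: String
  , status :: Integer
  } deriving (Show, Eq)

-- The five tuples from the transcript, deliberately listed out of order,
-- since a relation has no inherent tuple order.
suppliers :: [Supplier]
suppliers =
  [ Supplier "Paris"  "S3" "Blake" 30
  , Supplier "London" "S1" "Smith" 20
  , Supplier "Athens" "S5" "Adams" 30
  , Supplier "Paris"  "S2" "Jones" 10
  , Supplier "London" "S4" "Clark" 20
  ]

-- Sort ascending on a key, skip `offset` rows, then keep at most `limit` rows.
-- `Nothing` means the clause was omitted.
showDataFrame :: Ord k => (a -> k) -> Maybe Int -> Maybe Int -> [a] -> [a]
showDataFrame key offset limit =
  maybe id take limit . maybe id drop offset . sortOn key
```

With these definitions, `map sNum (showDataFrame sNum (Just 1) (Just 2) suppliers)` evaluates to `["S2","S3"]`, matching the `offset 1 limit 2` transcript.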
YuMingLiao commented 6 years ago

Added ordering by multiple attributes, each with an optional ASC/DESC direction: :showdataframe relExpr orderby {attrName1 [ASC/DESC], attrName2 ...} The direction is optional and defaults to ASC.

That's it for now.

TutorialD (master/main): :showdataframe s orderby {city, s# DESC}
┌──────────┬────────┬───────────┬───────────────┐
│city::Text│s#::Text│sname::Text│status::Integer│
├──────────┼────────┼───────────┼───────────────┤
│"Athens"  │"S5"    │"Adams"    │30             │
│"London"  │"S4"    │"Clark"    │20             │
│"London"  │"S1"    │"Smith"    │20             │
│"Paris"   │"S3"    │"Blake"    │30             │
│"Paris"   │"S2"    │"Jones"    │10             │
└──────────┴────────┴───────────┴───────────────┘
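
For readers who want to see how a multi-attribute ordering with mixed directions composes, here is a minimal Haskell sketch. It is not how the feature is implemented in the engine; `Row`, `rows` and `byCityThenSNumDesc` are hypothetical names, and the rows are just the (city, s#) pairs from the output above.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical (city, s#) pairs taken from the example output.
type Row = (String, String)

rows :: [Row]
rows =
  [ ("Paris",  "S2")
  , ("London", "S1")
  , ("Athens", "S5")
  , ("Paris",  "S3")
  , ("London", "S4")
  ]

-- Comparators combine with (<>): a later comparison is consulted only
-- when the earlier ones report EQ. Wrapping a key in Down reverses its order.
-- This corresponds in spirit to `orderby {city, s# DESC}`.
byCityThenSNumDesc :: Row -> Row -> Ordering
byCityThenSNumDesc = comparing fst <> comparing (Down . snd)

sorted :: [Row]
sorted = sortBy byCityThenSNumDesc rows
```

Here `sorted` comes out as Athens/S5, London/S4, London/S1, Paris/S3, Paris/S2, the same row order as the transcript.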
agentm commented 6 years ago

Regarding ordering for sub-relations: I think it should be possible to specify an ordering on the attributes within a sub-relation, but not on the sub-relation attribute itself. Ordering on the sub-relation attribute would probably require letting the user run an aggregate query on it. That can be done now by adding a function evaluation on the sub-relation to the relational expression and then ordering on the resulting attribute.
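
The two-step idea (first extend each tuple with an aggregate over its sub-relation, then order on that new, ordinary attribute) might look like the following in plain Haskell. This is a sketch under assumed types, not engine code: `CityRow`, `CityRowWithCount`, `extendWithCount` and `orderBySubCount` are hypothetical, and `length` stands in for whatever aggregate function the user picks.

```haskell
import Data.List (sortOn)

-- Hypothetical row: a city plus a nested "sub-relation" of supplier numbers.
data CityRow = CityRow
  { cityName :: String
  , subRel   :: [String]
  } deriving (Show, Eq)

-- The same row extended with an aggregate computed from the sub-relation.
data CityRowWithCount = CityRowWithCount
  { row      :: CityRow
  , subCount :: Int
  } deriving (Show, Eq)

-- Step 1: extend each row with an aggregate over its sub-relation.
extendWithCount :: CityRow -> CityRowWithCount
extendWithCount r = CityRowWithCount r (length (subRel r))

-- Step 2: order on the new attribute, which is an ordinary sortable value.
orderBySubCount :: [CityRow] -> [CityRowWithCount]
orderBySubCount = sortOn subCount . map extendWithCount

cityRows :: [CityRow]
cityRows =
  [ CityRow "London" ["S1", "S4"]
  , CityRow "Athens" ["S5"]
  , CityRow "Paris"  ["S2", "S3"]
  ]
```

The sub-relation itself never needs an `Ord` notion here; only the derived count is compared.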

YuMingLiao commented 6 years ago

I see. I have a picture in mind now. I'll find some other time to implement this. Thanks!

YuMingLiao commented 6 years ago

Thinking about it fundamentally, this is just a transformation between unordered sets and ordered lists, i.e. between Relation and DataFrame, and it covers both the order of attributes and the order of atoms within a tuple.

Maybe it has something to do with type-iso or semi-iso. Just leaving the thought here for reference.
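
A minimal sketch of that set/list view, assuming the `containers` package is available. The function names `toDataFrame` and `fromDataFrame` appear in this thread, but the signatures below are hypothetical and chosen only for illustration.

```haskell
import qualified Data.Set as Set
import Data.List (sortOn)

-- A relation body is modelled as an unordered set of tuples,
-- a data frame as an ordered list of the same tuples.

-- Going to a data frame requires choosing an order (here: a sort key).
toDataFrame :: Ord k => (t -> k) -> Set.Set t -> [t]
toDataFrame key = sortOn key . Set.toList

-- Going back simply forgets the order (and any duplicates).
fromDataFrame :: Ord t => [t] -> Set.Set t
fromDataFrame = Set.fromList

example :: Set.Set (Int, String)
example = Set.fromList [(3, "c"), (1, "a"), (2, "b")]
```

For any key, `fromDataFrame (toDataFrame key rel) == rel` holds, but the reverse round trip does not restore an arbitrary list order, which is why this is at best a one-sided (semi-) isomorphism.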

YuMingLiao commented 6 years ago

I guess a simple Zipper can be used to implement toDataFrame with sub-DataFrame.

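-- Binary tree with values at the leaves (definition assumed; the thread does not give one).
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Zip two trees of the same shape; mismatched shapes are not handled.
zipTree :: Tree a -> Tree b -> Tree (a,b)
zipTree (Leaf a)     (Leaf b)     = Leaf (a,b)
zipTree (Node l1 r1) (Node l2 r2) =
    let l = zipTree l1 l2
        r = zipTree r1 r2
    in Node l r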

But I can't picture how a user would use fromDataFrame and order a DataFrame with a sub-DataFrame together in a single tutd line.

So I'll stop here on this feature until the need emerges.

agentm commented 5 years ago

I polished off this feature with tests and documentation and merged it to master. Sorry it took so long!

I'll be looking at your persistent driver soon. The persistent driver will need to be updated to support data frames (sorting, ordering, offsets).