agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense

showdataframe seems to drop tuples when the number of tuples is large. #257

Closed YuMingLiao closed 4 years ago

YuMingLiao commented 4 years ago
TutorialD (master/main): createarbitraryrelation a { a Text, b Int, c Int} 100-100
TutorialD (master/main): :showdataframe a{b,c} orderby {c}
...
│91│-16    │26     │
│92│-21    │27     │
│93│-5     │27     │
│94│-8     │27     │
│95│27     │28     │
└──┴───────┴───────┘

agentm commented 4 years ago

I am not able to reproduce this on the stock master using these exact commands.

Can you confirm that you don't have other changes that could cause this? Do you always lose 5 tuples?

YuMingLiao commented 4 years ago

Thanks for clarifying! I am looking into this, too. I will report after I find something.

agentm commented 4 years ago

Does

createarbitraryrelation a { a Text, b Int, c Int} 100-100
:showexpr a

show 95 or 100 tuples? That could narrow down the cause.

Note that showexpr intentionally randomizes the order of the tuples when printing tuples to the screen.

YuMingLiao commented 4 years ago

After long hours of recompiling, I guess I can offer the joke of the day.

I am pretty sure the cause is the projection itself: projecting onto {b, c} drops attribute a, whose atom values were what made some of the tuples distinct. Tuples that agree on b and c then collapse into one. Hence the difference in tuple count between showexpr a and showdataframe a{b, c} orderby {c}.

It reminds me that project in relational algebra does not mean the same thing as hiding columns in a dataframe. It also shows that project-m36 really adheres strictly to the mathematics.

Thanks for the attention, @agentm !

agentm commented 4 years ago

Hm, do you mean that "createarbitraryrelation" does not create 100 unique tuples?

YuMingLiao commented 4 years ago

No, I think createarbitraryrelation does create 100 unique tuples.

I think neither the arbitrary relation generator nor the dataframe mechanism causes this situation. It's my expectation that needs to be corrected.

I mean something like this:

TutorialD (master/main): a := relation{tuple{a 1, b 1},tuple{a 2, b 1}}
TutorialD (master/main): :showexpr a{b}
┌──────────┐
│b::Integer│
├──────────┤
│1         │
└──────────┘

The original tuples are unique: (1, 1) and (2, 1) are different. After I project onto field b only, they become the same, so there is only one tuple left in a{b}. That's why the number of tuples changed.

So I can't think of a relation as something that records objects, unless I give them a unique constraint on a field such as object_id or creation time. And if I "project out" the unique key, that means I am only interested in the relation among the remaining fields and don't care about the uniqueness that the key provided.
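
The same effect can be sketched outside TutorialD. This is a minimal Python model, not Project:M36 code: the helper names relation and project are hypothetical, and a relation is assumed to be nothing more than a set of tuples, each tuple a frozenset of (attribute, value) pairs.

```python
# Hypothetical model, not the Project:M36 API:
# a relation is a set of tuples; a tuple is a frozenset of (attribute, value) pairs.
def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    # Keep only the named attributes. Because the result is a set,
    # tuples that become identical collapse into one.
    return {frozenset((k, v) for k, v in t if k in attrs) for t in rel}

a = relation({"a": 1, "b": 1}, {"a": 2, "b": 1})

print(len(a))                        # 2: the tuples differ on attribute a
print(len(project(a, {"b"})))        # 1: without a, both tuples are just {b 1}
print(len(project(a, {"a", "b"})))   # 2: keeping the distinguishing attribute keeps the count
```

Keeping the attribute that distinguishes the tuples in the projection keeps the count at 2, which is the role a unique key would play.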

agentm commented 4 years ago

Ah, I see. Thanks for the explanation.

Indeed, this is a major difference between the SQL tuple-bag model and the proper relational algebra tuple-set model. The rationale is that a truth (a tuple representing one predicate fact) need only be stated once. In other words, saying something twice doesn't make it more true.
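
As a rough illustration of the two models (a Python sketch only, using the two-tuple example from this thread; the variable names are made up for the illustration), a bag can be modelled with collections.Counter and a set with a plain set:

```python
from collections import Counter

# (a, b) pairs from the two-tuple example earlier in this thread
rows = [(1, 1), (2, 1)]

# Bag model (SQL-style): projecting onto b keeps duplicates and counts them.
bag_b = Counter(b for _, b in rows)

# Set model (relational algebra): each fact appears at most once.
set_b = {b for _, b in rows}

print(sum(bag_b.values()))  # 2: "b = 1" is stated twice
print(len(set_b))           # 1: "b = 1" is stated once
```

In SQL terms, the bag result corresponds to SELECT b and the set result to SELECT DISTINCT b.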

YuMingLiao commented 4 years ago

Thanks! Good to know the reason why it is like that.