[FRG-198] Farrago returns inconsistent results depending on whether or not a column is indexed

dynamobi-build commented 12 years ago

[reporter="zfong", created="Sat, 9 Sep 2006 21:19:46 -0500 (GMT-05:00)"] This following illustrates the problem:

create table tt(a char(5) primary key, b char(5));
insert into tt values('t1a1', 't1a1');
insert into tt values('t2a2', 't2a2');

select * from tt where a > 't1a1'

returns:

+--------+---------+
| A | B |
+--------+---------+
| t1a1 | t1a1 |
| t2a2 | t2a2 |
+--------+---------+

which is wrong. But the following returns the correct result:

select * from tt where b > 't1a1';

+--------+--------+
| A | B |
+--------+--------+
| t2a2 | t2a2 |
+--------+--------+

I found this while testing the ReshapeExecStream. It has the same problem, and I suspect the reason is the same, since it's using very similar logic as the btree index search. The value tuple that's set up in Farrago and then passed to Fennel for the literal comparisons is not blank padded out to 5 characters in this case. Therefore, when compareTuples() is called on 't1a1 ' vs 't1a1', the result is greater than and therefore, the first row is returned when it shouldn't be.

Residuals will probably have the same problem. So, at least we're all consistently wrong :-).

dynamobi-build commented 12 years ago

[author="jvs", created="Mon, 11 Sep 2006 15:52:19 -0500 (GMT-05:00)"] The right place for the fix is probably in org.eigenbase.sarg.SargEndpoint.convertString. When the type is CHAR, we should pad. For VARCHAR, we can leave it as is since there the executor takes care of the implicit trim during comparison (although trimming the search value here could shave off cycles per-row).

dynamobi-build commented 12 years ago

[author="jvs", created="Mon, 11 Sep 2006 15:56:22 -0500 (GMT-05:00)"] Actually, need to review the VARCHAR case; not for the executor, but for endpoint comparison during sarg interval arithmetic. We use compareTo for that, so probably need to trim all VARCHAR values to match the PAD SPACE collation rules (just like we do during execution).

dynamobi-build commented 12 years ago

[author="zfong", created="Mon, 11 Sep 2006 17:45:21 -0500 (GMT-05:00)"] Hmmm ... if the fix is going to be in the sarg expression code, then that will not address the reshape exec stream case, because I'm not using sarg expression analysis to determine whether or not an expression can be processed by the reshape. That's because the analysis is slightly different. Reshape cannot handle IN or both an upper and lower bound range. However, it can handle =.

I was hoping the fix would be somewhere in FennelRelUtil.convertTuplesToBase64String, which the reshape exec stream does use.

dynamobi-build commented 12 years ago

[author="jvs", created="Mon, 11 Sep 2006 21:57:25 -0500 (GMT-05:00)"] I'm fixing it in sargs, because it's broken there even outside of execution: the coordinate value needs to use the trimmed (for VARCHAR) or padded (for CHAR) representation as canonical when combining intervals.

And I don't think you can avoid using sargs (even if you don't need the full SargRexAnalyzer). Here's why:

a) You need sargs to deal with conversion for other types, e.g. DECIMAL (we definitely don't want to repeat stuff like SargEndpoint.convertNumber in FennelRelUtil).

b) You need sargs to take care of the tricky aspects of truncation/rounding (what goes on in SargEndpoint.convertToTargetType). All of the issues are the same.

dynamobi-build commented 12 years ago

[author="zfong", created="Tue, 12 Sep 2006 08:01:15 -0500 (GMT-05:00)"] Yes. I was thinking about that last night as well, and came to the same conclusion. I'll look into passing in a parameter into SargAnalyzer that makes it run in limited mode, tailored for reshape.

dynamobi-build commented 12 years ago

[author="jvs", created="Tue, 12 Sep 2006 09:30:04 -0500 (GMT-05:00)"] Fixed in eigenchange 7623.

LucidDB / luciddb

[FRG-198] Farrago returns inconsistent results depending on whether or not a column is indexed #674