Open firebird-automations opened 14 years ago
Commented by: @asfernandes
You shouldn't use GEN_ID as such and expect consistent results.
For the case you want, FB 3 has window functions, see it in my blog http://asfernandes.blogspot.com.
For now, one way to do that type of thing is with selectable procedures.
Commented by: @pavel-zotov
Adriano,
I'm talking only about FB optimizer and how to make it much smarter. GEN_ID has been used in these samples only for illustration that unnecessary work is done in case of JOIN.
I know that some heuristic rules are placed inside opt.cpp and propose that these rules can be improved.
Commented by: @pavel-zotov
PS. OFFTOP. Just finished read your blog about windowed function. This is THE GREAT WORK! Especially running totals (sum(salary) over (order by salary)) and navigation via LAG / LEAD. Thank you in advance.
Commented by: @dyemanov
I suggest this ticket to be either rewritten or closed in favor of the new one, the current description does not match the test case.
Look at the plan:
Select Expression -> Filter -> Nested Loop Join (inner) -> Aggregate -> Table "EMPLOYEE" as "EA EX" Full Scan -> Table "EMPLOYEE" as "E" Full Scan
Obviously, the aggregated derived table is composed and accessed just once. However, the join condition is evaluated for the every row inside the inner stream E. And the "+0*GEN_ID" part (as well as any other expression) is evaluated as many times too, hence the visible effect.
For the first test case, result of the singleton invariant subquery is cached after its first evaluation. Caching/materializing a scalar value or a result set are completely different things, they must not be compared with each other.
Commented by: @pavel-zotov
alter sequence emp_no_gen restart with 0;
Output:
EMP_NO SALARY G ======= ===================== ===================== 110 6000000.00 42 118 7480000.00 42 121 99000000.00 42
Trace:
PLAN JOIN (EA EX EMPLOYEE NATURAL, E NATURAL) 3 records fetched 0 ms, 2 read(s), 223 fetch(es), 42 mark(s)
Table Natural Index **************************************************** EMPLOYEE 84
So, perhaps... no problem at all ?.. But I'm sure that at least on 2.5.x I did see many times inefficient statistics per tables...
Commented by: @dyemanov
> From what time FB start to do somewhat like 'cache' (or 'remember') results of aggregating ?
It never did. But the first stream of the nested loop join is always accessed once.
> So, perhaps... no problem at all ?
The problem (stated in the ticket title) exists and I know what exactly you had in mind. But this ticket is plain wrong, maybe just a badly though sample. I will create a proper one (or maybe change this one) a bit later.
Submitted by: @pavel-zotov
Consider two variants of query against table EMPLOYEE of "standard" database that is supplied with any FB.
Both variant results the rows with EMP_NO and such SALARY that are greater than average salary of all employees. Also we call generator emp_no_gen to determine the quontity of operations that occures during this action: 1) via NON-correlated subquery: select e.emp_no, e.salary, gen_id(emp_no_gen,0) g from employee e where (e.salary > (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex))
Sequential running of this variant shows that EMP_NO_GEN increases by 1 at each time. So we can be sure that aggregate part (inside WHERE) is evaluated only once.
2) via JOIN: select e.emp_no, e.salary, gen_id(emp_no_gen,0) g from employee e join (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex) ea on e.salary > ea.avgval
In this case the EMP_NO_GEN increases by 42 each time (i.e. by step that is equal select count(*) from emp).
I think it will be useful to make Firebird "understand" that such part as 2nd in this join must be evaluated only once, like in variant 1.