Open mvorisek opened 4 years ago
This is important to think about and also this is not that simple to change and fix. That's because there can be anything in expression field. It can be subquery, some fieldname, function call etc.
What resulting select you would like to see instead of this ?
-- foo = field expression: RAND()
SELECT RAND() as foo FROM DUAL WHERE RAND() > 0
-- foo = field expression: RAND()
SELECT RAND() as foo FROM DUAL WHERE RAND() > 0
should ALWAYS be queried like
SELECT RAND() as foo FROM DUAL HAVING foo > 0
non trivial fields (or to be precise fields that are non-constant and not using constant functions) that are used multiple times must be firstly selected and then reused/conditioned using HAVING
sql render using nesting is wrong, we must render to some intermediate form and then build sql once
Nested queries should be much more faster that using having
which executes on full result-set.
On other hand, in some cases (like mentioned above) using having
is a solution.
But again, it will perform much slower. For example, this simple case:
SELECT
(select name from person where id=e.person_id) AS emp_name
FROM employee e
WHERE
(select name from person where id=e.person_id) = 'John'
This is probably faster than using having
in case we have millions of employees.
Even more complexity shows up if we use both - where
and having
in same query. And what if conditions are nested?
@DarkSide666 this should use join if 1:1, if 1:N, your query will fail (bacause subquery will return more than 1 row)
MySQL and other DBs like Oracle or MSSQL engines can probably optimize some HAVING clauses the same like subqueries
nested conditions - you must query data and then you can use constant functions/junctions/operators like you want
Yes, this example is for simple hasOne
case and standard title_field.
Can confirm that - if you have more complex hasOne/hasMany table structures, Atk4/Data gets very slow in case you add fields to child model from parent, grand-parent and grand-grand-parent tables via hasOne->addFields(...). This requires optimization, otherwise for performance reason it is much faster to load parent/grand-parent tables only where needed for single records, rather than adding those (grand-)parent fields straight into the model. If we render SQL correctly, this should not slow down SQL performance though.
@mkrecek234 can you please post here a repro code like in https://github.com/atk4/data/issues/1078#issuecomment-1357370201 for your nested usecase?
@mvorisek Here you go - sorry not designed as a test but normal code:
<?php
use Atk4\Data\Persistence;
use \Atk4\Data\Model;
use \Atk4\Data\Model\AggregateModel;
include '../vendor/autoload.php';
$db = new \Atk4\Data\Persistence\Sql('mysql:host=localhost;dbname=atk4;port=3306;user=root;password=root');
/* Create database 'atk4' in localhost MySql
*
CREATE TABLE `account_group` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(32) DEFAULT NULL,
`sign` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `account_group` (`id`, `name`, `sign`)
VALUES
(1, 'Acount Group', -1);
CREATE TABLE `account` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(32) DEFAULT NULL,
`account_group_id` int unsigned DEFAULT NULL,
`fx_rate` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `account` (`id`, `name`, `account_group_id`, `fx_rate`)
VALUES
(1, 'Test Account', 1, 1.2);
CREATE TABLE `booking` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`account_id` int unsigned DEFAULT NULL,
`booking_group` varchar(32) DEFAULT NULL,
`value` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `booking` (`id`, `account_id`, `booking_group`, `value`)
VALUES
(1, 1, 'Group1', 123),
(2, 1, 'Group1', 456),
(3, 1, 'Group2', 789);
*/
class AccountGroup extends \Atk4\Data\Model {
public $table = 'account_group';
protected function init(): void
{
parent::init();
$this->addFields(['name', 'sign' => ['type' => 'float']]);;
}
}
class Account extends \Atk4\Data\Model {
public $table = 'account';
protected function init(): void
{
parent::init();
$this->addFields(['name', 'fx_rate' => ['type' => 'float']]);
$this->hasOne('account_group_id', ['model' => [AccountGroup::class]])->addField('sign');
}
}
class Booking extends \Atk4\Data\Model {
public $table = 'booking';
protected function init(): void
{
parent::init();
$this->addFields(['booking_group', 'value' => ['type' => 'float']]);
$this->hasOne('account_id', ['model' => [Account::class]])->addFields(['fx_rate', 'sign']);
}
}
$accountModel = new Account($db);
$bookingModel = new Booking($db);
$bookingAggregate = new AggregateModel($bookingModel);
$bookingAggregate->setGroupBy(['booking_group'], [
'total' => ['expr' => 'sum([value])'], ]);
foreach ($bookingAggregate as $booking) {
echo 'Result: ' . $booking->get('booking_group') . ' | ' . $booking->get('total');
}
echo 'That worked..';
$bookingAggregate = new AggregateModel($bookingModel);
$bookingAggregate->setGroupBy(['booking_group'], [
'total' => ['expr' => 'sum([sign] * [value])'], ]);
foreach ($bookingAggregate as $booking) {
echo 'Result: ' . $booking->get('total');
}
echo 'That did not work..';
it seems you somewhat copied the AggregateModel
testcase from aggregate issue
subquery like:
(
select
`fx_rate`
from
`account` `_a_8a089c2a7e6c`
where
`id` = `booking`.`account_id`
) `fx_rate`
should be quite fast as long as the selection is done on unique index, any reasonable SQL optimizer should detect it is a 1:1 join query
In chat you have written:
Just sonething I noticed: When you have multi-layered hasOne relations (so parents tables, grand-parent tables and grand-grandparents tables) and if you always addFields of the respective higher levels, so in your child table you have a lot of fields from grand/grand/grand-tables, Atk4/data gets terribly slow. I stopped doing this and even load fields later, instead of adding them theough hasOne relations. In SQL this would not be so slow IMHO, because you do: SELECT child.field, parent.field, grand-parent.field from child join parent on (…) join grand-parent on (…) join grand-grand-parent on (…) Atk4/data seems to create separate sub-query selects for each added field.
It would be good if you can provide the actual slow query and table metrics
@mvorisek The example above shows a nesting, as we cascade the field down from AccountGroup via Account to Booking. If this was not what you meant, please specify what you need. On the performance I do not have performance metrics, but I can tell that complex models with down-cascaded fields slow down simple load and save model commands to a good extent.
so no AggregateModel above is needed and (new Booking($db))->executeCountQuery()
has the same performance problem?
⚠ even more conterintuitive: https://sqlite.org/forum/forumpost/8202db3916
Of course, I am thinking about optimal data modelling, not about implementation. Implementation is always possible.
that is the same as with Union - we need universal nesting support from Model which could result in join, union, wrap.
The current design is actually completely wrong as presented a few days ago: Imagine field with expr
RAND() > 0
. This field is currently rendered multiple like:it is not only rand, any expression like
SELECT name FROM t LIMIT 1
is unstable as it has no explicit ORDER. Even if we (in theory, we normally do no even know them) add all table columns as a fallback order, queries likeSELECT name, COUNT(*) FROM t ORDER BY name LIMIT 1
will be still unstable because of grouping first.This is major bug which must be addressed.
Any non-trivial expression must SELECT under an alias ONCE and then this alias must be used anywhere else - https://stackoverflow.com/questions/35695171/remove-duplicate-sub-query