DDC-349: Add support for specifying precedence in joins in DQL

doctrinebot commented 14 years ago

Jira issue originally created by user dennis.verspuij:

This request is a follow-up to my doctrine-user message "Doctrine 2.0: Nested joins". I am a bit surprised by the responses, in that defining precedence in joins by placing parentheses around join expressions is apparently not well known. Although not in the original SQL92 specification, it is a major and important feature offered by all the RDBMSs that Doctrine 2 supports, and it often performs better than using subselects or the like. Doctrine 1 did not support it, but imho Doctrine 2 should support it to be a mature all-round ORM.

As a short example the following is a SQL statement with a nested join, where the nesting is absolutely necessary to return only a's together with either both b's and c's or no b's and c's at all:

SELECT * FROM a A LEFT JOIN ( b B INNER JOIN c C ON C.b_id = B.id ) ON B.a_id = A.id
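
To make the difference concrete, here is a minimal sketch that runs the nested join against an in-memory SQLite database and compares it with a flat chain of LEFT JOINs. The three-table schema and the sample rows are assumptions made up for illustration; only the shape of the query comes from the statement above.

```python
import sqlite3

# Hypothetical three-table schema mirroring the a/b/c example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (10, 1), (20, 2);  -- a=3 has no b
    INSERT INTO c VALUES (100, 10);         -- b=20 has no c
""")

# Nested: b and c are inner-joined first, then left-joined to a as one unit.
NESTED = """
    SELECT A.id, B.id, C.id
    FROM a A
    LEFT JOIN ( b B INNER JOIN c C ON C.b_id = B.id ) ON B.a_id = A.id
    ORDER BY A.id
"""
# Flat: two independent LEFT JOINs, so a b without a c still shows up.
FLAT = """
    SELECT A.id, B.id, C.id
    FROM a A
    LEFT JOIN b B ON B.a_id = A.id
    LEFT JOIN c C ON C.b_id = B.id
    ORDER BY A.id
"""
nested_rows = conn.execute(NESTED).fetchall()
flat_rows = conn.execute(FLAT).fetchall()
print(nested_rows)
print(flat_rows)
```

With this sample data the nested form returns a=2 with neither b nor c, while the flat form returns a=2 together with b=20 and a NULL c, which is exactly the "both or none" distinction described above.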

In order for Doctrine 2 to support this the BNF should be something like:

Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" ( "(" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable Join ")" | JoinAssociationPathExpression ["AS"] AliasIdentificationVariable ) [("ON" | "WITH") ConditionalExpression]

instead of the current:

Join ::= ["LEFT" ["OUTER"] | "INNER"] "JOIN" JoinAssociationPathExpression ["AS"] AliasIdentificationVariable [("ON" | "WITH") ConditionalExpression]
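
As a rough illustration of how small the grammar change is, the proposed Join rule can be sketched as a toy recursive-descent parser. This is not Doctrine's parser: the class and method names are hypothetical, the "INNER" and "WITH" alternatives are assumed from the DQL example in this ticket, and the ConditionalExpression is simply collected as raw tokens rather than parsed.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|'[^']*'|[^\s()]+)")
JOIN_START = {"LEFT", "INNER", "JOIN"}


class JoinParser:
    """Toy parser for the proposed Join rule (hypothetical, not Doctrine code)."""

    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        # Upper-cased lookahead, or None at end of input.
        return self.tokens[self.pos].upper() if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        tok = self.tokens[self.pos]
        if expected is not None and tok.upper() != expected:
            raise SyntaxError(f"expected {expected}, got {tok}")
        self.pos += 1
        return tok

    def path_and_alias(self):
        path = self.take()
        if self.peek() == "AS":
            self.take()
        return path, self.take()

    def condition(self):
        # Simplification: gather tokens until an unbalanced ")" or the next join.
        parts, depth = [], 0
        while self.peek() is not None:
            tok = self.peek()
            if depth == 0 and (tok == ")" or tok in JOIN_START):
                break
            depth += (tok == "(") - (tok == ")")
            parts.append(self.take())
        return " ".join(parts)

    def join(self):
        kind = "INNER"
        if self.peek() == "LEFT":
            self.take()
            kind = "LEFT"
            if self.peek() == "OUTER":
                self.take()
        elif self.peek() == "INNER":
            self.take()
        self.take("JOIN")
        nested = None
        if self.peek() == "(":  # the new parenthesized alternative
            self.take()
            path, alias = self.path_and_alias()
            nested = self.join()
            self.take(")")
        else:
            path, alias = self.path_and_alias()
        cond = None
        if self.peek() in ("ON", "WITH"):
            self.take()
            cond = self.condition()
        return {"kind": kind, "path": path, "alias": alias,
                "nested": nested, "cond": cond}


tree = JoinParser(
    "LEFT JOIN ( A.b B INNER JOIN B.c C ) "
    "WITH B.something = 'value' AND C.something = 'othervalue'"
).join()
print(tree)
```

The only new branch compared with a parser for the current rule is the `"("` case in `join()`, which recurses once and then expects the closing parenthesis.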

This would allow DQL like:

SELECT A, B, C FROM a A LEFT JOIN ( A.b B INNER JOIN B.c C ) WITH B.something = 'value' AND C.something = 'othervalue'

What further needs to be done is that the DQL parser loosely couples the ConditionalExpression to any of the previously parsed JoinAssociationPathExpressions, instead of tying it explicitly to the JoinAssociationPathExpression that precedes it as in the old BNF notation. The new BNF should, however, not require any changes to the hydrator. Therefore I have the feeling that improving the DQL parser for nested joins does not require extensive work, while the benefit of running these kinds of queries is considerable.

As an extra substantiation here are links to the (BNF) FROM clause documentation of the RDBMSs that Doctrine 2 supports; they all show support for nested joins:

MySQL: http://dev.mysql.com/doc/refman/5.0/en/join.html
PostgreSQL: http://www.postgresql.org/docs/8.4/interactive/sql-select.html#SQL-FROM and http://www.postgresql.org/docs/8.1/interactive/explicit-joins.html
MSSQL: http://msdn.microsoft.com/en-us/library/ms177634.aspx
Oracle: http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/statements_10002.htm#CHDDCHGF
SQLite: http://www.sqlite.org/syntaxdiagrams.html#single-source

I surely hope you will consider implementing this improvement because it would save me and others from the hassle of writing raw SQL queries or executing multiple (thus slow) queries in DQL for doing the same. Thanks anyway for the great product so far!

doctrinebot commented 14 years ago

is duplicated by DDC-1256: Generated SQL error with DQL WITH and JOINED inheritance
is referenced by DDC-3500: [GH-1254] Fix applying ON/WITH conditions to first join in Class Table Inheritance

doctrinebot commented 14 years ago

Comment created by @guilhermeblanco:

This seems to be a valid issue to me.

Implementing this is the actual solution for retrieving associations to entities that use joined (class table) inheritance.

Example:

/** Joined */
class Base {}

class Foo extends Base {}

class Bar {
    public $foo;
}

// This causes the CTI to link as INNER JOIN, which makes the result empty
// (0 rows) if you have no Foo's defined (although it should ignore this)
$q = $this->_em->createQuery('SELECT b, f FROM Bar b LEFT JOIN b.foo f');
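
A sketch of the SQL-level effect being described, run against an in-memory SQLite database. The table and column names are assumptions for illustration and are not the exact SQL Doctrine generates; the point is only how an INNER JOIN to the parent table behaves with and without nesting.

```python
import sqlite3

# Hypothetical tables for the Base/Foo/Bar classes above (joined inheritance:
# a Foo row is split across "base" and "foo", which share the same id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base (id INTEGER PRIMARY KEY);
    CREATE TABLE foo (id INTEGER PRIMARY KEY REFERENCES base(id));
    CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER REFERENCES foo(id));
    INSERT INTO bar VALUES (1, NULL), (2, NULL);  -- no Foo's defined
""")

# Flat form: the INNER JOIN to the parent table filters out every Bar.
FLAT = """
    SELECT b.id, f.id FROM bar b
    LEFT JOIN foo f ON f.id = b.foo_id
    INNER JOIN base s ON s.id = f.id
    ORDER BY b.id
"""
# Nested form: foo/base are joined first, then LEFT JOINed as one unit.
NESTED = """
    SELECT b.id, f.id FROM bar b
    LEFT JOIN ( foo f INNER JOIN base s ON s.id = f.id ) ON f.id = b.foo_id
    ORDER BY b.id
"""
flat_rows = conn.execute(FLAT).fetchall()
nested_rows = conn.execute(NESTED).fetchall()
print(flat_rows)
print(nested_rows)
```

In this sketch the flat form loses both Bar rows, while the nested form keeps them with a NULL Foo, which is what a LEFT JOIN on b.foo would be expected to return.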

doctrinebot commented 14 years ago

Comment created by romanb:

Yes, this is a possible solution for DDC-512, but on the SQL level. I still don't see this as appropriate for DQL; it just doesn't make sense to me. DQL joins object associations, there is no precedence.

doctrinebot commented 14 years ago

Comment created by romanb:

So, no, this has nothing to do with DDC-512. DDC-512 can even be fixed differently as outlined in my comments there.

doctrinebot commented 14 years ago

Comment created by romanb:

On a side note I would still like to know/see the following for this issue:

Some realistic DQL examples where this feature would be essential, i.e. there is no other way to do it. This also means explaining what the impact on the resulting object graph is and why it makes sense.
Which other ORMs support this on the OQL/Criteria level?

So far, my stance on this issue is:

1) It doesn't make sense (semantically) in DQL
2) It's rarely needed
3) When you really need it you can use a NativeQuery anyway and use this nesting in SQL, where it probably belongs and makes more sense
4) It would (unnecessarily) complicate DQL

Thus I am currently leaning towards "Wont fix" for this issue.

doctrinebot commented 14 years ago

Comment created by dennis.verspuij:

Hi Roman. I understand your doubts, and I have been racking my brain for the last few hours to create a realistic example that would hopefully convince you to implement this feature. But actually I cannot find one that you wouldn't consider to be trivial. I do have a number of very complex optimized queries written for sportskickoff dot com (using Doctrine 1.2), but they are probably hard to understand because they are not self-describing. Below is one example literally ripped from the application. They can often be broken down to my example query in this ticket's description, but grouping, additional joins on the root component and/or other criteria made them impossible to rewrite using subselects or by choosing another root component. Most often they simply performed best using the nested syntax and saved me a number of additional queries.

SELECT A.id, A.username, A.balance, COALESCE(SUM(B.stake), 0) AS sumstake, COUNT(B.id) AS nrbets
FROM account A
LEFT JOIN (
    bet B
    INNER JOIN game G ON G.id = :GAMEID AND B.timestampcompletion BETWEEN G.timestampstart AND G.timestampend
) ON B.accountid = A.id AND B.timestampcompletion IS NOT NULL
WHERE A.Status & :ACTIVEORDISQUALIFIED = :ACTIVE
GROUP BY A.id, A.username, A.balance
ORDER BY A.balance DESC, sumstake ASC, nrbets ASC, A.username ASC

But let's put it another way. I would also like this feature to be supported in DQL because I just do not want to use native queries. Why would I want to use native queries if it can be done using DQL? In DQL I work with class names and field names, and they may differ from the underlying table and column names. Doctrine takes care of that mapping based on my schema/annotations and I do not have to "know" these mappings. In native queries I suddenly do have to "know" these mappings. I use Doctrine because it makes my application portable and enables me to work with my database in an OOP way like I do in my model, abstracting things. The need for native queries partly reverts the benefits Doctrine offers in the first place.

Btw, I recall having successfully used the nested join syntax in HQL (.NET Hibernate), but I cannot find examples on the web or a BNF notation.

Furthermore, in reply to your stances:

1) It indeed doesn't make sense (semantically) in DQL; it only makes the result set different, not the way data is hydrated into objects.
2) It's indeed rarely needed for inserting, updating and populating basic lists, but it lets you better select which combinations of associated rows are joined and which are not, in more optimized queries, without having to use native queries, or because they perform better than subselects and the like.
3) Not having to use native queries is just an extra reason for using Doctrine, and it maintains the abstraction the ORM provides throughout one's whole application.
4) Why would it complicate DQL? If people do not know about or understand the feature, it wouldn't matter, because not using parentheses is the default way to specify joins.

Well, this is it, can't find any more words to promote and make you enthusiastic.... lol.

doctrinebot commented 14 years ago

Comment created by dennis.verspuij:

Ok, I have not given up yet... :), here's a "stupid" example.

Imagine a book store that sells books by various authors and keeps track of those sales. Let's say you have an admin page that lists all authors, and for each author it also shows the books and their sales dates since January 1st, but only for those books that were actually sold and contain an A in their name. An optimized SQL query to fetch all the information at once would be something like:

SELECT A.*, B.*, S.* FROM author A LEFT JOIN ( book B INNER JOIN sale S ON S.book_id = B.id AND S.dt >= '2010-01-01' ) ON B.author_id = A.id AND A.name LIKE '%A%'

In DQL it would then be something like:

SELECT A.*, B.*, S.* FROM author A LEFT JOIN ( book B INNER JOIN sale S WITH S.dt >= '2010-01-01' ) WITH A.name LIKE '%A%'

If the database contained thousands of books, but sales for just a few books, this would definitely perform better than using subselects. Of course one would like to fetch array graphs instead of objects for further optimization, but this hopefully shows my point.

I have attached a test case for a similar query, though without the additional join constraints for clarity. I surely hope you can consider it.

One last note: you shouldn't be afraid that nesting joins is not in the ANSI SQL spec. Select queries are about record sets and products between these sets; tables are just the basic means of providing record sets to the query. This is an important terminological difference to think about. Specifying precedence with parentheses around joins is a logical and natural evolution of the ANSI SQL standard. Views are a good proof of this concept: I could define book B INNER JOIN sale S as a view and LEFT JOIN that to authors to get effectively the same result set as the above example. The database server would internally perform the same query (though it may additionally take indexes on the view into account). That said, RDBMSs that support this syntax would certainly never drop the feature, as it's not a feature but just plain logical and smart querying!
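
The view argument can be checked with a small sketch on an in-memory SQLite database. The column layout, the view name book_sale and the sample rows are assumptions for illustration; the additional date and name filters from the example are left out to keep it short.

```python
import sqlite3

# Hypothetical book store schema with one sold book and two unsold ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, book_id INTEGER, dt TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (10, 1, 'Alpha'), (11, 1, 'Beta'), (20, 2, 'Gamma');
    INSERT INTO sale VALUES (100, 10, '2010-02-01');

    -- The inner part of the nested join, defined as a view.
    CREATE VIEW book_sale AS
        SELECT B.author_id, B.title, S.dt
        FROM book B INNER JOIN sale S ON S.book_id = B.id;
""")

NESTED = """
    SELECT A.name, B.title, S.dt
    FROM author A
    LEFT JOIN ( book B INNER JOIN sale S ON S.book_id = B.id )
        ON B.author_id = A.id
    ORDER BY A.id
"""
VIA_VIEW = """
    SELECT A.name, V.title, V.dt
    FROM author A
    LEFT JOIN book_sale V ON V.author_id = A.id
    ORDER BY A.id
"""
nested_rows = conn.execute(NESTED).fetchall()
view_rows = conn.execute(VIA_VIEW).fetchall()
print(nested_rows)
print(view_rows)
```

Both queries return the same rows here: Ann with her one sold book, and Bob with NULLs because none of his books were sold.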

P.S. I had a hard time finding out how to run the test cases, I could not find it in the Doctrine 2 documentation, development wiki, cookbook or any other place, while finally it was as easy as running phpunit Doctrine_Tests_AllTests from within the tests/ directory, or just phpunit Doctrine_Tests_ORM_Functional_Ticket_DDC349Test for my test. Could you please add some info about this somewhere, it might save others some googling.

doctrinebot commented 14 years ago

Comment created by dennis.verspuij:

Test case as SVN patch using a parenthesized join. Just remove the parentheses from the query to have it fail...

doctrinebot commented 14 years ago

Comment created by romanb:

@"The need for native queries partly reverts the benefits Doctrine offers in the first place."

That is something I hugely disagree with. Neither SQL abstraction nor database vendor independence is the main purpose of an ORM like Doctrine 2. It is the state management of your objects, the transparent change tracking, lazy-loading and synchronization of the object state with the database state, and none of this gets lost when using native queries.

We could rip out DQL and any other querying mechanism except a basic find() (and lazy-loading, of course), only providing the native query facility and even only supporting MySQL and would still retain all the core ORM functionality.

NativeQuery is one of the best and core "features" of the project. It is even the foundation for DQL. A DQL query is nothing more than an additional (beautiful) abstraction, but what comes out is a native query + a ResultSetMapping, the same thing you can build yourself in the first place, even using the mapping metadata to construct the query. Nothing forces you to hardcode table and column names in native queries if you don't want that. Just use the mapping metadata, DQL does the same.
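
The idea of building a native query from mapping metadata, rather than from hardcoded names, can be sketched language-neutrally as follows. This is not Doctrine's API: the METADATA dictionary, the select_sql helper and the Account mapping are all hypothetical stand-ins for whatever the ORM's metadata exposes.

```python
# Hypothetical mapping metadata: entity field names -> table/column names.
METADATA = {
    "Account": {
        "table": "account",
        "columns": {"id": "id", "userName": "username", "balance": "balance"},
    },
}


def select_sql(entity, alias, fields):
    """Build a SELECT from field names, resolving columns via the metadata."""
    meta = METADATA[entity]
    cols = ", ".join(
        f'{alias}.{meta["columns"][f]} AS {alias}_{f}' for f in fields
    )
    return f'SELECT {cols} FROM {meta["table"]} {alias}'


sql = select_sql("Account", "a", ["id", "userName"])
print(sql)
```

The calling code only mentions class and field names; the table and column names are looked up, so a renamed column would only need a change in the mapping.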

SQL abstraction and database vendor independence is icing on the cake, not the heart of the ORM.

doctrinebot commented 8 years ago

Imported 1 attachments from Jira into https://gist.github.com/84add54169198c6e0e7d

10569_DDC349Test.patch

doctrine / orm

DDC-349: Add support for specifying precedence in joins in DQL #4301