gmantele / vollt

Java libraries implementing the IVOA protocol: ADQL, UWS and TAP
http://cdsportal.u-strasbg.fr/taptuto/
29 stars 28 forks source link

subqueries not permitted after an operator like: =, !=, <, <= , >, >= #114

Open almicol opened 5 years ago

almicol commented 5 years ago

The following ADQL query:

SELECT top 20 t_min, obs_publisher_did, access_url FROM ivoa.ObsCore WHERE proposal_id like '072.C-0488%' AND t_min >= (SELECT t_min FROM ivoa.ObsCore WHERE obs_publisher_did = 'ivo://eso.org/ID?ADP.2014-09-26T16:51:34.173' ) ORDER BY t_min

is perfectly accepted and successfully executed by the SQLServer. That query works in SQLServer because the subquery returns only one record. Had the subquery returned more than one record, than SQLServer would have thrown an exception:

com.microsoft.sqlserver.jdbc.SQLServerException: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Within TAPLIB, such query is not even attempted, as the parser throws an exception without checking if the subquery returns or not multiple records; here the error message:

Incorrect ADQL query: Encountered "SELECT". Was expecting one of: "\"" "\"" "\"" "\"" "\"" "\"" "\"" "\"" "\"" "\"" "\"" "\"" (HINT: "SELECT" is a reserved ADQL word. To use it as a column/table/schema name/alias, write it between double quotes.)

Would it be possible to change the parser and implement the more dynamic SQLServer behaviour? Thanks!

gmantele commented 5 years ago

You can do whatever you want with JavaCC (the tool I use to generate the ADQL parser), but if you do what you suggest you can not call the resulting language as "ADQL".

What you want is indeed not supported either by v2.0 or by v2.1. This is actually a similar discussion as the support of subqueries in the SELECT clause. I admit that the usage of subqueries in ADQL is fairly limited, but I don't want to change the generic parser for any special syntax.....because it is difficult to maintain for me and then it may create so many different flavor of ADQL that users will be very confused.

Anyway, I think that your query could be rewritten in the following way:

SELECT TOP 20 t_min, obs_publisher_did, access_url
FROM ivoa.ObsCore AS oc, (SELECT t_min FROM ivoa.ObsCore WHERE obs_publisher_did = 'ivo://eso.org/ID?ADP.2014-09-26T16:51:34.173') AS singleObs
WHERE proposal_id LIKE '072.C-0488%'
  AND oc.t_min >= singleObs.t_min
ORDER BY oc.t_min

_I did not test it, but in theory it should work. Note that I used , between the 2 tables....that is generally discouraged, but in your case, the subquery - singleObs - returns only one line, so it is ok. It could probably be possible to rewrite it with a JOIN....ON oc.t_min >= singleObs.t_min but that kind of join is quite difficult to master (for me) and so I am quite uncertain about the result....that's why I preferred the , (alias for CROSS JOIN)._

almicol commented 3 years ago

Thanks Gregory, I tested your suggested query and, with some modifications, I was able to run it. First modification: ORDER BY does not support a tablename.columnname (as per issue #113) because of that I need to remove the oc alias, but that would make t_min ambiguous, hence: second modification: create an alias for the t_min field in the singleObs table. Overall, the query now looks like this:

SELECT TOP 20 t_min, obs_publisher_did, access_url
FROM ivoa.ObsCore AS oc, 
(SELECT t_min as specific_t_min FROM ivoa.ObsCore WHERE obs_publisher_did = 'ivo://eso.org/ID?ADP.2014-09-26T16:51:34.173') AS singleObs
WHERE proposal_id LIKE '072.C-0488%'
AND oc.t_min >= singleObs.specific_t_min
ORDER BY t_min

That works, but a normal user will never guess all that.

Regarding the standard: I'm not sure to understand which part of the ADQL 2.x standard does not allow what I need. ADQL 2.0 states:

2.2.1 Table subqueries and Joins Table subqueries are present and can be used by some existing predicates within the search condition (IN and BETWEEN most likely) or as an artifact of building derived tables. While IN and BETWEEN are explicitly mentioned, the wording is "most likely" and not "exclusively". Is there any other section of the standard that does not permit a: "... >= (subquery) ... " ?

Thanks again!

gmantele commented 3 years ago

That works, but a normal user will never guess all that.

Unfortunately, that is often true with ADQL, but also with SQL. It happens quite often that I have myself to refer to the documentation of the SQL language of the database I use (Postgres, most of the time) to know what is possible or exists, to play around with my query and to experiment a bit before properly writing my query. So, it is not exclusively an "issue" with ADQL. But yes, in such cases, either you have enough experience in writing such queries or you need help from people that might have it.

Regarding the standard: I'm not sure to understand which part of the ADQL 2.x standard does not allow what I need. ADQL 2.0 states:

2.2.1 Table subqueries and Joins Table subqueries are present and can be used by some existing predicates within the search condition (IN and BETWEEN most likely) or as an artifact of building derived tables. While IN and BETWEEN are explicitly mentioned, the wording is "most likely" and not "exclusively".

I think the "most likely" was written so that being still valid if in the future more predicates supporting subqueries become available in ADQL.

Is there any other section of the standard that does not permit a: "... >= (subquery) ... " ?

Yes, in the BNF. If you go down from <search_condition>, you get the following for comparisons:

<comparison_predicate> ::=
    <value_expression> <comp_op> <value_expression>

<value_expression> ::=
    <numeric_value_expression>
    | <string_value_expression>
    | <geometry_value_expression>

...and <numeric_, <string_ and <geometry_value_expression> do not include subqueries, but only litterals, operations or columns.

I already agree, unfortunately, this is only written in the BNF....so, something not easily readable by humans....and even by a computer in the case of this weird BNF grammar we have here. I hope this will become more accessible in the future of ADQL by switching from a BNF to a PEG grammar.

Anyway, I agree that most of the time, you have to practice quite a bit ADQL to come with solutions like the one you found. But even with SQL dialects (being all slightly different), it is already the case. For quite peculiar queries you have to find tricks to reach your goal. I think that right now, the best we can do is to document as many of these special cases as possible so that helping users in designing more easily their queries. We could of course add few more explanations in the ADQL document, but it will never replace user experience :(

almicol commented 3 years ago

Thanks Gregory. Will it be a sensible thing to do to ask to modify ADQL to support subqueries as valid ?

gmantele commented 3 years ago

I think it makes sense to ask, but that's another story to know if it will become part of ADQL (the modification of the grammar could be quite important).

If you want that in ADQL I invite you to publish an issue on https://github.com/ivoa-std/ADQL/issues ...I am also in charge of it now so I could do it, but I think it is better if the original author of the request does it himself. In any case, this issue will be considered only for the version after ADQL-2.1....we will move this latter in RFC very soon.