Closed bilst-visa closed 2 weeks ago
Here's my analysis:
I debugged the code and found that currently when we give where clause separately, the statement which gets executed is like below "SELECT * FROM (SELECT 'name', 'age' from 'xyz') tmp where emp_id=100" which is not the correct behavior. I think it should be "SELECT 'name', 'age' from 'xyz' where emp_id=100"
cc @Hisoka-X @arshadmohammad
it makes sense to query data with a filter. Fetching the records first and then filtering is inefficient and costly. I'm not sure why current behaviour is this way, but it doesn't seem intentional. It appears to be a bug
cc @hailin0
The core problem is that the filter field in the where_condition
must appear in the result of the subquery. If the where condition is directly concatenated to the subquery, incompatibility will occur.
For example:
query: """ SELECT product, SUM(quantity) AS total_quantity
FROM sales
WHERE quantity > 10
GROUP BY product"""
"where_condition"="where total_quantity = 100"
It will become
SELECT product, SUM(quantity) AS total_quantity
FROM sales
WHERE quantity > 10
GROUP BY product where total_quantity = 100
This is not a right SQL. But it works when use temp table
SELECT *
FROM (
SELECT product, SUM(quantity) AS total_quantity
FROM sales
WHERE quantity > 10
GROUP BY product
) tmp WHERE total_quantity = 100;
@Hisoka-X I got what you are saying but I find below two cases are a problem with the current behavior.
If you want to use a where clause with a main query, not the subquery
With engine only, we can directly use
query: "select name, age from xyz where emp_id=100"
How do we achieve this using SeaTunnelTask-web? As it always appends it to a subquery
In seatunnel web, we can not use this for now. But I believe most database will do same optimization, so subquery not a big problem with function.
Case by case, we can change query like this:
query: "select name, age, emp_id from xyz"
"where_condition"="where emp_id=100"
It should be worked, then we can use transform to drop emp_id
field. WDYT about this way? @arshadmohammad @bilst-visa
It should be worked, then we can use transform to drop
emp_id
field. WDYT about this way? @arshadmohammad @bilst-visa
yes, it will work but i think it will be inefficient and costly.
My understanding is that in SeaTunnel queries, we have two WHERE clauses: an inner WHERE clause and an outer WHERE clause. When the job configuration file is created manually, users can decide what to include in the inner and outer WHERE clauses without any issues.
However, in SeaTunnel-Web, there is currently no option to generate an inner WHERE clause; only the outer WHERE clause is supported. I suggest that SeaTunnel-Web internally appends the WHERE clause to the main query, ensuring that only the inner WHERE clause is used. Any thoughts on this?
I suggest that SeaTunnel-Web internally appends the WHERE clause to the main query, ensuring that only the inner WHERE clause is used. Any thoughts on this?
+111. This is a pretty good idea.
Thanks @bilst-visa for raising the PR https://github.com/apache/seatunnel-web/pull/238 PR is merged.
Search before asking
What happened
An error occurs when a conditional column is not included in the SELECT clause while using JDBC-Hive. For example select name, age from xyz where emp_id=100 this fails with error Invalid table alias or column reference 'emp_id': (possible column names are: name,age)
SeaTunnel Version
2.3.8
SeaTunnel Config
Running Command
Error Exception
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
Code of Conduct