Closed waynexia closed 7 months ago
before query execution
In datafusion, it support set
statement to do this kinds of job.
As for other two cases, they seems not very useful in practice.
statement error Error during planning: SHOW TABLES is not supported unless information_schema is enabled
SHOW TABLES
Also I find this error declare is useful when our sql files have lot of cases, and we need to check which SQL will throw error.
In datafusion, it support set statement to do this kinds of job.
Yes, parameter
is to provide something like this. DataFusion does support this, and as well as MySQL, PostgreSQL. But I'm doubting if this is a standard SQL grammar and, to my opinion, it's not a conflict for us (a test framework) to support this as well. Consider these scenarios:
SET
, but I want to change my configuration via APIs other than SQL like HTTP.Those are two practical examples I come up with, and are relying on test context. The parameter
are the way to provide context to each SQL.
As for other two cases, they seems not very useful in practice.
Post-process like replace-result
is widely used in MySQL's integration test. It's very normal for a query to results in random values. What if you query random()
, current_time()
, or system table for current memory consumption? I'm not care about the actually value, but just want to make sure they works, or can match some fixed pattern.
Also I find this error declare is useful when our sql files have lot of cases, and we need to check which SQL will throw error.
If I read it correctly, the error message will present in .result
file next to the query resulting it?
My database is distributed on A, B and C, I set data distribution rules for my table and I want to verify it. I execute a distributed insert, and want to connect to A, B, and C to do a non-distributed query. I want to know what query will be sent to which instance.
This example make sense to me, as for other aspect, I think we can wait until some real world issue arise .
The first principle I obey is to stick with SQL, if SQL can fix it, then we don't have to, it bring little value to this project IMO.
If I read it correctly, the error message will present in .result file next to the query resulting it?
I usually check SQL file to see how many cases have been added, but I can't tell the bad cases from the good one, I don't care what this SQL will output, I only want to know which SQL will throw error.
Maybe we can add a ERR: <reason>
special syntax to declare it(this may belong to your first proposal parameter
).
-- ERR: don't support XX type now
create table t(a XX);
My database is distributed on A, B and C, I set data distribution rules for my table and I want to verify it. I execute a distributed insert, and want to connect to A, B, and C to do a non-distributed query. I want to know what query will be sent to which instance.
This example make sense to me, as for other aspect, I think we can wait until some real world issue arise .
Nice, I'll start working on this one.
The first principle I obey is to stick with SQL, if SQL can fix it, then we don't have to, it bring little value to this project IMO.
I agree. But how do I test a random()
function or results contains timestamp. I think this is also a necessary part to accomplish SQL's functionality.
If I read it correctly, the error message will present in .result file next to the query resulting it?
I usually check SQL file to see how many cases have been added, but I can't tell the bad cases from the good one, I don't care what this SQL will output, I only want to know which SQL will throw error.
Maybe we can add a
ERR: <reason>
special syntax to declare it(this may belong to your first proposalparameter
).-- ERR: don't support XX type now create table t(a XX);
That makes sense. We should preserve the comment in the result file. (maybe comment is enough? Is it necessary to add new syntax?)
I agree. But how do I test a random() function or results contains timestamp. I think this is also a necessary part to accomplish SQL's functionality.
It's possible to remove random value in this way
select count(random())
That makes sense. We should preserve the comment in the result file. (maybe comment is enough? Is it necessary to add new syntax?)
I prefer hard requirement, comment is optional.
I agree. But how do I test a random() function or results contains timestamp. I think this is also a necessary part to accomplish SQL's functionality.
It's possible to remove random value in this way
select count(random())
Then I'd ask as a user, why this framework doesn't support SQL like random()
and force me to write redundant SQL?
And the second use case, how do I test a system table query? E.g., I expect to see the node info, but want to ignore the memory usage to each node because it changes time by time. Do I have to project out that column? Then what about if my query contains timestamp column? I don't think this way is a good practice. "fix it by SQL" should be "I want to project some column, and the SQL does support projection", but not "I don't want to project column, but the framework forces me to make projection". The SQL query result is naturally unstable in most cases. That is the point we should stick to and add support for.
Require every SQLs to be stable reproducible because we are a text comparison based framework. That doesn't look good 😥
That makes sense. We should preserve the comment in the result file. (maybe comment is enough? Is it necessary to add new syntax?)
I prefer hard requirement, comment is optional.
What do you mean by hard-requirement? Isn't the original goal to make the case readable?
Then I'd ask as a user, why this framework doesn't support SQL like random() and force me to write redundant SQL?
I'm not very strong about this, I just think we should keep result as 'static' as possible, replace-result
interceptor indeed has its value, and wait to see your implementation.
What do you mean by hard-requirement? Isn't the original goal to make the case readable?
hard requirement
means we must add -- ERR: xx
prefix for bad sql, otherwise next SQL should not throw error.
I think it's not conflict with readability, instead it make more clear what the SQL will do.
hard requirement means if we must add -- ERR: xx prefix for bad sql, otherwise next SQL should not throw error.
I think it's not conflict with readability, instead it make more clear what the SQL will do.
Got it. This is already covered by now? If a query returns an error, it result would be something like ERR: xxx
(depending on how the user formats their result). And re-run it should also return that error.
Got it. This is already covered by now?
Not exactly, we have to check result file to ensure this, what I want is to declare this in original SQL
statement error DataFusion error: Error during planning: table 'datafusion.information_schema.tables' not found
SELECT * from information_schema.tables
Like this in sqllogcitest.
Oh, like "should_panic
" or "statement err
", right? We can have a try. But I'm not sure if we should also take xxx
into consideration, as it's duplicated with the follow-up full-text match. Maybe left the reason part to only be a comment?
Oh, like "should_panic" or "statement err", right?
Yes.
Maybe left the reason part to only be a comment?
Comment seems OK to me.
Describe This Problem
"Interceptor" is a hook point in sqlness, which allows people to write tests with extra functionality. E.g., provide per-case config, parameterized SQL or resource limitation etc.
Proposal
Some interceptable points I can think of are:
.sql
file startAnd I plan to add one simple interceptor implementation for each point as an example, and to see if this mechanism actually works.
.sql
caseAdditional Context
No response