Open alamb opened 9 hours ago
Yes, I think one of the comments in that discussion mentioned that changes that would cause breakage should be called out in every release. So before each release, we should list the changes that would need to be made if an upgrade were to happen during the development process.
I think it is a great idea. The challenge will be identifying such changes, I think.
There is an interesting approach at MariaDB: they generate queries with different syntaxes to find regressions. Basically, we could take their main.sql file, which is 7MB of different queries (including join queries), and adapt it to DF.
There is no check of the answers, just a smoke test that each query runs successfully.
The example can be found at https://github.com/mariadb-corporation/mariadb-qa/tree/master/pquery
@alamb WDYT? It looks like low-hanging fruit: we can take the file and run it in the latest DataFusion CLI as part of CI or a major release verification process.
I think in general the more testing we have the better. This idea sounds good to me -- I think more fully leveraging @2010YOUY01's integration into SQLancer is also quite interesting.
Let's try and write some tickets to capture these ideas too - I can spend some time working on this over the next day or two
There is an interesting approach at MariaDB: they generate queries with different syntaxes to find regressions. Basically, we could take their main.sql file, which is 7MB of different queries (including join queries), and adapt it to DF.
Almost absolutely NOT. https://github.com/mariadb-corporation/mariadb-qa/blob/master/LICENSE.md
That's frustrating. Let's see if SQLancer can generate something similar.
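Whatever the source of the queries ends up being, the smoke test discussed above (run every query, ignore the answers, report only what fails to run) could be sketched roughly as below. This is a minimal illustration, not an existing DataFusion tool: the function names are made up, the statement splitter is deliberately naive, and it uses the `datafusion` Python bindings rather than the CLI mentioned in the thread (assumed to be installed only if `datafusion_runner` is called). The demo at the bottom uses SQLite from the Python standard library purely as a stand-in engine so the sketch runs anywhere.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple


def split_statements(sql_text: str) -> List[str]:
    """Naively split a SQL script on ';'.

    Assumption: no semicolons appear inside string literals or comments.
    """
    return [s.strip() for s in sql_text.split(";") if s.strip()]


def smoke_test(
    statements: Iterable[str], run: Callable[[str], object]
) -> List[Tuple[str, str]]:
    """Run every statement and return (statement, error) for each one that raises."""
    failures = []
    for stmt in statements:
        try:
            run(stmt)  # results are discarded: only "did it run?" matters
        except Exception as exc:
            failures.append((stmt, str(exc)))
    return failures


def datafusion_runner() -> Callable[[str], object]:
    """Runner backed by the `datafusion` Python package (assumed installed)."""
    from datafusion import SessionContext  # lazy import: optional dependency

    ctx = SessionContext()
    return lambda query: ctx.sql(query).collect()


if __name__ == "__main__":
    # Stand-in engine for the demo; swap in datafusion_runner() to test DataFusion.
    conn = sqlite3.connect(":memory:")
    script = "SELECT 1; SELECT 1 +; SELECT 'a' || 'b'"
    for stmt, err in smoke_test(
        split_statements(script), lambda q: conn.execute(q).fetchall()
    ):
        print(f"FAILED: {stmt!r}: {err}")
```

Running the same query file against two DataFusion versions and diffing the failure lists would flag queries that stopped working between releases, without needing any expected answers.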
Is your feature request related to a problem or challenge?
This is broken out from a more general ticket here
🥳 In my opinion DataFusion is now good enough (performance- and feature-wise) for many people to have built real systems and products
However, as more people build "real" systems using DataFusion, our historic "move fast and break things and hope you can keep up" mentality likely needs to adjust to a more mature "move as fast as possible, but minimize breakages" approach.
My summary of the discussion on https://github.com/apache/datafusion/issues/13525 from @findepi @scsmithr @waynexia @timsaucer @Rachelint @Omega359 @jonmmease @Dandandan and @andygrove was that many existing heavy users of DataFusion spend a lot of time during upgrades from one DataFusion release to another
Specifically, I think the core challenge I heard was NOT the mechanical API changes required, but the effort required to diagnose more subtle issues such as:
Describe the solution you'd like
I would like to improve the ease of upgrading DataFusion versions
There are many ways to do so and I would like to use this ticket to capture / organize the work in this area
Related Items
Additional testing
More Context: