akoutmos / sql_fmt

Format and pretty print SQL queries
https://hex.pm/packages/sql_fmt
MIT License
24 stars 1 forks source link

Formatter breaks queries including various postgres operators #4

Open pnezis opened 1 week ago

pnezis commented 1 week ago

This is a bug of the rust crate.

I would suggest to add a WARNING in the docs till this is fixed since it will break queries using such operators.

akoutmos commented 1 week ago

I think I have a temporary fix for this using regex replaces. Not an ideal long term solution but fixes this in the short term until the Rust library is fixed. Should be releasing that later today!

akoutmos commented 1 week ago

Going to be playing around with this for a bit but from some initial tests it seems to fix any issues introduced by the Rust library https://github.com/akoutmos/sql_fmt/commit/23ce12ec2c137f7d53e6727cc251d3e758d0f447

pnezis commented 1 week ago

I don't think it's that simple, this would introduce other issues, for example the following input:

SELECT 2 * -3

would be converted to SELECT 2 *-3. This may be a valid SQL but for other operators it will lead to invalid queries. The only guaranteed way to solve this is at the tokenisation level.

pnezis commented 1 week ago

Another example that makes this more clear:

SELECT 2 - - 3; 

which returns 5 would be formatted to

SELECT 2 -- 3;

which evaluates to 2

akoutmos commented 6 days ago

Good points. Perhaps another route needs to be explored if these issues cannot be sorted out in the sqlformat-rs package.

There is the https://github.com/apache/datafusion-sqlparser-rs project which offers a lexer and parser for various SQL dialects. But there is no formatter provided so I (or we if you are interested lol) would have to build one after the library generates the AST of the provided SQL statements. The library provides a Visitor trait for the various https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Visitor.html AST nodes...so a formatted should be possible. The added benefit of using an actual lexer+parser is that it will generate error messages for invalid SQL whereas right no sqlformat-rs is happy to process whatever you give it.

pnezis commented 4 days ago

Implementing a formatter given an AST is doable but it will need a lot of work. On the other hand fixing tokenisation of operators on the sqlformat-rs package is much easier.

My only concern currently is if sqlformat-rs is production ready and if any other such bug exists. In our case the test suite broke, but a formatter that modifies the sql queries may lead to subtle bugs that they will be a nightmare to detect.