andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License
3.71k stars 693 forks source link

Skip last statement when it only contains a comment #668

Open botenvouwer opened 2 years ago

botenvouwer commented 2 years ago

When you encounter a SQL file which ends with a comment the split function will see it as a valid statement:

DROP TABLE IF EXISTS schema_a.table_a CASCADE;
DROP TABLE IF EXISTS schema_a.table_b CASCADE;

-- Now you should do x and y

sqlparse.split results in: ['DROP TABLE IF EXISTS schema_a.table_a CASCADE;', 'DROP TABLE IF EXISTS schema_a.table_b CASCADE;','-- nu kan het inladen beginnen mbv XY']

When you add comments elsewhere they are added to the following statement. However if it is the last one it is parsed as if it were a statement. Is this supposed te be?

danchamorro commented 1 year ago

This is indeed an issue. If you put a comment above your final statement, it will not get read in by sqlparse.

This is good:

-- Table 1
CREATE TABLE table_1;
-- Table 2 
CREATE TABLE table_2;
CREATE TABLE final_table;

This is no good:

-- Table 1
CREATE TABLE table_1;
-- Table 2 
CREATE TABLE table_2;
-- Final Table
CREATE TABLE final_table;

It will not read your last statement with the comment. Here is how I caught it. I added each table name to a list. On my first run, the list didn't have my final table. I removed the last comment above my final statement; the table name was in my generated list.

def run_query(engine, query):
    session = engine.connect()
    table_names = []
    try:
        with open(query) as file:
            queries = file.read()
            statements = sqlparse.split(queries)
            for statement in statements:
                formatted_statement = sqlparse.format(statement, reindent=True, keyword_case='upper', strip_comments=True).strip()
                # session.execute(formatted_statement)
                # logger.info(formatted_statement)
                if statement.startswith("CREATE TABLE"):
                    table_name = statement.split(" ")[2]
                    # Add each table name to a list
                    table_names.append(table_name)
                    logger.info(f"Table {table_name} created")
        logger.info("Tables created")
        logger.info(table_names)
        session.close()
    except Exception as e:
        logger.error(e)
        exit(1)
andialbrecht commented 1 year ago

@botenvouwer yes, this behavior is intended. A statement is finished, when the final (closing) semicolon was seen.

@danchamorro I'm not able to reproduce your example code as it needs additional parameters to be called. But I've tried it with this stripped down version:

>>> import sqlparse
>>> sql = """-- Table 1
... CREATE TABLE table_1;
... -- Table 2
... CREATE TABLE table_2;
... -- Final Table
... CREATE TABLE final_table;"""
>>> sqlparse.split(sql)
['-- Table 1\nCREATE TABLE table_1;',
 '-- Table 2 \nCREATE TABLE table_2;',
 '-- Final Table\nCREATE TABLE final_table;']
>>> statements = sqlparse.split(sql)
>>> for statement in statements:
...     formatted_statement = sqlparse.format(statement, reindent=True, keyword_case='upper', strip_comments=True).strip()
...     print(formatted_statement)
...
CREATE TABLE table_1;
CREATE TABLE table_2;
CREATE TABLE final_table;
>>>

To me, everything looks fine.