amphi-ai / amphi-etl

Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
https://docs.amphi.ai
Other
552 stars 12 forks source link

Multi-line SQL query seems to generate incorrect Python code and fail #39

Closed imrehg closed 1 week ago

imrehg commented 1 week ago

When adding e.g. a Postgres Input, I've added a multili

SELECT x.* FROM events x
WHERE x.event_type = 'CreateEntityEvent' AND x.event_json LIKE '%Person%';

as the interface allows it as well:

Screenshot 2024-06-20 at 14 01 35

Running the connector this way fails however with the error:

('unterminated string literal (detected at line 14)', (14, 9))

and it seems to be because the generated code incorrectly wraps the SQL query in quotes. The generated code reads:

    postgresInput1 = pd.read_sql(
        "SELECT x.* FROM events x
WHERE x.event_type = 'CreateEntityEvent' AND x.event_json LIKE '%Person%';",
        con=conn.connection
    ).convert_dtypes()

which is incorrect indeed.

The correct wrapping could be something like using tripple doublequotes """...""", like the following in this case:

    postgresInput1 = pd.read_sql(
        """SELECT x.* FROM events x
WHERE x.event_type = 'CreateEntityEvent' AND x.event_json LIKE '%Person%';""",
        con=conn.connection
    ).convert_dtypes()

This is not taking any cases of escaping into account, though, that might be relevant in more general cases, e.g. r"""...""", or '''...''' if the query contains double quotes.

I've tested that the wrapping is indeed the issue, as when I adjusted the SQL query to be on a single line, it worked.

I'm using amphi-etl==0.4.3.

tgourdel commented 1 week ago

Thanks a lot @imrehg for the detailed reporting. While obvious, this is a case I had not planned indeed. The fix is pretty easy, thanks for the suggestion. I'll let you know when it's fixed!

tgourdel commented 1 week ago

It should now be fixed in the latest version! Thanks for reporting pip install --upgrade --force-reinstall amphi-etl or jupyterlab-amphi if you're using Jupyterlab