apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.8k stars 13.16k forks source link

snowflake wrong quotes on schema name with dot #29341

Open myskaludek opened 4 weeks ago

myskaludek commented 4 weeks ago

Bug description

When schema name in snowflake has "." then sql lab cannot get list of tables.

Schema in snowflake have for me name "out-c.superset" and in logs for listing table in sql lab is query

SHOW /* sqlalchemy:get_table_names */ TABLES IN out."c-superset"

insead of

SHOW /* sqlalchemy:get_table_names */ TABLES IN "out.c-superset"

How to reproduce the bug

  1. Add snowflake-sqlalchemy to docker image
  2. Start superset 3.1.0
  3. Connectd snoflake database
  4. Go to sql lab and let list tables in schema out-c.superset

Screenshots/recordings

2024-06-24 10:20:58,477:DEBUG:snowflake.connector.cursor:running query [SHOW /* sqlalchemy:get_table_names */ TABLES IN out."c-superset"]

Superset version

3.1.3

Python version

3.10

Node version

I don't know

Browser

Chrome

Additional context

No response

Checklist

dosubot[bot] commented 4 weeks ago

Hey @myskaludek! I'm here to help you out with any bugs, questions, or contributions you have for the repository. Let's work together to get this resolved!

To fix the issue where SQL Lab cannot retrieve the list of tables when the schema name in Snowflake contains a dot (.), you need to adjust the adjust_engine_params method in the SnowflakeEngineSpec class to properly handle schema names with dots.

Here is the relevant code snippet from superset/db_engine_specs/snowflake.py:

@classmethod
def adjust_engine_params(
    cls,
    uri: URL,
    connect_args: dict[str, Any],
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
) -> tuple[URL, dict[str, Any]]:
    if "/" in uri.database:
        current_catalog, current_schema = uri.database.split("/", 1)
    else:
        current_catalog, current_schema = uri.database, None

    adjusted_database = "/".join(
        [
            catalog or current_catalog,
            schema or current_schema or "",
        ]
    ).rstrip("/")

    uri = uri.set(database=adjusted_database)

    return uri, connect_args

This method splits the database URI to handle the catalog and schema separately. If the schema contains a dot, it should be properly encoded to avoid issues. You can use urllib.parse.quote to encode the schema name.

Here is an example of how you might modify the method to handle schema names with dots:

from urllib.parse import quote

@classmethod
def adjust_engine_params(
    cls,
    uri: URL,
    connect_args: dict[str, Any],
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
) -> tuple[URL, dict[str, Any]]:
    if "/" in uri.database:
        current_catalog, current_schema = uri.database.split("/", 1)
    else:
        current_catalog, current_schema = uri.database, None

    # Encode the schema name to handle dots
    if schema:
        schema = quote(schema, safe='')

    adjusted_database = "/".join(
        [
            catalog or current_catalog,
            schema or current_schema or "",
        ]
    ).rstrip("/")

    uri = uri.set(database=adjusted_database)

    return uri, connect_args

This modification ensures that schema names with dots are properly encoded, allowing SQL Lab to retrieve the list of tables without issues [1].

To continue talking to Dosu, mention @dosu.

rusackas commented 4 weeks ago

The Superset PMC doesn't currently have any Snowflake testers/users on the roster. I'd encourage anyone reading this to sign up here to volunteer to test future Snowflake issues. I hope someone from the community is able to submit a PR for this 🤞