apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.96k stars 13.94k forks source link

Add Support for SQL Server Analysis Services (SSAS) Connection with MDX Query Execution, Dimension Management, Pivot Tables, and Domain User Login #30184

Open ishirvani opened 2 months ago

ishirvani commented 2 months ago

Apache Superset currently lacks native support for SQL Server Analysis Services (SSAS), a popular OLAP (Online Analytical Processing) platform. Integrating SSAS into Superset with support for MDX (Multidimensional Expressions) queries, dimension and database management, pivot table functionalities, and domain user login would greatly enhance its analytical capabilities.

Motivation

Adding these features to Superset will provide several benefits:

  1. Advanced Query Execution: Allow users to execute MDX queries directly within Superset, facilitating complex analytical operations on multidimensional data.
  2. Dimension and Database Management: Provide functionality to browse and manage SSAS cubes, dimensions, and databases, similar to how Excel interacts with SSAS.
  3. Pivot Table Capabilities: Enable users to create and interact with pivot tables directly within Superset, offering a familiar interface for data analysis.
  4. Domain User Login: Allow users to log in with domain credentials that have access to SSAS cubes. This feature will integrate domain authentication, simplifying user management and access control.
  5. Streamlined Workflow: Integrate SSAS capabilities directly into Superset, reducing the need for additional BI tools and streamlining workflows.

Proposed Solution

  1. Connection Support:

    • Implement support for connecting to SSAS using appropriate drivers and connection strings. Add configuration options for SSAS connections.
  2. MDX Query Execution:

    • Develop and integrate a custom SQLAlchemy dialect to handle MDX queries. Enable users to execute and visualize MDX query results within Superset.
  3. Dimension and Database Management:

    • Implement features for browsing SSAS cubes, dimensions, and databases. This functionality should be user-friendly and similar to Excel’s interface with SSAS.
  4. Pivot Table Functionality:

    • Integrate drag-and-drop functionality for creating and manipulating pivot tables using SSAS cube data within Superset.
  5. Domain User Login:

    • Add support for domain user authentication. This should allow users to log in with their domain credentials that have access to SSAS, simplifying the process of accessing and managing SSAS data.
  6. Documentation and User Guide:

    • Update Superset’s documentation to include instructions on configuring SSAS connections, executing MDX queries, managing dimensions, using pivot table features, and setting up domain user logins.
  7. Testing:

    • Conduct comprehensive testing to ensure that the SSAS connection, MDX query execution, dimension management, pivot table functionalities, and domain user login work seamlessly. Include unit tests and integration tests with SSAS.

Additional Context

Steps to Implement

  1. Create a Custom SQLAlchemy Dialect: Develop and test the custom dialect for handling MDX queries and SSAS connections.
  2. Develop Dimension and Database Management: Implement browsing and management features for SSAS cubes and dimensions.
  3. Integrate Pivot Table Functionality: Enable drag-and-drop pivot table creation and manipulation.
  4. Add Domain User Login: Implement support for domain user authentication with SSAS.
  5. Update Documentation: Provide clear documentation and examples for users.
  6. Perform Testing: Validate functionality through rigorous testing.

Thank you for considering this feature request. Implementing these enhancements will provide significant value to users who depend on SSAS for their business intelligence needs and wish to leverage Apache Superset for advanced analytics.


iman shirvani

Originally posted by @ishirvani in https://github.com/apache/superset/discussions/30182

dosubot[bot] commented 2 months ago

Hey there, @ishirvani! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together! 😊

I found a similar unsolved discussion regarding the integration of SQL Server Analysis Services (SSAS) with Apache Superset. You can check it out here: Add Support for SQL Server Analysis Services (SSAS) Connection with MDX Query Execution, Dimension Management, Pivot Tables, and Domain User Login [1].

To enhance Apache Superset to support SSAS with features like MDX query execution, dimension and database management, pivot table functionalities, and domain user login, you would need to create a new DB engine spec for SSAS. Here is a high-level approach:

  1. Create a new DB engine spec for SSAS:

    • Implement a new module in the superset/db_engine_specs directory, for example, superset/db_engine_specs/ssas.py.
    • Extend the BaseEngineSpec class to handle SSAS-specific functionalities.
  2. MDX Query Execution:

    • Override the execute method to support MDX queries.
    • Ensure that the connection string and query execution logic are compatible with SSAS.
  3. Dimension and Database Management:

    • Implement methods to fetch metadata about dimensions and databases.
    • Use SSAS-specific APIs or queries to retrieve this information.
  4. Pivot Table Functionalities:

    • Extend the visualization capabilities to support pivot tables.
    • This might involve creating custom visualization plugins or extending existing ones to handle the multidimensional data returned by SSAS.
  5. Domain User Login:

    • Integrate with the authentication system to support domain user login.
    • This might involve configuring Kerberos or another authentication mechanism supported by SSAS.

Here is a simplified example of what the new DB engine spec might look like:

# superset/db_engine_specs/ssas.py

from superset.db_engine_specs.base import BaseEngineSpec

class SSASEngineSpec(BaseEngineSpec):
    engine = 'ssas'
    engine_name = 'SQL Server Analysis Services'

    @classmethod
    def execute(cls, cursor, query, **kwargs):
        # Implement MDX query execution logic
        cursor.execute(query)
        return cursor.fetchall()

    @classmethod
    def get_dimensions(cls, cursor):
        # Implement logic to fetch dimensions
        cursor.execute("SELECT * FROM $system.DISCOVER_DIMENSIONS")
        return cursor.fetchall()

    @classmethod
    def get_databases(cls, cursor):
        # Implement logic to fetch databases
        cursor.execute("SELECT * FROM $system.DISCOVER_CATALOGS")
        return cursor.fetchall()

    # Add more methods as needed to support other functionalities
  1. Register the new DB engine spec:
    • Add the new engine spec to the list of supported engines in Superset.
# superset/config.py

from superset.db_engine_specs.ssas import SSASEngineSpec

DATABASES = {
    'ssas': SSASEngineSpec,
    # other database specs
}

For domain user login, you can configure LDAP authentication in the superset_config.py file:

# When using LDAP Auth, setup the LDAP server
AUTH_LDAP_SERVER = "ldap://ldapserver.new"

This is a simplified example, and the actual implementation would require more detailed handling of SSAS-specific features and integration points [2][3][4].

To continue talking to Dosu, mention @dosu.

ishirvani commented 2 months ago

Hello,

I am working on adding support for SQL Server Analysis Services (SSAS) to Apache Superset, with the goal of connecting via domain user authentication and ensuring that once connected, the user can view the full set of data, including databases and dimensions.

Here’s what I’ve done so far:

Creating a new DB engine spec: I’ve implemented an initial version of a custom engine spec for SSAS that includes MDX query execution and fetching metadata like dimensions and databases. Here’s the implementation I’ve started with:

python Copy code from superset.db_engine_specs.base import BaseEngineSpec

class SSASEngineSpec(BaseEngineSpec): engine = 'ssas' engine_name = 'SQL Server Analysis Services'

@classmethod
def execute(cls, cursor, query, **kwargs):
    cursor.execute(query)
    return cursor.fetchall()

@classmethod
def get_dimensions(cls, cursor):
    cursor.execute("SELECT * FROM $system.DISCOVER_DIMENSIONS")
    return cursor.fetchall()

@classmethod
def get_databases(cls, cursor):
    cursor.execute("SELECT * FROM $system.DISCOVER_CATALOGS")
    return cursor.fetchall()

Integrating domain user authentication: My goal is to connect to SSAS using domain user credentials (via Kerberos or LDAP). I want to ensure that once the domain user is authenticated, they can view the entire dataset (databases and dimensions) available in SSAS.

Testing and displaying data: After the successful connection, the user should be able to explore the SSAS databases and dimensions through Superset's interface, including using MDX queries.

I would like some guidance on:

How to correctly configure the SSAS connection in Superset for domain user authentication, ensuring that it works with Kerberos or LDAP. Best practices or additional steps to fully register and make this SSAS connection selectable and usable for users. How to ensure that, upon successful authentication, the entire SSAS data (including all databases and dimensions) is properly displayed to the user within Superset. Any insights or existing implementations around SSAS in Superset, specifically for domain user logins, would be greatly appreciated!

Thank you for your support.

dosubot[bot] commented 2 months ago

To correctly configure the SSAS connection in Apache Superset for domain user authentication using Kerberos or LDAP, and ensure the connection is fully registered and usable, follow these steps:

1. Install the Required Database Driver

Ensure you have the necessary driver for SSAS. Since SSAS is not explicitly listed, you may need to find a compatible SQLAlchemy dialect and Python driver for SSAS.

2. Configure Kerberos Authentication

In the Secure Extra field of the database connection configuration, add the following JSON configuration for Kerberos authentication:

{
    "auth_method": "kerberos",
    "auth_params": {
        "service_name": "superset",
        "config": "/path/to/krb5.config"
    }
}

Ensure you have the trino[kerberos] or trino[all] package installed:

pip install trino[kerberos]

3. Configure LDAP Authentication

If you prefer LDAP, you will need to configure it similarly in the Secure Extra field. However, specific details for LDAP configuration are not provided in the given context.

4. Register the Connection

After configuring the authentication, register the connection through the Superset web UI:

5. Display the Entire SSAS Data

Once the connection is successfully tested and saved, Superset should be able to display all databases and dimensions available in SSAS to the authenticated user.

Example Configuration for Kerberos in the Secure Extra Field

{
    "auth_method": "kerberos",
    "auth_params": {
        "service_name": "superset",
        "config": "/path/to/krb5.config"
    }
}

Ensure the connection string and other parameters are correctly set according to the SSAS driver documentation [1].

Custom DB Engine Spec Implementation

Your custom DB engine spec for SSAS looks good. Ensure you have implemented the necessary methods and attributes to support metadata fetching and query execution. Here are some key methods and attributes you might need:

Methods:

Attributes:

Example Custom DB Engine Spec

from superset.db_engine_specs.base import BaseEngineSpec

class SSASEngineSpec(BaseEngineSpec):
    engine = 'ssas'
    engine_name = 'SQL Server Analysis Services'

    @classmethod
    def execute(cls, cursor, query, **kwargs):
        cursor.execute(query)
        return cursor.fetchall()

    @classmethod
    def get_dimensions(cls, cursor):
        cursor.execute("SELECT * FROM $system.DISCOVER_DIMENSIONS")
        return cursor.fetchall()

    @classmethod
    def get_databases(cls, cursor):
        cursor.execute("SELECT * FROM $system.DISCOVER_CATALOGS")
        return cursor.fetchall()

Register the New Engine Spec

Add the new engine spec to the list of supported engines in Superset:

# superset/config.py

from superset.db_engine_specs.ssas import SSASEngineSpec

DATABASES = {
    'ssas': SSASEngineSpec,
    # other database specs
}

By following these steps, you should be able to configure the SSAS connection in Superset for domain user authentication, register the connection, and display the entire SSAS data to the user upon successful authentication [1][2].

To continue talking to Dosu, mention @dosu.

ishirvani commented 2 months ago

Here's the final version with a request for community help at the end:


Title: Issue with loading SSASEngineSpec - "Could not load database driver: SSASEngineSpec"

Hi Apache Superset team,

I am working on adding support for SQL Server Analysis Services (SSAS) to Superset. I have implemented a new class called SSASEngineSpec as needed, but I am encountering the following error:

ERROR: Could not load database driver: SSASEngineSpec

The code I implemented:

from typing import Any, Optional
from datetime import datetime
from superset.db_engine_specs.base import BaseEngineSpec

class SSASEngineSpec(BaseEngineSpec):
    engine = "ssas"
    engine_name = "SQL Server Analysis Services"

    @classmethod
    def execute(cls, cursor: Any, query: str, **kwargs: Any) -> Any:
        cursor.execute(query)
        return cursor.fetchall()

    @classmethod
    def get_dimensions(cls, cursor: Any) -> list[tuple[Any, ...]]:
        cursor.execute("SELECT * FROM $system.DISCOVER_DIMENSIONS")
        return cursor.fetchall()

    @classmethod
    def get_databases(cls, cursor: Any) -> list[tuple[Any, ...]]:
        cursor.execute("SELECT * FROM $system.DISCOVER_CATALOGS")
        return cursor.fetchall()

    @classmethod
    def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
        return f"'{dttm.isoformat()}'"

    @classmethod
    def fetch_data(cls, cursor: Any, limit: Optional[int] = None) -> list[tuple[Any, ...]]:
        if not cursor.description:
            return []
        return super().fetch_data(cursor, limit)

My questions:

  1. Is there something I am missing regarding how to register or load this class?
  2. Do any configuration changes need to be made (e.g., superset_config.py or similar) to load this new driver correctly?
  3. Are there any guidelines or documentation for adding SSAS support?

Additionally, I am trying to connect to SSAS using the following connection string format:

ssas://ip/ssas_dbname

Does this format look correct for SSAS connections in Superset?

I would greatly appreciate any guidance or documentation on resolving this error.

Help from the community:

If anyone in the community has experience integrating SSAS with Superset, your input would be incredibly valuable. Any code samples, documentation, or suggestions to help resolve this issue would benefit not just me, but other developers looking to add SSAS support to Superset as well.

Thank you for your time and assistance!

dosubot[bot] commented 2 months ago

To resolve the "Could not load database driver: SSASEngineSpec" error when adding support for SQL Server Analysis Services (SSAS) to Apache Superset, ensure that the SSASEngineSpec is correctly registered and loaded. Here are the steps and configuration changes required:

1. Define the SSASEngineSpec Class

Ensure that you have a class SSASEngineSpec that extends BaseEngineSpec and is properly defined in a module.

2. Register the SSASEngineSpec

Make sure that the SSASEngineSpec is included in the list of engine specs that Superset loads. This is done in the load_engine_specs function.

Example Implementation

Define the SSASEngineSpec Class

# superset/db_engine_specs/ssas.py
from typing import Any, Optional
from datetime import datetime
from superset.db_engine_specs.base import BaseEngineSpec

class SSASEngineSpec(BaseEngineSpec):
    engine = "ssas"
    engine_name = "SQL Server Analysis Services"

    @classmethod
    def execute(cls, cursor: Any, query: str, **kwargs: Any) -> Any:
        cursor.execute(query)
        return cursor.fetchall()

    @classmethod
    def get_dimensions(cls, cursor: Any) -> list[tuple[Any, ...]]:
        cursor.execute("SELECT * FROM $system.DISCOVER_DIMENSIONS")
        return cursor.fetchall()

    @classmethod
    def get_databases(cls, cursor: Any) -> list[tuple[Any, ...]]:
        cursor.execute("SELECT * FROM $system.DISCOVER_CATALOGS")
        return cursor.fetchall()

    @classmethod
    def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
        return f"'{dttm.isoformat()}'"

    @classmethod
    def fetch_data(cls, cursor: Any, limit: Optional[int] = None) -> list[tuple[Any, ...]]:
        if not cursor.description:
            return []
        return super().fetch_data(cursor, limit)

Register the SSASEngineSpec

Ensure the SSASEngineSpec is loaded in superset/db_engine_specs/__init__.py:

# superset/db_engine_specs/__init__.py
from pathlib import Path
import pkgutil
from importlib import import_module
from superset.db_engine_specs.base import BaseEngineSpec

def load_engine_specs() -> list[type[BaseEngineSpec]]:
    engine_specs: list[type[BaseEngineSpec]] = []

    # Load standard engines
    db_engine_spec_dir = str(Path(__file__).parent)
    for module_info in pkgutil.iter_modules([db_engine_spec_dir], prefix="."):
        module = import_module(module_info.name, package=__name__)
        engine_specs.extend(
            getattr(module, attr)
            for attr in module.__dict__
            if is_engine_spec(getattr(module, attr))
        )

    # Manually add SSASEngineSpec if not using entry points
    from superset.db_engine_specs.ssas import SSASEngineSpec
    engine_specs.append(SSASEngineSpec)

    # Load additional engines from external modules
    for ep in entry_points(group="superset.db_engine_specs"):
        try:
            engine_spec = ep.load()
        except Exception:  # pylint: disable=broad-except
            logger.warning("Unable to load Superset DB engine spec: %s", ep.name)
            continue
        engine_specs.append(engine_spec)

    return engine_specs

Connection String Format

The connection string format for SSAS should be verified against the specific driver you are using. A common format might look like:

ssas://username:password@hostname/ssas_dbname

Ensure that the driver you are using supports this format and that all necessary parameters are included.

By ensuring that the SSASEngineSpec class is defined and properly registered in the load_engine_specs function, you should be able to resolve the "Could not load database driver: SSASEngineSpec" error [1].

To continue talking to Dosu, mention @dosu.

ishirvani commented 2 months ago

Title: "Could not load database driver: SSASEngineSpec" after defining and registering custom engine for SSAS

Body: Hi,

I am facing an issue with adding support for SQL Server Analysis Services (SSAS) to Apache Superset. Despite following the necessary steps to define and register the SSASEngineSpec class, I am encountering the following error:

Error:

Could not load database driver: SSASEngineSpec

What I've Done:

  1. Defined the SSASEngineSpec Class:

    • Created the SSASEngineSpec class in superset/db_engine_specs/ssas.py, extending BaseEngineSpec with methods for executing MDX queries and managing dimensions and databases.
  2. Registered SSASEngineSpec:

    • Updated superset/db_engine_specs/__init__.py to include SSASEngineSpec in the list of engine specs in the load_engine_specs() function.
  3. Connection String:

    • Using the format ssas://username:password@hostname/ssas_dbname .

Logs:

Here are some relevant log entries when attempting to test the connection:

superset_app  | 2024-09-14 08:36:08,301:DEBUG:superset.stats_logger:[stats_logger] (incr) test_connection_attempt
superset_app  | 2024-09-14 08:36:08,319:DEBUG:superset.stats_logger:[stats_logger] (incr) test_connection_error.NoSuchModuleError
superset_app  | 2024-09-14 08:36:08,322:DEBUG:superset.stats_logger:[stats_logger] (incr) DatabaseRestApi.test_connection.warning
superset_app  | 2024-09-14 08:36:08,322:WARNING:superset.views.error_handling:CommandException
superset_app  | Traceback (most recent call last):
superset_app  |   File "/app/superset/commands/database/test_connection.py", line 143, in run
superset_app  |     with database.get_sqla_engine(override_ssh_tunnel=ssh_tunnel) as engine:
superset_app  |   File "/usr/local/lib/python3.10/contextlib.py", line 135, in __enter__
superset_app  |     return next(self.gen)
superset_app  |   File "/app/superset/models/core.py", line 446, in get_sqla_engine
superset_app  |     yield self._get_sqla_engine(
superset_app  |   File "/app/superset/models/core.py", line 465, in _get_sqla_engine
superset_app  |     self.db_engine_spec.validate_database_uri(sqlalchemy_url)
superset_app  |   File "/app/superset/db_engine_specs/base.py", line 2249, in validate_database_uri
superset_app  |     sqlalchemy_uri.get_driver_name(), set()
superset_app  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/url.py", line 625, in get_driver_name
superset_app  |     return self.get_dialect().driver
superset_app  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/url.py", line 680, in get_dialect
superset_app  |     entrypoint = self._get_entrypoint()
superset_app  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/url.py", line 662, in _get_entrypoint
superset_app  |     cls = registry.load(name)
superset_app  |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 343, in load
superset_app  |     raise exc.NoSuchModuleError(
superset_app  | sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:ssas
superset_app  |
superset_app  | The above exception was the direct cause of the following exception:
superset_app  |
superset_app  | Traceback (most recent call last):
superset_app  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1484, in full_dispatch_request
superset_app  |     rv = self.dispatch_request()
superset_app  |   File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1469, in dispatch_request
superset_app  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset_app  |   File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app  |     return f(self, *args, **kwargs)
superset_app  |   File "/app/superset/views/base_api.py", line 119, in wraps
superset_app  |     duration, response = time_function(f, self, *args, **kwargs)
superset_app  |   File "/app/superset/utils/core.py", line 1364, in time_function
superset_app  |     response = func(*args, **kwargs)
superset_app  |   File "/app/superset/utils/log.py", line 303, in wrapper
superset_app  |     value = f(*args, **kwargs)
superset_app  |   File "/app/superset/views/base_api.py", line 91, in wraps
superset_app  |     return f(self, *args, **kwargs)
superset_app  |   File "/app/superset/databases/api.py", line 1219, in test_connection
superset_app  |     TestConnectionDatabaseCommand(item).run()
superset_app  |   File "/app/superset/commands/database/test_connection.py", line 193, in run
superset_app  |     raise DatabaseTestConnectionDriverError(
superset_app  | superset.commands.database.exceptions.DatabaseTestConnectionDriverError: Could not load database driver: SSASEngineSpec
superset_app  | 2024-09-14 08:36:08,329:INFO:werkzeug:172.18.0.1 - - [14/Sep/2024 08:36:08] "POST /api/v1/database/test_connection/ HTTP/1.1" 422 -

Request:

I have confirmed that SSASEngineSpec is defined and registered according to the steps outlined in the documentation. However, the error persists. Could you provide any additional guidance or check if there might be something missed in the configuration or implementation? Any assistance would be greatly appreciated.

Thank you!