apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.81k stars 13.87k forks source link

Presto template processor functions are not available in Trino #19400

Open aaronfeng opened 2 years ago

aaronfeng commented 2 years ago

Hi,

We are in the process of migrating from Presto to Trino and noticed template processor functions don't work with Trino. We have a lot of datasets using first_latest_partition function.

How to reproduce the bug

Below is working for presto driver, however, it doesn't work for Trino.

SELECT *
FROM foo 
WHERE ds='{{ presto.first_latest_partition('foo') }}'

Expected results

Template processor function should also work for Trino.

Actual results

When using Trino driver, an error is thrown because there are not template processor functions for Trino.

https://github.com/apache/superset/blob/master/superset/jinja_context.py#L553

Screenshots

Environment

I looked at master code, it seems like this will also be an issue for 1.4.0

Checklist

Make sure to follow these steps before submitting your issue - thank you!

pashkash commented 2 years ago

@aaronfeng any luck with this?

aaronfeng commented 2 years ago

@pashkash no, the workaround is to use

max(ds)

but I believe that will scan all partitions.

rusackas commented 9 months ago

Maybe @bkyryliuk @nytai @villebro can confirm if this is an Issue we ought to keep open or close as inactive/deprioritized?

rusackas commented 5 months ago

This is at risk of being closed as stale, but meanwhile, hopefully @dosu-bot can give us a little added insight.

dosubot[bot] commented 5 months ago

Hey @rusackas! I'm Dosu, a bot here to assist you with bugs, questions, and contributions while you're waiting for a human maintainer. Let's squash those bugs together! 🤖

To make the first_latest_partition template processor function work with Trino in Superset, you can use the TrinoTemplateProcessor class. This class ensures compatibility with Trino by inheriting from PrestoTemplateProcessor and setting up the context correctly for Trino.

Here's how you can set it up:

  1. Define the TrinoTemplateProcessor class:

    class TrinoTemplateProcessor(PrestoTemplateProcessor):
        engine = "trino"
    
        def process_template(self, sql: str, **kwargs: Any) -> str:
            template = self.env.from_string(sql)
            kwargs.update(self._context)
    
            # Backwards compatibility if migrating from Presto.
            context = validate_template_context(self.engine, kwargs)
            context["presto"] = context["trino"]
            return template.render(context)
  2. Update the DEFAULT_PROCESSORS dictionary in your Superset configuration:

    DEFAULT_PROCESSORS = {
        "presto": PrestoTemplateProcessor,
        "hive": HiveTemplateProcessor,
        "spark": SparkTemplateProcessor,
        "trino": TrinoTemplateProcessor,
    }
  3. Ensure that the TrinoEngineSpec class handles the latest_partition functionality. This is already implemented in the TrinoEngineSpec class:

    class TrinoEngineSpec(PrestoBaseEngineSpec):
        engine = "trino"
        engine_name = "Trino"
        allows_alias_to_source_column = False
    
        @classmethod
        def get_extra_table_metadata(
            cls,
            database: Database,
            table: Table,
        ) -> dict[str, Any]:
            metadata = {}
    
            if indexes := database.get_indexes(table):
                col_names, latest_parts = cls.latest_partition(
                    database,
                    table,
                    show_first=True,
                    indexes=indexes,
                )
    
                if not latest_parts:
                    latest_parts = tuple([None] * len(col_names))
    
                metadata["partitions"] = {
                    "cols": sorted(
                        list(
                            {
                                column_name
                                for index in indexes
                                if index.get("name") == "partition"
                                for column_name in index.get("column_names", [])
                            }
                        )
                    ),
                    "latest": dict(zip(col_names, latest_parts)),
                    "partitionQuery": cls._partition_query(
                        table=table,
                        indexes=indexes,
                        database=database,
                    ),
                }
    
            if database.has_view(Table(table.table, table.schema)):
                with database.get_inspector(
                    catalog=table.catalog,
                    schema=table.schema,
                ) as inspector:
                    metadata["view"] = inspector.get_view_definition(
                        table.table,
                        table.schema,
                    )
    
            return metadata
  4. Testing: Ensure that the template processor is tested for Trino. You can refer to the integration tests to see how the template processor is tested for different backends, including Trino:

    def test_template_trino(app_context: AppContext, mocker: MockerFixture) -> None:
        lp_mock = mocker.patch(
            "superset.jinja_context.TrinoTemplateProcessor.latest_partition"
        )
        lp_mock.return_value = "the_latest"
        database = mock.Mock()
        database.backend = "trino"
        template = "{{ trino.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"
    
        # Backwards compatibility if migrating from Presto.
        template = "{{ presto.latest_partition('my_table') }}"
        tp = get_template_processor(database=database)
        assert tp.process_template(template) == "the_latest"

By following these steps, the first_latest_partition function should work with Trino in the same way it works with Presto [1][2][3].

To continue talking to Dosu, mention @dosu.