Open aaronfeng opened 2 years ago
@aaronfeng any luck with this?
@pashkash no, the workaround is to use
max(ds)
but I believe that will scan all partitions.
Maybe @bkyryliuk @nytai @villebro can confirm if this is an Issue we ought to keep open or close as inactive/deprioritized?
This is at risk of being closed as stale, but meanwhile, hopefully @dosu-bot can give us a little added insight.
Hey @rusackas! I'm Dosu, a bot here to assist you with bugs, questions, and contributions while you're waiting for a human maintainer. Let's squash those bugs together! 🤖
To make the first_latest_partition
template processor function work with Trino in Superset, you can use the TrinoTemplateProcessor
class. This class ensures compatibility with Trino by inheriting from PrestoTemplateProcessor
and setting up the context correctly for Trino.
Here's how you can set it up:
Define the TrinoTemplateProcessor
class:
class TrinoTemplateProcessor(PrestoTemplateProcessor):
engine = "trino"
def process_template(self, sql: str, **kwargs: Any) -> str:
template = self.env.from_string(sql)
kwargs.update(self._context)
# Backwards compatibility if migrating from Presto.
context = validate_template_context(self.engine, kwargs)
context["presto"] = context["trino"]
return template.render(context)
Update the DEFAULT_PROCESSORS
dictionary in your Superset configuration:
DEFAULT_PROCESSORS = {
"presto": PrestoTemplateProcessor,
"hive": HiveTemplateProcessor,
"spark": SparkTemplateProcessor,
"trino": TrinoTemplateProcessor,
}
Ensure that the TrinoEngineSpec
class handles the latest_partition
functionality. This is already implemented in the TrinoEngineSpec
class:
class TrinoEngineSpec(PrestoBaseEngineSpec):
engine = "trino"
engine_name = "Trino"
allows_alias_to_source_column = False
@classmethod
def get_extra_table_metadata(
cls,
database: Database,
table: Table,
) -> dict[str, Any]:
metadata = {}
if indexes := database.get_indexes(table):
col_names, latest_parts = cls.latest_partition(
database,
table,
show_first=True,
indexes=indexes,
)
if not latest_parts:
latest_parts = tuple([None] * len(col_names))
metadata["partitions"] = {
"cols": sorted(
list(
{
column_name
for index in indexes
if index.get("name") == "partition"
for column_name in index.get("column_names", [])
}
)
),
"latest": dict(zip(col_names, latest_parts)),
"partitionQuery": cls._partition_query(
table=table,
indexes=indexes,
database=database,
),
}
if database.has_view(Table(table.table, table.schema)):
with database.get_inspector(
catalog=table.catalog,
schema=table.schema,
) as inspector:
metadata["view"] = inspector.get_view_definition(
table.table,
table.schema,
)
return metadata
Testing: Ensure that the template processor is tested for Trino. You can refer to the integration tests to see how the template processor is tested for different backends, including Trino:
def test_template_trino(app_context: AppContext, mocker: MockerFixture) -> None:
lp_mock = mocker.patch(
"superset.jinja_context.TrinoTemplateProcessor.latest_partition"
)
lp_mock.return_value = "the_latest"
database = mock.Mock()
database.backend = "trino"
template = "{{ trino.latest_partition('my_table') }}"
tp = get_template_processor(database=database)
assert tp.process_template(template) == "the_latest"
# Backwards compatibility if migrating from Presto.
template = "{{ presto.latest_partition('my_table') }}"
tp = get_template_processor(database=database)
assert tp.process_template(template) == "the_latest"
By following these steps, the first_latest_partition
function should work with Trino in the same way it works with Presto [1][2][3].
To continue talking to Dosu, mention @dosu.
Hi,
We are in the process of migrating from
Presto
toTrino
and noticed template processor functions don't work withTrino
. We have a lot of datasets usingfirst_latest_partition
function.How to reproduce the bug
Below is working for
presto
driver, however, it doesn't work for Trino.Expected results
Template processor function should also work for Trino.
Actual results
When using Trino driver, an error is thrown because there are not template processor functions for Trino.
https://github.com/apache/superset/blob/master/superset/jinja_context.py#L553
Screenshots
Environment
1.3.0
I looked at master code, it seems like this will also be an issue for
1.4.0
Checklist
Make sure to follow these steps before submitting your issue - thank you!