CartoDB / analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
Other
190 stars 43 forks source link

chore(sf|h3): reimplement basic h3 functions #489

Closed DeanSherwin closed 8 months ago

DeanSherwin commented 8 months ago

Description

Shortcut

Type of change

Change our H3 functions to use Snowflakes new native H3 functions.

Acceptance

View or run the snowflake_test.sh benchmark test script (available here - https://app.shortcut.com/cartoteam/story/391762/run-benchmarks-with-new-at-snowflake-h3-functions). It runs each of the re-implemented functions on 'main' and this branch and dumps execution times to CSV.

shortcut-integration[bot] commented 8 months ago

This pull request has been linked to Shortcut Story #391760: Reimplement basic H3 functions in AT Snowflake using native H3 functions.

CLAassistant commented 8 months ago

CLA assistant check
All committers have signed the CLA.

vdelacruzb commented 8 months ago

I would update the Description -> Shortcut on this PR to contain the shortcut story and autolink.

Also I recommend taking a look to the CONTRIBUTING.md. You will find more details about conventions for PR titles.

In this case would be more like: chore(sf|h3): reimplement basic h3 functions

Jesus89 commented 8 months ago

Hi @DeanSherwin @vdelacruzb.

I think renaming the input parameters is not needed (that way we avoid the need to update the docs).

I've tested with index, INDEX, "INDEX", and it worked fine. What problem did you face @DeanSherwin?

DeanSherwin commented 8 months ago

Hi @DeanSherwin @vdelacruzb.

I think renaming the input parameters is not needed (that way we avoid the need to update the docs).

I've tested with index, INDEX, "INDEX", and it worked fine. What problem did you face @DeanSherwin?

Not a technical problem, Just a habit. I like to avoid using reserved SQL keywords like INDEX or SIZE as parameter names.

I will change it back now. Thanks!

Jesus89 commented 8 months ago

Got it. Let's keep the interface for now to minimize the changes, but good point to consider in the future

DeanSherwin commented 8 months ago

@Jesus89 @vdelacruzb Guys - my thinking now is that H3_KDISTANCE_RING should be reverted to the original UDF as on main branch.

A single call to a JS library is much faster than using the new H3_GRID_DISK and then passing that array to a UDF for processing. My attempts at a better algorithm in JS failed. And any SQL I can create to get a better performance isn't compatible with a UDF. Thoughts?

Unlike H3_BOUNDARY I don't think this warrants a follow-up task as this function doesn't have a native SF implementation and is custom to CARTO's UDF library anyways.

Jesus89 commented 8 months ago

I agree. We could create a ticket to research in the future (same as H3_BOUNDARY) if H3_GRID_DISK gets improved or they provide a more useful function for our H3_KDISTANCE_RING.

DeanSherwin commented 8 months ago

I agree. We could create a ticket to research in the future (same as H3_BOUNDARY) if H3_GRID_DISK gets improved or they provide a more useful function for our H3_KDISTANCE_RING.

Agreed: https://app.shortcut.com/cartoteam/story/398983/research-an-improved-h3-kring-distance-for-at-sf