dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
314 stars 123 forks source link

[CT-3521] [Feature] Remove `REFRESH` from `snowflake_refresh_snowpipe` #250

Open clementmg-getaround opened 11 months ago

clementmg-getaround commented 11 months ago

Remove REFRESH from here since the Snowflake docs say that:

The REFRESH functionality is intended for short term use to resolve specific issues when Snowpipe fails to load a subset of files and is not intended for regular use

The original feature request transferred from dbt-core follows below:

Is this your first time submitting a feature request?

Describe the feature

The idea is to create the famous sleep/wait command, that is present across multiple frameworks, and that simply allows the worker to wait a defined amount of time: dbt sleep 3

Describe alternatives you've considered

Alternatives were hacky really: launching some impactless commands to mimic a sleep behavior. We also thought of splitting the dbt job in two, schedule their execution through an external tool, which supports sleep command (Airflow). We believe dbt should possess that power.

Who will this benefit?

This known command can be very useful in multiple situations. In our case, a dbt job scheduled two distinct tasks. And because of external shenanigans (related to snowflake pipes), the first job is marked as finished by dbt when in fact it is still running. Therefore, the second job starts too early. A simple dbt sleep with a custom time would prevent that unwanted behavior.

I am positive this feature could help solve similar issues across various platforms.

Are you interested in contributing this feature?

Sure!

Anything else?

No response

graciegoheen commented 10 months ago

Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.

Could you give us some more insight into your specific situation? Ideally, we'd want behavior such that when the dbt command has completed, it has actually completed.

clementmg-getaround commented 10 months ago

Hello!

Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.

Yes precisely. However, this seems to be more on an issue from Snowflake. It seems to send the wrong signal to dbt, saying the task completed (The command system$pipe_status command shows lastIngestedTimestamp the pipe finishes early (as planned conceptually). However, it is not completely: checking copy_history details shows that the pipe runs longer than the success linked above => it runs until a time which corresponds to the copy-command.

We opened a ticket in Snowflake customer support to try and fix that. Adding a sleep command won't tackle the heart of the issue, but it would certainly be a useful temporary fix.

olperri commented 10 months ago

Hello! I worked with @clementmg-getaround into investigating this issue in our org.

In case it can help, so far the only lead we have for the shenanigan described above is :

We are uncertain wether this is the source of the current behaviour, but its our current most likely explanation with our current knowledge

dbeatty10 commented 10 months ago

Nice troubleshooting @clementmg-getaround and @olperri !

Given your findings, I'm going to transfer this issue to dbt-external-tables.

dataders commented 7 months ago

hey @clementmg-getaround @olperri is my understanding here correct?

REFRESH should not be used in conjunction with Snowpipe, even when AUTO_REFRESH=FALSE

Rather than remove the SQL statement about REFRESH I propose a further solution. Remove snowflake_refresh_snowpipe() macro entirely. Instead rely on snowflake__refresh_external_table() which, unless manual_refresh is a no-op (i.e. returns zero SQL, so the package will execute just an empty string.

https://github.com/dbt-labs/dbt-external-tables/blob/21428bce9ea3aed7ac6900d11b158bbe80fb56b0/macros/common/refresh_external_table.sql#L5-L7

I've opened #300, would y'all please let me know if this solution works for you?

olperri commented 6 months ago

Thanks @dataders for looking into this! 🙏

To be fully transparent, I cannot really confirm anything :

Sorry for this "non-answer" but I'll ping the analytics-engineers in our org though as they might have an opinion or could be able to confirm