Open clementmg-getaround opened 11 months ago
Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.
Could you give us some more insight into your specific situation? Ideally, we'd want behavior such that when the dbt command has completed, it has actually completed.
Hello!
Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.
Yes precisely.
However, this seems to be more on an issue from Snowflake. It seems to send the wrong signal to dbt, saying the task completed (The command system$pipe_status
command shows lastIngestedTimestamp
the pipe finishes early (as planned conceptually). However, it is not completely: checking copy_history
details shows that the pipe runs longer than the success linked above => it runs until a time which corresponds to the copy-command.
We opened a ticket in Snowflake customer support to try and fix that. Adding a sleep command won't tackle the heart of the issue, but it would certainly be a useful temporary fix.
Hello! I worked with @clementmg-getaround into investigating this issue in our org.
In case it can help, so far the only lead we have for the shenanigan described above is :
dbt-external-tables
uses the REFRESH
parameter when no auto-ingest is set up [link]REFRESH
=> the emphasis on the warning make us wonder if there are some technical subtleties behind the scenes
Important The REFRESH functionality is intended for short term use to resolve specific issues when Snowpipe fails to load a subset of files and is not intended for regular use.
We are uncertain wether this is the source of the current behaviour, but its our current most likely explanation with our current knowledge
Nice troubleshooting @clementmg-getaround and @olperri !
Given your findings, I'm going to transfer this issue to dbt-external-tables.
hey @clementmg-getaround @olperri is my understanding here correct?
REFRESH
should not be used in conjunction with Snowpipe, even whenAUTO_REFRESH=FALSE
Rather than remove the SQL statement about REFRESH
I propose a further solution. Remove snowflake_refresh_snowpipe()
macro entirely. Instead rely on snowflake__refresh_external_table()
which, unless manual_refresh
is a no-op (i.e. returns zero SQL, so the package will execute just an empty string.
I've opened #300, would y'all please let me know if this solution works for you?
Thanks @dataders for looking into this! 🙏
To be fully transparent, I cannot really confirm anything :
Sorry for this "non-answer" but I'll ping the analytics-engineers in our org though as they might have an opinion or could be able to confirm
Remove
REFRESH
from here since the Snowflake docs say that:The original feature request transferred from dbt-core follows below:
Is this your first time submitting a feature request?
Describe the feature
The idea is to create the famous sleep/wait command, that is present across multiple frameworks, and that simply allows the worker to wait a defined amount of time:
dbt sleep 3
Describe alternatives you've considered
Alternatives were hacky really: launching some impactless commands to mimic a sleep behavior. We also thought of splitting the dbt job in two, schedule their execution through an external tool, which supports sleep command (Airflow). We believe dbt should possess that power.
Who will this benefit?
This known command can be very useful in multiple situations. In our case, a dbt job scheduled two distinct tasks. And because of external shenanigans (related to snowflake pipes), the first job is marked as finished by dbt when in fact it is still running. Therefore, the second job starts too early. A simple
dbt sleep
with a custom time would prevent that unwanted behavior.I am positive this feature could help solve similar issues across various platforms.
Are you interested in contributing this feature?
Sure!
Anything else?
No response